Data Cleansing

Data migration presents a one-time opportunity to “clean up” legacy image data as it is moved to the new archive. This so-called “Data Cleansing” is offered as an extra-cost option in data migration projects. Exception conditions that are candidates for remediation include:


Misidentified Data

    • Patient demographic data (Name, Patient ID, Birth date, Gender)
    • Exam/Order identification (Accession Number, Requested Procedure ID)

Junk Data

    • Test images
    • Discarded images

Bad Data

    • Unsupported old SOP classes (image types)
    • Corrupted/invalid DICOM objects

Hard Errors

    • Media read errors
    • Missing primary or backup volumes

Some of these exception conditions were errors at the time the data sets were created, and others may have been introduced later. Patient identification, for example, may be updated as a person’s name changes or when identification errors are discovered, sometimes years after the fact. In some systems, the name-change process leaves the same study in the archive under both the old and new patient names. In addition, data migration may consolidate sets of image data stored under different patient ID systems, and it may be desirable to map the legacy patient IDs to a new master ID system as the data is migrated.
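As a rough illustration of mapping patient IDs to a master ID system, the sketch below uses a simple lookup table keyed by the legacy ID system and the legacy patient ID. The site names, ID values, and the function name to_master_id are all hypothetical; a real migration would load an agreed-upon crosswalk from the customer's systems rather than hard-code one.

```python
# Hypothetical crosswalk: (legacy ID system, legacy patient ID) -> master patient ID.
# All values here are made up for illustration.
MASTER_ID_MAP = {
    ("SITE_A", "12345"): "M000001",
    ("SITE_B", "A-987"): "M000001",  # same person, known under a different ID system
    ("SITE_B", "A-988"): "M000002",
}


def to_master_id(id_system, legacy_id):
    """Return the master ID, or None when the pair is unmapped and needs review."""
    return MASTER_ID_MAP.get((id_system, legacy_id.strip()))


print(to_master_id("SITE_A", "12345"))
print(to_master_id("SITE_B", " A-987 "))
print(to_master_id("SITE_C", "555"))
```

Returning None for an unmapped ID, instead of guessing, keeps unresolved records visible so they can be handled as exceptions.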


Matching methods are commonly rule-based, heuristic or probability-based, or a combination of these. Heuristic and probability-based methods are usually proprietary algorithms that often match names better than rule-based methods. However, this incremental improvement comes at the expense of not knowing what the “black box” algorithms are doing. Rule-based methods sequentially apply a set of agreed-upon rules to the image and examination inventories, and the rules are refined iteratively in consultation with the customer. The advantage of rule-based methods is that they are deterministic: the result is a documented and agreed-upon set of operations that will be performed on the data as it is migrated.
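A minimal sketch of the rule-based approach might look like the following. It is not any vendor's actual implementation: the StudyRecord fields, the three example rules, and the "OLD-" prefix are assumptions chosen only to show rules being applied in a fixed order, with a record of which rules changed each entry.

```python
from dataclasses import dataclass, field


@dataclass
class StudyRecord:
    # Hypothetical, simplified view of one entry in the examination inventory.
    patient_id: str
    patient_name: str
    accession_number: str
    applied_rules: list = field(default_factory=list)


def trim_patient_id(rec):
    # Example rule: remove stray whitespace around the patient ID.
    cleaned = rec.patient_id.strip()
    changed = cleaned != rec.patient_id
    rec.patient_id = cleaned
    return changed


def uppercase_name(rec):
    # Example rule: store names in upper case so case differences do not matter.
    cleaned = rec.patient_name.upper()
    changed = cleaned != rec.patient_name
    rec.patient_name = cleaned
    return changed


def strip_accession_prefix(rec):
    # Example rule: drop an assumed legacy "OLD-" prefix from accession numbers.
    if rec.accession_number.startswith("OLD-"):
        rec.accession_number = rec.accession_number[len("OLD-"):]
        return True
    return False


# The agreed-upon rule list; the order is fixed and is part of the documentation.
RULES = [trim_patient_id, uppercase_name, strip_accession_prefix]


def apply_rules(rec, rules=RULES):
    # Apply every rule in sequence and log the name of each rule that changed the record.
    for rule in rules:
        if rule(rec):
            rec.applied_rules.append(rule.__name__)
    return rec


rec = apply_rules(StudyRecord(" 12345 ", "Doe^Jane", "OLD-A100"))
print(rec.patient_id, rec.patient_name, rec.accession_number, rec.applied_rules)
```

Because the rules are plain functions in an ordered list, the same input always yields the same output, and the list itself can serve as the documented set of operations reviewed with the customer.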


But there’s far more to data migration than data cleansing. Here’s what Laitek sees as the key elements of any migration project.