Data Cleansing

Category Methods
Date Published Dec 04, 2018
Written by LAITEK

Data cleansing

Data migration presents a one-time opportunity to “clean up” legacy image data as it is moved to the new archive. This so-called “Data Cleansing” is offered as an extra-cost option in data migration projects. Exception conditions that are candidates for remediation include:

Misidentified Data

Patient demographic data (Name, Patient ID, Birth date, Gender)
Exam/Order identification (Accession Number, Requested Procedure ID)

Junk Data

Test images
Discarded images

Bad Data

Unsupported old SOP classes (image types)
Corrupted/invalid DICOM objects

Hard Errors

Media read errors
Missing primary or backup volumes

Some of these exception conditions were errors at the time the data sets were created, and others may have been introduced later. Patient Identification, for example, may be updated as a person’s name changes or when identification errors are discovered, sometimes years after the fact. In some systems, the name-change process leaves the same study in the archive under both the old and new patient names. In addition, data migration may consolidate sets of image data stored with different patient ID systems, and it may be desirable to map patient IDs to a new master ID system when they are migrated.
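As a rough illustration of mapping patient IDs to a master ID system, the sketch below uses a simple crosswalk table. The issuer names, ID values, and the `to_master_id` helper are all hypothetical; a real project would draw the crosswalk from the site's own master patient index.

```python
from typing import Dict, Optional, Tuple

# Hypothetical crosswalk: (issuing system, local patient ID) -> master patient ID.
# These values are invented for illustration only.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("HOSPITAL_A", "12345"): "MPI-000001",
    ("HOSPITAL_B", "A-778"): "MPI-000001",  # same person under a different local ID
    ("HOSPITAL_B", "A-901"): "MPI-000002",
}


def to_master_id(issuer: str, local_id: str) -> Optional[str]:
    """Return the master ID, or None so the record can be flagged for review."""
    key = (issuer.strip().upper(), local_id.strip())
    return CROSSWALK.get(key)
```

Returning `None` rather than guessing keeps unmapped IDs visible as exceptions instead of silently migrating them under the wrong patient.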

Matching methods are commonly rule-based, heuristic or probability-based, or a combination of these.

Heuristic and probability-based methods are usually proprietary algorithms, and they often match names better than rule-based methods. However, this incremental improvement comes at the cost of not knowing what the “black box” algorithms are doing. Rule-based methods sequentially apply a set of agreed-upon rules to the image and examination inventories, and the rules are refined iteratively in consultation with the customer. Their advantage is that they are deterministic: the result is a documented, agreed-upon set of operations that will be performed on the data as it is migrated.
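The sequential, deterministic character of a rule-based approach can be sketched as an ordered list of small functions applied to each inventory record. The two rules shown, the record field names, and the `apply_rules` helper are assumptions made for illustration, not the rules used in any actual migration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
# A rule returns a changed copy of the record, or None if it does not apply.
Rule = Callable[[Record], Optional[Record]]


def strip_whitespace(rec: Record) -> Optional[Record]:
    """Hypothetical rule: trim stray spaces from every attribute value."""
    cleaned = {k: v.strip() for k, v in rec.items()}
    return cleaned if cleaned != rec else None


def uppercase_name(rec: Record) -> Optional[Record]:
    """Hypothetical rule: normalize the patient name to upper case."""
    name = rec.get("patient_name", "")
    if name == name.upper():
        return None
    return {**rec, "patient_name": name.upper()}


# The agreed-upon rules, in the order they are applied.
RULES: List[Tuple[str, Rule]] = [
    ("strip_whitespace", strip_whitespace),
    ("uppercase_name", uppercase_name),
]


def apply_rules(rec: Record, rules: List[Tuple[str, Rule]] = RULES) -> Tuple[Record, List[str]]:
    """Apply each rule in order and report which rules changed the record."""
    applied: List[str] = []
    for name, rule in rules:
        result = rule(rec)
        if result is not None:
            rec = result
            applied.append(name)
    return rec, applied
```

Because the rule list is explicit and ordered, the same input always yields the same output, and the list of applied rule names can serve as the documented record of what was done to each data set.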

Dirty Data

The most common data cleansing operations are the correction of images misidentified at the Patient or Study level.

Patient Level cleansing fills in or corrects missing or erroneous patient attributes based on patterns in the other patient attributes, usually matching to an authoritative patient list from the Hospital or Radiology Information System (HIS/RIS).
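A minimal sketch of patient-level cleansing follows, assuming the authoritative HIS/RIS list is keyed by patient ID and that a match is trusted only when at least one other populated attribute agrees. The field names, sample values, status labels, and the `cleanse_patient` helper are hypothetical.

```python
from typing import Dict, Tuple

Record = Dict[str, str]
ATTRS = ("name", "birth_date", "sex")

# Hypothetical authoritative patient list from the HIS/RIS, keyed by patient ID.
HIS_PATIENTS: Dict[str, Record] = {
    "12345": {"name": "DOE^JANE", "birth_date": "19700101", "sex": "F"},
}


def cleanse_patient(rec: Record, his: Dict[str, Record] = HIS_PATIENTS) -> Tuple[Record, str]:
    """Fill in or correct patient attributes from the authoritative list."""
    ref = his.get(rec.get("patient_id", ""))
    if ref is None:
        return rec, "no_match"
    # Assumption: require one other populated attribute to agree before
    # overwriting anything, so a mistyped ID does not pull in the wrong patient.
    agreeing = [k for k in ATTRS if rec.get(k) and rec[k] == ref[k]]
    if not agreeing:
        return rec, "needs_review"
    fixed = {**rec, **ref}
    return fixed, ("unchanged" if fixed == rec else "corrected")
```

The agreement check is one possible safeguard; the actual threshold for accepting a match would be part of the rules agreed with the customer.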

Study Level cleansing matches the imaging studies to a list of examinations from the HIS/RIS, populating Accession Number and Requested Procedure attributes with values from the corresponding matched HIS/RIS examination. Patient attributes are also used in exam-level matching, which offers better results than matching exams or patients alone.
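Study-level matching might be sketched as below, assuming each study is matched to a HIS/RIS examination on patient ID, study date, and modality, and that only a unique match is accepted. The exam list, the field names, and the `match_study` helper are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
MATCH_KEYS = ("patient_id", "study_date", "modality")

# Hypothetical examination list exported from the HIS/RIS.
RIS_EXAMS: List[Record] = [
    {"patient_id": "12345", "study_date": "20180310", "modality": "CT",
     "accession_number": "ACC1001", "requested_procedure_id": "RP501"},
    {"patient_id": "12345", "study_date": "20180310", "modality": "MR",
     "accession_number": "ACC1002", "requested_procedure_id": "RP502"},
]


def match_study(study: Record, exams: List[Record] = RIS_EXAMS) -> Tuple[Record, bool]:
    """Populate exam identifiers when exactly one HIS/RIS exam matches."""
    candidates = [e for e in exams
                  if all(e[k] == study.get(k) for k in MATCH_KEYS)]
    if len(candidates) != 1:
        # Zero or several candidates: leave the study untouched for review.
        return study, False
    exam = candidates[0]
    updated = {
        **study,
        "accession_number": exam["accession_number"],
        "requested_procedure_id": exam["requested_procedure_id"],
    }
    return updated, True
```

Including the patient ID in the match key reflects the point above that combining patient and exam attributes narrows the candidates more than either alone.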
