Australian-Imaging-Service/phi-finder

PHI-finder

Local testing (Docker required)

conda create -n phi-finder python=3.11
conda activate phi-finder
pip install -e ".[dev,test]" --no-cache-dir
pytest .

Building

python -m pip install --upgrade build

python -m build

pip install dist/phi_finder-0.1.14-py3-none-any.whl

Basic usage (headers only)

import pydicom as dicom
from phi_finder.dicom_tools import anonymise_dicom

path = "/path/to/some/dicom.dcm"
dcm = dicom.dcmread(path)
anonymised_dcm = anonymise_dicom.anonymise_image(dcm)
anonymised_dcm.save_as('/path/to/some/dicom_anon.dcm')
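
To process a whole directory, the single-file call above can be wrapped in a loop. The sketch below is illustrative only: anon_path and anonymise_directory are hypothetical helper names, not part of phi-finder, and the _anon suffix simply mirrors the output filename used in the example. It assumes every *.dcm file directly under the directory can be read by pydicom.

```python
from pathlib import Path


def anon_path(path: Path) -> Path:
    """Return the sibling output path, e.g. scan.dcm -> scan_anon.dcm."""
    return path.with_name(f"{path.stem}_anon{path.suffix}")


def anonymise_directory(directory: Path) -> list[Path]:
    """Anonymise the headers of every *.dcm file directly under `directory`."""
    # Imported here so the path helper can be used without these packages.
    import pydicom as dicom
    from phi_finder.dicom_tools import anonymise_dicom

    written = []
    for path in sorted(directory.glob("*.dcm")):
        if path.stem.endswith("_anon"):
            continue  # skip outputs left over from a previous run
        dcm = dicom.dcmread(path)
        anonymised_dcm = anonymise_dicom.anonymise_image(dcm)
        out = anon_path(path)
        anonymised_dcm.save_as(out)
        written.append(out)
    return written
```

Writing to a sibling file rather than overwriting keeps the original available for comparison until the result has been reviewed.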

More advanced usage

import pydicom as dicom
from presidio_image_redactor import (
    ContrastSegmentedImageEnhancer,
    DicomImageRedactorEngine,
    ImageAnalyzerEngine,
)
from phi_finder.dicom_tools import anonymise_dicom

path = "/path/to/some/dicom.dcm"
dcm = dicom.dcmread(path)

# Minimum confidence score for a detected entity to be treated as PHI.
score_threshold = 0.15
# Build the Presidio analyser on top of the spaCy "en_core_web_lg" model.
analyser = anonymise_dicom._build_presidio_analyser(score_threshold, "en_core_web_lg")

# The image redactor handles text burned into the pixel data.
image_redactor = DicomImageRedactorEngine(
    image_analyzer_engine=ImageAnalyzerEngine(
        analyzer_engine=analyser,
        image_preprocessor=ContrastSegmentedImageEnhancer(),
    )
)

anonymised_dcm = anonymise_dicom.anonymise_image(
    dcm,
    score_threshold=score_threshold,
    analyser=analyser,
    image_redactor=image_redactor,
)
anonymised_dcm.save_as("/path/to/some/dicom_anon.dcm")

De-identifying headers with the DICOM PS3.15 profile

By default, anonymise_image scans the header values with the Presidio NER pipeline (and GLiNER, when supplied). Passing use_case="PS3.15" instead de-identifies the headers with the DICOM PS3.15 Annex E Basic Application Level Confidentiality Profile. This mode applies the standard's per-attribute actions (empty, replace with a dummy value, remove, or remap UIDs), records the de-identification in DeidentificationMethod and DeidentificationMethodCodeSequence, and sets PatientIdentityRemoved to YES. The NER engines are not run on the headers in this mode, so you do not need to build an analyser.

import pydicom as dicom
from phi_finder.dicom_tools import anonymise_dicom

path = "/path/to/some/dicom.dcm"
dcm = dicom.dcmread(path)
anonymised_dcm = anonymise_dicom.anonymise_image(dcm, use_case="PS3.15")
anonymised_dcm.save_as('/path/to/some/dicom_anon.dcm')
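
After running the PS3.15 profile, the markers described above can be checked on the returned dataset. This is a minimal sketch, assuming the dataset exposes DICOM keywords as attributes (as pydicom datasets do). is_marked_deidentified is a hypothetical helper, not part of phi-finder, and the SimpleNamespace object is only a stand-in for a real dataset.

```python
from types import SimpleNamespace


def is_marked_deidentified(ds) -> bool:
    """True if the dataset carries the de-identification markers."""
    identity_removed = getattr(ds, "PatientIdentityRemoved", None) == "YES"
    method_recorded = bool(
        getattr(ds, "DeidentificationMethod", None)
        or getattr(ds, "DeidentificationMethodCodeSequence", None)
    )
    return identity_removed and method_recorded


# Stand-in objects for illustration; in practice pass anonymised_dcm.
marked = SimpleNamespace(
    PatientIdentityRemoved="YES",
    DeidentificationMethod="placeholder method description",
)
unmarked = SimpleNamespace(PatientName="DOE^JANE")

print(is_marked_deidentified(marked), is_marked_deidentified(unmarked))
```

A check like this only confirms that the markers were written; it does not prove that every identifying attribute was handled.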

Retain Patient Characteristics

Use use_case="PS3.15_Rtn. Pat." to apply the basic profile together with the PS3.15 Retain Patient Characteristics Option. Direct identifiers (patient name, birth date, etc.) are still removed, but patient characteristics such as age, sex, size, weight, ethnic group and smoking status are kept. The retain option is recorded in DeidentificationMethodCodeSequence (code 113108).

import pydicom as dicom
from phi_finder.dicom_tools import anonymise_dicom

path = "/path/to/some/dicom.dcm"
dcm = dicom.dcmread(path)
anonymised_dcm = anonymise_dicom.anonymise_image(dcm, use_case="PS3.15_Rtn. Pat.")
anonymised_dcm.save_as('/path/to/some/dicom_anon.dcm')
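
To confirm that the retain option was recorded, the code values in DeidentificationMethodCodeSequence can be inspected. The sketch below assumes each sequence item exposes a CodeValue attribute, as pydicom sequence items do. recorded_method_codes and retains_patient_characteristics are hypothetical helper names, and the SimpleNamespace objects are stand-ins for a real dataset.

```python
from types import SimpleNamespace

# Code value for the Retain Patient Characteristics Option, as quoted above.
RETAIN_PATIENT_CHARACTERISTICS = "113108"


def recorded_method_codes(ds) -> set[str]:
    """Collect the CodeValue of every item in DeidentificationMethodCodeSequence."""
    sequence = getattr(ds, "DeidentificationMethodCodeSequence", None) or []
    return {str(item.CodeValue) for item in sequence if hasattr(item, "CodeValue")}


def retains_patient_characteristics(ds) -> bool:
    """True if the retain option's code is present in the sequence."""
    return RETAIN_PATIENT_CHARACTERISTICS in recorded_method_codes(ds)


# Stand-in objects for illustration; in practice pass anonymised_dcm.
with_retain = SimpleNamespace(
    DeidentificationMethodCodeSequence=[SimpleNamespace(CodeValue="113108")]
)
without_sequence = SimpleNamespace()

print(retains_patient_characteristics(with_retain))
print(retains_patient_characteristics(without_sequence))
```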

The use_case match is case-insensitive and tolerant of separator spelling, so "PS3.15", "ps3.15", "PS3_15" and "PS3-15" all select the plain profile, and "PS3.15_Rtn. Pat." or "PS3.15 Retain Patient Characteristics" select the retain variant. Any other value, such as "Standard" (the default) or "Aggressive", falls back to the Presidio/GLiNER pipeline described above.
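
The matching rule can be illustrated with a small normaliser. This is a sketch of one way to get the behaviour described, not phi-finder's actual code: classify_use_case and its return labels are hypothetical, and it only covers the spellings listed above.

```python
import re


def classify_use_case(use_case: str) -> str:
    """Map a use_case string to 'ps3.15', 'ps3.15_retain' or 'ner'."""
    # Lower-case and drop every non-alphanumeric character, so that
    # "PS3.15", "ps3_15" and "PS3-15" all collapse to "ps315".
    key = re.sub(r"[^a-z0-9]", "", use_case.lower())
    if key == "ps315":
        return "ps3.15"
    if key in {"ps315rtnpat", "ps315retainpatientcharacteristics"}:
        return "ps3.15_retain"
    # Anything else ("Standard", "Aggressive", ...) uses the NER pipeline.
    return "ner"


for value in ["PS3.15", "ps3_15", "PS3.15_Rtn. Pat.", "Standard"]:
    print(f"{value!r} -> {classify_use_case(value)}")
```

Stripping separators before comparing keeps the set of accepted spellings explicit and small, at the cost of also accepting unusual forms such as "P-S-3-1-5".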

Note: use_case only controls how the headers are handled. Burned-in pixel PHI is still redacted only when an image_redactor is passed, exactly as in the examples above.

About

Collection of tools to check uploaded scans and records for identifiable data
