Not understanding why DICOM redaction does not detect Patient Name on example data #1309

parataaito · 2024-02-23T10:20:23Z

Hello !

First, thanks for this tool, it looks very promising, so congrats on the idea!

I have a question though.
I followed the walkthrough from here:
I used the "0_ORIGINAL.dcm" file from the test files.

Here is my code, to show that it seems identical to the tutorial:

import pydicom
from presidio_image_redactor import DicomImageRedactorEngine
import matplotlib.pyplot as plt

def compare_dicom_images(
    instance_original: pydicom.dataset.FileDataset,
    instance_redacted: pydicom.dataset.FileDataset,
    figsize: tuple = (11, 11)
) -> None:
    """Display the DICOM pixel arrays of both original and redacted as images.

    Args:
        instance_original (pydicom.dataset.FileDataset): A single DICOM instance (with text PHI).
        instance_redacted (pydicom.dataset.FileDataset): A single DICOM instance (redacted PHI).
        figsize (tuple): Figure size in inches (width, height).
    """
    _, ax = plt.subplots(1, 2, figsize=figsize)
    ax[0].imshow(instance_original.pixel_array, cmap="gray")
    ax[0].set_title('Original')
    ax[1].imshow(instance_redacted.pixel_array, cmap="gray")
    ax[1].set_title('Redacted')
    plt.show()
    
# Set input and output paths
input_path = "0_ORIGINAL.dcm"
output_dir = "./output"

# Initialize the engine
engine = DicomImageRedactorEngine()

# Option 1: Redact from a loaded DICOM image
dicom_image = pydicom.dcmread(input_path)
redacted_dicom_image = engine.redact(dicom_image, use_metadata=True, fill="contrast")

compare_dicom_images(dicom_image, redacted_dicom_image)

However, my output is this:

I don't understand why the Patient Name is not redacted like it is in your example:

For additional info, I am using Python 3.11.2 (but I tried with 3.9 too).

PS: I did not file this as a bug since I am not 100% sure it is one. It's probably something on my side, but I have no idea where it comes from...

Thanks in advance :)


parataaito · 2024-02-23T11:33:10Z

Just wanted to add that I also followed example_dicom_image_redactor.ipynb.
Here are my results:

parataaito · 2024-03-27T10:34:52Z

Hello !
It's been a month now and no news :'(
Has anybody had the same problem and managed to solve it?

omri374 · 2024-03-28T13:22:53Z

Apologies for the delay. We will look into this soon and report back.

omri374 · 2024-03-29T14:12:24Z

@parataaito a hotfix was created and a new version released. Could you please check again? Apologies for the late resolution on this!

omri374 · 2024-03-29T14:12:34Z

Closing for now, please re-open if needed.

parataaito · 2024-03-29T14:19:35Z

Thanks for the (very) quick reply!
Going to check right away!

parataaito · 2024-03-29T14:56:12Z

Works like a charm on all the demo files! So that's perfect!

I also tested it on random data I generated, and I was wondering if you understand why it does not work specifically on this one: sample_data.zip

Is it because the data I burned into the pixel array does not match any value in the DICOM tags?

omri374 · 2024-05-01T10:50:07Z

The DICOM redactor either takes values from the tags, or uses different text-based approaches to identify entities such as names. In this case, the default spaCy model used by Presidio is not able to detect "ez OY" as a name, but a different model can. I would suggest experimenting with changing Presidio's configuration. For example:

import pydicom

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import TransformersNlpEngine
from presidio_image_redactor import ImageAnalyzerEngine, DicomImagePiiVerifyEngine

# Pair a spaCy pipeline with a transformers NER model (StanfordAIMI de-identifier)
model_config = [
    {
        "lang_code": "en",
        "model_name": {
            "spacy": "en_core_web_sm",
            "transformers": "StanfordAIMI/stanford-deidentifier-base",
        },
    }
]

nlp_engine = TransformersNlpEngine(models=model_config)
text_analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)
image_analyzer_engine = ImageAnalyzerEngine(analyzer_engine=text_analyzer_engine)
dicom_engine = DicomImagePiiVerifyEngine(image_analyzer_engine=image_analyzer_engine)

file_of_interest = "path/to/instance.dcm"  # placeholder: the DICOM file to check
instance = pydicom.dcmread(file_of_interest)
verify_image, ocr_results, analyzer_results = dicom_engine.verify_dicom_instance(instance, padding_width=25, show_text_annotation=True)

Running this version with the spaCy model does not identify the bounding box containing a name as PII, whereas this transformers model (StanfordAIMI/stanford-deidentifier-base) does. I would suggest looking further into ways to improve and customize the PII detection flows with Presidio: https://microsoft.github.io/presidio/tutorial/
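To make the "values from the tags" path concrete, here is a minimal pure-Python sketch of the general idea. It is not Presidio's actual implementation: it collects strings from a few tag values, splits a DICOM PersonName value on its `^` component separator (and `=` group separator), and flags OCR words that match. The helper names `build_deny_list` and `flag_ocr_words`, the tag values and the OCR words are all hypothetical. Burned-in text that appears in no tag (as suggested above for "ez OY") is never flagged by this path, which is why an NER model has to catch it instead.

```python
from typing import Dict, List, Set


def build_deny_list(tags: Dict[str, str]) -> Set[str]:
    """Collect lowercase candidate strings from tag values (hypothetical helper).

    DICOM PersonName values separate components with '^' (family^given^...)
    and representation groups with '=', so split on both.
    """
    candidates: Set[str] = set()
    for value in tags.values():
        for group in value.split("="):
            parts = [p for p in group.split("^") if p]
            candidates.update(p.lower() for p in parts)
            if len(parts) > 1:
                # Also keep the space-joined full name as one candidate
                candidates.add(" ".join(parts).lower())
    return candidates


def flag_ocr_words(ocr_words: List[str], deny_list: Set[str]) -> List[str]:
    """Return the OCR words that exactly match a tag-derived candidate."""
    return [w for w in ocr_words if w.lower() in deny_list]


# Made-up tag values and OCR output, for illustration only
tags = {"PatientName": "DOE^JOHN", "PatientID": "12345"}
ocr_words = ["JOHN", "DOE", "12345", "ez", "OY"]

deny_list = build_deny_list(tags)
print(flag_ocr_words(ocr_words, deny_list))
# ['JOHN', 'DOE', '12345']; 'ez' and 'OY' are not in any tag
```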

jhssilva · 2024-05-09T12:57:51Z

Hi @omri374.
I have a problem: the DICOM redaction doesn't detect the text in the header. Please refer to the following image. (I've blurred the patient data, as this is an official image.)

This is the code that I'm currently using:

import matplotlib.pyplot as plt
import pydicom
from presidio_analyzer import Pattern, PatternRecognizer
from presidio_image_redactor import DicomImageRedactorEngine

input_path = "./test"
output_dir = "./output"

engine = DicomImageRedactorEngine()

# Catch-all pattern: treat any OCR'd text as an entity to redact
pattern_all_text = Pattern(name="any_text", regex=r"(?s).*", score=0.5)
custom_recognizer = PatternRecognizer(
    supported_entity="TEXT",
    patterns=[pattern_all_text]
)

dicom_image = pydicom.dcmread(input_path)
redacted_dicom_image = engine.redact(dicom_image, fill="background", use_metadata=False, ad_hoc_recognizers=[custom_recognizer], allow_list=[])
redacted_dicom_image.save_as(f"{output_dir}/redacted_dicom.dcm")

# Reload the saved file and display its pixel data
redact_image = pydicom.dcmread(f"{output_dir}/redacted_dicom.dcm")
redact_image = redact_image.pixel_array
plt.imshow(redact_image, cmap='gray')
plt.show()

It redacts all the information except the header.
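The `(?s).*` pattern used in this recognizer matches any string at all, including multi-line text and the empty string, so any text that reaches the analyzer should be flagged. A quick standalone check with Python's `re` module (no Presidio involved) illustrates this; it suggests that when the header survives, the text probably never reached the recognizer, for example because OCR did not produce a box for it.

```python
import re

# Same regex as the "any_text" Pattern above; (?s) makes '.' match newlines too
any_text = re.compile(r"(?s).*")

samples = ["PATIENT NAME", "12/05/2024\nHOSPITAL", "", "Dr. Smith 42"]
for s in samples:
    # fullmatch succeeds only if the regex can consume the whole string
    print(repr(s), "->", any_text.fullmatch(s) is not None)
```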

omri374 · 2024-05-09T17:07:07Z

It could be an OCR issue, where the OCR just can't detect the bounding box. Have you looked into the bounding boxes returned by the OCR?
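One way to look into this is to filter the OCR output for boxes that fall inside the header strip. The sketch below assumes the word-level dictionary layout returned by `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, i.e. parallel lists under keys such as `text`, `left`, `top`, `width`, `height` and `conf`. The helper name `boxes_in_header`, the sample values and the 60-pixel header height are hypothetical. If a function like this returns nothing for a real image, the OCR produced no box in the header, and no recognizer can redact what it never sees.

```python
from typing import Dict, List, Tuple


def boxes_in_header(
    ocr: Dict[str, list], header_height: int, min_conf: float = 0.0
) -> List[Tuple[str, int, int, int, int]]:
    """Return (text, left, top, width, height) for words whose box starts above header_height."""
    hits = []
    for i, text in enumerate(ocr["text"]):
        if not text.strip():
            continue  # tesseract emits empty entries for page/block/line levels
        if float(ocr["conf"][i]) < min_conf:
            continue  # drop low-confidence words
        if ocr["top"][i] < header_height:
            hits.append((text, ocr["left"][i], ocr["top"][i], ocr["width"][i], ocr["height"][i]))
    return hits


# Hypothetical OCR output in the pytesseract Output.DICT layout
ocr_result = {
    "text": ["", "HOSPITAL", "12/05/2024", "", "Gain", "45"],
    "left": [0, 10, 200, 0, 15, 60],
    "top": [0, 12, 12, 0, 300, 300],
    "width": [640, 90, 100, 640, 40, 20],
    "height": [480, 18, 18, 480, 16, 16],
    "conf": [-1, 91.5, 88.0, -1, 95.2, 93.0],
}

print(boxes_in_header(ocr_result, header_height=60))
# [('HOSPITAL', 10, 12, 90, 18), ('12/05/2024', 200, 12, 100, 18)]
```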

omri374 · 2024-05-09T17:07:42Z

Adding @niwilso and @ayabel in case they have any recommendations here, as DICOM experts.

jhssilva · 2024-05-11T09:08:58Z

Thank you for the answer @omri374.
Should I look for anything in particular in the bboxes?

This is the output of the simple program.

I followed the documentation here. The header text doesn't seem to be captured by any of the bboxes.

Regarding the image: this is a DICOM ultrasound image. Even if I save it as a regular image and then use Presidio, the issue persists.

ayabel · 2024-05-12T06:47:43Z

Hi @jhssilva, it might be because the contrast between the text and the background is relatively low. In this case, you might want to consider preprocessing the image before feeding it to the redactor. Ideas for such preprocessing functions can be found here:

presidio-image-redactor/presidio_image_redactor/image_processing_engine.py
Specifically, applying the cv2.adaptiveThreshold function could help increase the contrast.
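For reference, the OpenCV call takes the form `cv2.adaptiveThreshold(src, maxValue, adaptiveMethod, thresholdType, blockSize, C)` on a single-channel 8-bit image, e.g. with `cv2.ADAPTIVE_THRESH_MEAN_C` and `cv2.THRESH_BINARY`. To show what mean adaptive thresholding computes, here is a pure-Python sketch on a tiny made-up grid: each pixel becomes `max_value` if it is brighter than the mean of its own `block_size` x `block_size` neighbourhood minus `c`, otherwise 0. With a positive `c`, flat regions come out white and darker strokes black (suits dark text on a light background); a negative `c`, as used here, suits faint light text on a darker background. The helper name and the pixel values are hypothetical, and the edge handling (clamping coordinates) is only one possible choice.

```python
from typing import List


def adaptive_mean_threshold(
    img: List[List[int]], block_size: int, c: float, max_value: int = 255
) -> List[List[int]]:
    """Binary mean adaptive threshold: max_value where pixel > local mean - c, else 0.

    Borders are handled by clamping coordinates (replicating edge pixels).
    """
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be an odd number >= 3")
    h, w = len(img), len(img[0])
    r = block_size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            local_mean = total / (block_size * block_size)
            out[y][x] = max_value if img[y][x] > local_mean - c else 0
    return out


# Faint text (52) on a dark background (40): a global threshold of 128 loses it
faint_text = [
    [40, 40, 40, 40, 40],
    [40, 52, 52, 52, 40],
    [40, 40, 40, 40, 40],
]
global_result = [[255 if p > 128 else 0 for p in row] for row in faint_text]
result = adaptive_mean_threshold(faint_text, block_size=3, c=-3)
for row in result:
    print(row)
```

The global threshold yields an all-black image, while the adaptive version isolates the middle row of "text" pixels.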

jhssilva · 2024-05-15T20:53:30Z

Hey @ayabel. Thank you for your input and guidance.

I've tested adaptiveThreshold as suggested.
However, in my case it creates a problem, as I need the images to keep their original contrast (for now; this may change in the future).

That being said, I've decided to take a different approach:
selecting the top part of the image, redacting it, and then stitching the images back together. This approach seems to work.
Example:

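import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from presidio_analyzer import Pattern, PatternRecognizer
from presidio_image_redactor import ImageRedactorEngine

redactor_image = ImageRedactorEngine()

pattern_all_text = Pattern(name="any_text", regex=r"(?s).*", score=0.5)
custom_recognizer = PatternRecognizer(
    supported_entity="TEXT",
    patterns=[pattern_all_text]
)
dicom_image = Image.open("new_image.png")

top_height = 60  # height in pixels of the header strip to redact

# Convert the original image to a numpy array
image = np.array(dicom_image)

top_part = image[0:top_height, :]

rest_of_image = image[top_height:, :]

# Convert the top part of the image back to a PIL Image
top_part_image = Image.fromarray(top_part)

redacted_image = redactor_image.redact(top_part_image, fill="black", ad_hoc_recognizers=[custom_recognizer], allow_list=[])

# Stack the redacted header back on top of the untouched remainder
final_image = np.concatenate((np.array(redacted_image), rest_of_image), axis=0)

plt.imshow(final_image)
plt.show()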

Note: In this example I didn't redact the bottom part of the image.
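Regarding the need to keep the original contrast: thresholding does not move pixels, so another option might be to run OCR on a preprocessed copy only and paint the resulting boxes onto the untouched original. This is an untested suggestion, not something confirmed in this thread. The sketch below shows only the painting step in pure Python; the helper name `paint_boxes`, the pixel values and the box are hypothetical, with boxes given as `(left, top, width, height)` tuples like an OCR pass over a same-sized thresholded copy might produce. Wiring this into Presidio's engines is left out.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def paint_boxes(
    img: List[List[int]], boxes: List[Box], fill: int = 0, padding: int = 0
) -> List[List[int]]:
    """Return a copy of img with each (padded) box filled; other pixels keep their original values."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so the original stays untouched
    for left, top, width, height in boxes:
        # Clip the padded box to the image bounds
        x0, y0 = max(left - padding, 0), max(top - padding, 0)
        x1, y1 = min(left + width + padding, w), min(top + height + padding, h)
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = fill
    return out


# Original low-contrast image (made-up values) and a box assumed to come
# from OCR on a thresholded copy of the same size
original = [
    [40, 40, 40, 40, 40],
    [40, 52, 52, 52, 40],
    [40, 40, 40, 40, 90],
]
boxes_from_preprocessed = [(1, 1, 3, 1)]

redacted = paint_boxes(original, boxes_from_preprocessed)
for row in redacted:
    print(row)
```

Only the boxed pixels change; everything else, including the bright pixel in the bottom-right corner, keeps its original value.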

Suggestion: it would be nice to have an example for such cases in the documentation, e.g. using the adaptive threshold, or the approach I've suggested for specific cases.

Image Output

parataaito changed the title ~~No understanding how DICOM redaction works~~ No understanding why DICOM redaction does not detect Patient Name Feb 23, 2024

parataaito changed the title ~~No understanding why DICOM redaction does not detect Patient Name~~ No understanding why DICOM redaction does not detect Patient Name on example data Feb 23, 2024

parataaito changed the title ~~No understanding why DICOM redaction does not detect Patient Name on example data~~ Not understanding why DICOM redaction does not detect Patient Name on example data Feb 23, 2024

omri374 mentioned this issue Mar 28, 2024

Fixed wrong condition for dicom metadata #1347

Merged


omri374 closed this as completed Mar 29, 2024

omri374 reopened this May 9, 2024
