Sample presidio image evaluation notebook generates error #1251

mpsampat · 2024-01-07T20:44:07Z

Describe the bug
The notebook https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_dicom_redactor_evaluation.ipynb
generates an error and does not produce the evaluation results.
The error is shown below.

To Reproduce
Steps to reproduce the behavior:

Clone the repo:
https://github.com/microsoft/presidio.git
Go to the folder presidio/docs/samples/python.
Run the Jupyter notebook "example_dicom_redactor_evaluation.ipynb".
The first few cells work fine.
The cell with the following code raises the error:
_, eval_results = dicom_engine.eval_dicom_instance(instance, gt_file_of_interest)
The error I get is shown below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 _, eval_results = dicom_engine.eval_dicom_instance(instance, gt_file_of_interest)

File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/dicom_image_pii_verify_engine.py:175, in DicomImagePiiVerifyEngine.eval_dicom_instance(self, instance, ground_truth, padding_width, tolerance, display_image, use_metadata, ocr_kwargs, ad_hoc_recognizers, **text_analyzer_kwargs)
165 # Verify detected PHI
166 verify_image, ocr_results, analyzer_results = self.verify_dicom_instance(
167 instance,
168 padding_width,
(...)
173 **text_analyzer_kwargs,
174 )
--> 175 formatted_ocr_results = self.bbox_processor.get_bboxes_from_ocr_results(
176 ocr_results
177 )
178 detected_phi = self.bbox_processor.get_bboxes_from_analyzer_results(
179 analyzer_results
180 )
182 # Remove duplicate entities in results

File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/bbox.py:18, in BboxProcessor.get_bboxes_from_ocr_results(ocr_results)
12 """Get bounding boxes on padded image for all detected words from ocr_results.
13
14 :param ocr_results: Raw results from OCR.
15 :return: Bounding box information per word.
16 """
17 bboxes = []
---> 18 print(ocr_results["text"])
19 for i in range(len(ocr_results["text"])):
20 detected_text = ocr_results["text"][i]

TypeError: list indices must be integers or slices, not str

Expected behavior

I expect to get precision and recall values as shown in the notebook committed in the repo.

Additional context
Could you please help provide a workaround for this issue? Should I use an older tag of presidio?


mpsampat · 2024-01-08T02:03:45Z

This issue also occurs on other pages, such as the "Creating ground truth files" section:
https://microsoft.github.io/presidio/image-redactor/evaluating_dicom_redaction/#creating-ground-truth-files
The following lines of code generate the error shown below:
# Format results for more direct comparison
ocr_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_ocr_results(ocr_results)
analyzer_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_analyzer_results(analyzer_results)

Error observed:

TypeError Traceback (most recent call last)
Cell In[19], line 1
----> 1 ocr_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_ocr_results(ocr_results)
2 analyzer_results_formatted = dicom_engine.bbox_processor.get_bboxes_from_analyzer_results(analyzer_results)

File /opt/conda/lib/python3.10/site-packages/presidio_image_redactor/bbox.py:18, in BboxProcessor.get_bboxes_from_ocr_results(ocr_results)
12 """Get bounding boxes on padded image for all detected words from ocr_results.
13
14 :param ocr_results: Raw results from OCR.
15 :return: Bounding box information per word.
16 """
17 bboxes = []
---> 18 print(ocr_results["text"])
19 for i in range(len(ocr_results["text"])):
20 detected_text = ocr_results["text"][i]
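
The error message suggests that `ocr_results` reaches `get_bboxes_from_ocr_results` as a list, while line 18 of bbox.py indexes it with a string key. A minimal sketch that reproduces the same TypeError without presidio is shown here; the sample data and the `texts` helper are made up for illustration and are not taken from the library or the notebook.

```python
# Hypothetical sample data, not produced by presidio.
ocr_as_dict = {"text": ["JOHN", "DOE"]}               # dict of columns: string keys work
ocr_as_list = [{"label": "JOHN"}, {"label": "DOE"}]   # list of per-word dicts

def texts(ocr_results):
    # Same indexing pattern as bbox.py line 18 in the traceback.
    return ocr_results["text"]

print(texts(ocr_as_dict))

try:
    texts(ocr_as_list)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```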

omri374 · 2024-01-28T10:36:01Z

Thank you @mpsampat, and apologies for the delayed response. We'll look into this.

gianni-di-noia · 2024-10-20T00:31:01Z

The method verify_dicom_instance already returns formatted ocr_results. The variable is called ocr_bboxes in the codebase.
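
If that is the case, one possible workaround in a notebook is to skip the second formatting step when the results are already a list. The sketch below is only an illustration: `ensure_bboxes` is a hypothetical helper, not part of presidio, and the key names (`left`, `top`, `width`, `height`, `conf`, `label`) are assumptions about the bounding box layout rather than a confirmed schema.

```python
from typing import Any, Dict, List, Union

def ensure_bboxes(
    ocr_results: Union[Dict[str, list], List[Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Return per-word bounding boxes for raw or already formatted OCR results."""
    # Already formatted (a list of per-word dicts): return unchanged.
    if isinstance(ocr_results, list):
        return ocr_results

    # Raw column-style results (assumed layout): build one dict per non-empty word.
    bboxes = []
    for i, text in enumerate(ocr_results["text"]):
        if text:
            bboxes.append(
                {
                    "left": ocr_results["left"][i],
                    "top": ocr_results["top"][i],
                    "width": ocr_results["width"][i],
                    "height": ocr_results["height"][i],
                    "conf": float(ocr_results["conf"][i]),
                    "label": text,
                }
            )
    return bboxes
```

In the ground-truth example from the docs, this would mean writing `ocr_results_formatted = ensure_bboxes(ocr_results)` instead of calling `get_bboxes_from_ocr_results` directly, assuming the installed version behaves as described in this comment.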

omri374 added the bug, image-anonymization, and dicom labels · Jan 28, 2024
