RetinaNet script: Validation data set error #14

Open · MurielleMardenli200 opened this issue Oct 29, 2024 · 12 comments

@MurielleMardenli200 (Contributor)

In the RetinaNet script (see the current PR), an error is thrown by the evaluate method when it is run on the validation set. The method uses the dataset from json_annotation_val.json, which is generated by preprocess_data_coco in preprocessing.py.

This is the error raised when running the train script:
Validation run stopped due to: A prediction has class=53, but the dataset only has 2 classes and predicted class id should be in [0, 1]

@hermancollin (Member)

This Stack Overflow thread might provide some insight. Could you tell me the values of the following attributes: thing_classes and thing_dataset_id_to_contiguous_id? (I think they should be in the JSON file.) The latter is used by the evaluation method to determine the number of classes (num_classes), and this variable is in turn responsible for the error.
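For reference, a minimal way to inspect these attributes (a sketch; "COCO_VAL_ANNOTATION" stands in for whatever name the validation set was registered under):

    from detectron2.data import MetadataCatalog

    # Print the registered metadata for the validation set.
    metadata = MetadataCatalog.get("COCO_VAL_ANNOTATION")
    print(metadata.thing_classes)
    print(metadata.thing_dataset_id_to_contiguous_id)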

Ref: https://detectron2.readthedocs.io/en/latest/_modules/detectron2/evaluation/coco_evaluation.html

    def _eval_predictions(self, predictions, img_ids=None):
        """
        Evaluate predictions. Fill self._results with the metrics of the tasks.
        """
        self._logger.info("Preparing results for COCO format ...")
        coco_results = list(itertools.chain(*[x["instances"] for x in predictions]))
        tasks = self._tasks or self._tasks_from_predictions(coco_results)

        # unmap the category ids for COCO
        if hasattr(self._metadata, "thing_dataset_id_to_contiguous_id"):
            dataset_id_to_contiguous_id = self._metadata.thing_dataset_id_to_contiguous_id
            all_contiguous_ids = list(dataset_id_to_contiguous_id.values())
            num_classes = len(all_contiguous_ids)
            assert min(all_contiguous_ids) == 0 and max(all_contiguous_ids) == num_classes - 1

            reverse_id_mapping = {v: k for k, v in dataset_id_to_contiguous_id.items()}
            for result in coco_results:
                category_id = result["category_id"]
                assert category_id < num_classes, (
                    f"A prediction has class={category_id}, "
                    f"but the dataset only has {num_classes} classes and "
                    f"predicted class id should be in [0, {num_classes - 1}]."
                )
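To make the failure mode concrete: any prediction whose contiguous class id is greater than or equal to num_classes trips the assert above. A minimal sketch (the mapping values here are an assumption based on the error message):

    # Assumed mapping: dataset ids {1, 2} -> contiguous ids {0, 1}
    dataset_id_to_contiguous_id = {1: 0, 2: 1}
    num_classes = len(dataset_id_to_contiguous_id.values())  # 2
    # A prediction with category_id = 53 fails "category_id < num_classes".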

@hermancollin (Member)

If the thing_classes attribute is absent, the solution might just be to set it like so:

MetadataCatalog.get("COCO_VAL_ANNOTATION").set(thing_classes=["axon"])

@MurielleMardenli200 (Contributor, Author)

It looks like thing_classes is already set to ["myelin", "axon"] and thing_dataset_id_to_contiguous_id already has the ids {1: 0, 2: 1}. I'm unable to set them to a different value; doing so raises an error of this type: Attribute 'thing_dataset_id_to_contiguous_id' in the metadata of '../data-coco/annotations/json_annotation_val.json' cannot be set to a different value! {0: 0, 1: 1} != {1: 0, 2: 1}

According to this, num_classes should have a value of 2, but it doesn't.
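For what it's worth, MetadataCatalog attributes are write-once: set() raises as soon as a different value is already present. A possible workaround (a sketch, not verified against this setup) is to drop the entry and re-register it:

    from detectron2.data import MetadataCatalog

    # Remove the stale metadata entry, then set the attributes fresh.
    # The name below is taken from the error message above.
    name = "../data-coco/annotations/json_annotation_val.json"
    MetadataCatalog.remove(name)
    MetadataCatalog.get(name).set(
        thing_classes=["myelin", "axon"],
        thing_dataset_id_to_contiguous_id={1: 0, 2: 1},
    )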

@hermancollin (Member)

Unfortunately, the only other mention of this issue was here: facebookresearch/detectron2#5103, and they didn't get an answer from the devs.

@MurielleMardenli200 (Contributor, Author)

Just updating this here to document it: this week I debugged the problem and found that class=53 is actually the first element of the pred_classes array in my model output. The array is supposed to contain only values of 0, for the single class (axon). This is abnormal, since the pred_boxes attribute contains plausible values: in the validation visualization, the model is able to draw boxes around a good portion of the axons.

{'instances': 
  pred_boxes: Boxes(...)
  pred_classes: tensor([53, 59, 57, 55, 14, 53, 25, 58, 77, 47, 14, 77, 55, 56, 73, 62, 59, 59,
          77, 54, 47, 15, 72, 26, 45,  0, 47, 47, 14, 57, 14, 77, 47, 54, 14, 54])])

I've been working on solving this in order to properly use the COCOEvaluator class for the validation and test sets.
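A quick way to summarize the predicted class ids (a sketch; "outputs" is assumed to be the standard detectron2 inference output list):

    import torch

    # Count how often each class id appears across the predictions.
    for output in outputs:
        pred = output["instances"].pred_classes
        print(torch.unique(pred, return_counts=True))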

@hermancollin (Member)

@MurielleMardenli200 If I understand correctly, in your dict, pred_boxes and pred_classes both contain the same number of elements, and pred_classes contains a random integer for every box. Can you confirm this?

Why does the model predict these classes?

@hermancollin (Member) commented Nov 15, 2024

Also note that COCOEvaluator() takes an argument called max_dets_per_image, which limits the number of detections per image and is set to 100 by default. Maybe this is why your results looked like they were missing axons?
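For example, a sketch with a higher cap (the dataset name and output directory are illustrative):

    from detectron2.evaluation import COCOEvaluator

    # Raise the per-image detection cap above the default of 100.
    evaluator = COCOEvaluator(
        "COCO_VAL_ANNOTATION",
        output_dir="./eval_output",
        max_dets_per_image=500,
    )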

@hermancollin (Member)

Btw, which preprocessing script are you using? Is it this one? https://github.com/axondeepseg/axon-detection/blob/main/src/preprocessing.py

I ask because the COCO code there still considers both axon and myelin. If you fixed this on your side, could you make a PR to update the preprocessing script?

@MurielleMardenli200 (Contributor, Author) commented Nov 15, 2024

> @MurielleMardenli200 If I understand correctly, in your dict, pred_boxes and pred_classes both contain the same number of elements, and pred_classes contains a random integer for every box. Can you confirm this?
>
> Why does the model predict these classes?

  1. Yes, exactly: pred_classes and pred_boxes have the same size. I'm not sure why it predicts them this way; the randomness of the integers is consistent even when I change hyperparameters. I'm going to debug further to find the error.

  2. I was not using COCOEvaluator to visualize the predictions, since it wasn't working because of this issue, but I'll try using it after I resolve it.

  3. I made the preprocessing script fix in a separate branch (fix/preprocessing). Here is the pull request. But I am using the same code in my branch (dev/retinaNet).

@hermancollin (Member)

@MurielleMardenli200 In the meantime, maybe you can "manually" set all classes to 1 and try COCOEvaluator with this?
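A minimal sketch of one way to do this, assuming the standard detectron2 inference loop (variable names are illustrative); class id 1 is "axon" under the {1: 0, 2: 1} contiguous mapping:

    import torch

    # Force every prediction to the single foreground class before evaluation.
    outputs = model(inputs)
    for output in outputs:
        instances = output["instances"]
        instances.pred_classes = torch.ones_like(instances.pred_classes)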

@hermancollin (Member) commented Nov 23, 2024

Currently, you assign all classes to a single class with category_id = 0. It looks like this id is reserved for the background (see openvinotoolkit/datumaro#974). This means that during preprocessing, the annotations should all have category_id = 1 instead. Maybe this is why the model predicts gibberish there.
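For illustration, the categories and annotations in the generated COCO dict would then look roughly like this (a sketch; field values are placeholders):

    # Hypothetical excerpt of the COCO dict written during preprocessing.
    coco_dict = {
        "categories": [{"id": 1, "name": "axon"}],  # id 0 left free for background
        "annotations": [
            {"id": 1, "image_id": 1, "category_id": 1,  # was 0 before the fix
             "bbox": [10, 10, 40, 40], "area": 1600, "iscrowd": 0},
        ],
    }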

@MurielleMardenli200 (Contributor, Author) commented Nov 25, 2024

> Currently, you assign all classes to a single class with category_id = 0. It looks like this id is reserved for the background (see openvinotoolkit/datumaro#974). This means that during preprocessing, the annotations should all have category_id = 1 instead. Maybe this is why the model predicts gibberish there.

@hermancollin I tried changing the category_id of the axon_myelin class to 1 in the preprocessing file here and here. I had to add a category_id of 0 for the background, even though there are no annotations associated with it, so that the number of classes in the Detectron metadata (len(thing_classes)) is registered as 2.

But I am still getting the same gibberish in the output predictions, and the same error is thrown when 2 classes are registered:

Validation run stopped due to: A prediction has class=60, but the dataset only has 2 classes and predicted class id should be in [0, 1].
