
Delete annotation and image when the sum of the annotations reaches a certain size #1203

Closed
DP1701 opened this issue Nov 27, 2023 · 2 comments
Labels
FEATURE New feature & functionality

Comments

@DP1701

DP1701 commented Nov 27, 2023

Hello everyone,

I have a coco_instance dataset that contains several polygons. I would like to apply the following filter: if one or more polygons of a certain class cover more than 1/4 of the image area (width × height), delete the image and all of its annotations. Can this be achieved with Datumaro, or should I write my own Python script for it?

@vinnamkim
Contributor

vinnamkim commented Nov 28, 2023

Hi @DP1701,
Thanks for your interest in our project!

Unfortunately, Datumaro does not currently provide exactly this functionality. I'll add it to our development backlog. In the meantime, you can implement it with your own Python script. Below is how it can be done using Datumaro.

1. Prepare a toy synthetic dataset (this step can be skipped)

```python
import numpy as np
import datumaro as dm

### Create example dataset ###

def create_example_dataset() -> dm.Dataset:
    blank_img = dm.Image.from_numpy(np.zeros([10, 10, 3], dtype=np.uint8))  # 10x10 black image
    categories = ["label_1", "label_2", "label_3"]

    # Flat [x1, y1, x2, y2, ...] corner list of a 1x1 square at the origin
    points_of_1x1_box = np.array([0, 0, 0, 1, 1, 1, 1, 0])

    item_not_to_drop = dm.DatasetItem(
        id="item_not_to_drop",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=label + points_of_1x1_box,  # 1x1 box shifted by (label, label)
                label=label,
            )
            for label in range(len(categories))
        ],
    )

    item_drop_by_big_polygon = dm.DatasetItem(
        id="item_drop_by_big_polygon",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=8 * points_of_1x1_box,  # 8x8 box
                label=0,
            )
        ],
    )

    item_drop_by_polygon_union = dm.DatasetItem(
        id="item_drop_by_polygon_union",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=offset + 4 * points_of_1x1_box,  # ten overlapping 4x4 boxes along the diagonal
                label=1,
            )
            for offset in range(10)
        ],
    )

    return dm.Dataset.from_iterable(
        iterable=[
            item_not_to_drop,
            item_drop_by_big_polygon,
            item_drop_by_polygon_union,
        ],
        categories=categories,
    )

dataset = create_example_dataset()

### Print result

print(dataset)
for item in dataset:
    print(item)
```
```
Dataset
	size=3
	source_path=None
	media_type=<class 'datumaro.components.media.Image'>
	annotated_items_count=3
	annotations_count=14
subsets
	default: # of items=3, # of annotated items=3, # of annotations=14, annotation types=['polygon']
infos
	categories
	label: ['label_1', 'label_2', 'label_3']

DatasetItem(id='item_not_to_drop', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0], label=0, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 2.0], label=2, z_order=0)], attributes={})
DatasetItem(id='item_drop_by_big_polygon', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0, 0.0], label=0, z_order=0)], attributes={})
DatasetItem(id='item_drop_by_polygon_union', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 0.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 2.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[3.0, 3.0, 3.0, 7.0, 7.0, 7.0, 7.0, 3.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0, 4.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 5.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[6.0, 6.0, 6.0, 10.0, 10.0, 10.0, 10.0, 6.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[7.0, 7.0, 7.0, 11.0, 11.0, 11.0, 11.0, 7.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[8.0, 8.0, 8.0, 12.0, 12.0, 12.0, 12.0, 8.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[9.0, 9.0, 9.0, 13.0, 13.0, 13.0, 13.0, 9.0], label=1, z_order=0)], attributes={})
```
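Before running the filter, the expected areas can be worked out by hand: the 8x8 box covers 64 of the 100 pixels, and each 4x4 box overlaps its neighbour in a 3x3 region, so the union of the ten diagonal boxes is 16 + 9 × (16 − 9) = 79. Because the last boxes extend past the 10x10 image, part of that union lies outside the image. The sketch below is a dependency-free cross-check that assumes every polygon is an axis-aligned box with integer corners (true for this toy data only); `box_cells` and `union_area` are hypothetical helpers, not Datumaro API.

```python
# Cross-check of the union areas for the toy dataset.
# Assumption: polygons are axis-aligned boxes with integer corners, so the
# area equals the number of unit cells covered. General COCO polygons need
# a real geometry library such as shapely.

def box_cells(x0: int, y0: int, size: int) -> set[tuple[int, int]]:
    """Unit cells covered by the box [x0, x0 + size] x [y0, y0 + size]."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


def union_area(boxes, clip_to=None) -> int:
    """Area of the union of boxes given as (x0, y0, size) tuples.

    If clip_to=(width, height) is given, only cells inside the image count.
    """
    cells = set()
    for x0, y0, size in boxes:
        cells |= box_cells(x0, y0, size)  # overlapping cells are counted once
    if clip_to is not None:
        width, height = clip_to
        cells = {(x, y) for x, y in cells if 0 <= x < width and 0 <= y < height}
    return len(cells)


big_box = [(0, 0, 8)]                        # item_drop_by_big_polygon
diagonal = [(o, o, 4) for o in range(10)]    # item_drop_by_polygon_union

print(union_area(big_box))                     # 64
print(union_area(diagonal))                    # 79 = 16 + 9 * 7
print(union_area(diagonal, clip_to=(10, 10)))  # 58: part inside the 10x10 image
```

Here both the unclipped (79) and clipped (58) unions exceed the 25-pixel threshold, so the decision is the same; on real data, intersecting each polygon with the image rectangle before measuring could change which items are dropped.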
2. Filtering Python script

```python
### Remove if the maximum union area of polygons > 1 / 4 image size ###

from collections import defaultdict
import shapely.geometry as sg

def get_max_polygon_area(polygon_group_by_label: dict[int, list[sg.Polygon]]) -> float:
    # For each label, merge its polygons so overlapping regions are counted once,
    # then return the largest per-label union area
    max_area = 0.0
    for polygons in polygon_group_by_label.values():
        union = sg.Polygon()
        for polygon in polygons:
            union = union.union(polygon)
        max_area = max(max_area, union.area)
    return max_area

### Gather item id and subset to remove

items_to_remove = []

for item in dataset:
    height, width = item.media_as(dm.Image).size  # Datumaro's Image.size is (height, width)
    image_size = height * width

    polygon_group_by_label = defaultdict(list)
    for ann in item.annotations:
        if not isinstance(ann, dm.Polygon):
            continue

        polygon_group_by_label[ann.label] += [sg.Polygon(ann.get_points())]

    max_polygon_area = get_max_polygon_area(polygon_group_by_label)

    if max_polygon_area > 1 / 4 * image_size:
        print(
            f"item_id: {item.id}, max_polygon_area: {max_polygon_area}, image_size: {image_size}, "
            f"Remove this item: {item.id}"
        )
        items_to_remove += [(item.id, item.subset)]
    else:
        print(f"item_id: {item.id}, max_polygon_area: {max_polygon_area}, image_size: {image_size}")

### Remove from the dataset (done after the loop, not while iterating over it)

for item_id, subset in items_to_remove:
    dataset.remove(id=item_id, subset=subset)

### Print result

print(dataset)
for item in dataset:
    print(item)
```
```
item_id: item_not_to_drop, max_polygon_area: 1.0, image_size: 100
item_id: item_drop_by_big_polygon, max_polygon_area: 64.0, image_size: 100, Remove this item: item_drop_by_big_polygon
item_id: item_drop_by_polygon_union, max_polygon_area: 79.0, image_size: 100, Remove this item: item_drop_by_polygon_union
Dataset
	size=1
	source_path=None
	media_type=<class 'datumaro.components.media.Image'>
	annotated_items_count=1
	annotations_count=3
subsets
	default: # of items=1, # of annotated items=1, # of annotations=3, annotation types=['polygon']
infos
	categories
	label: ['label_1', 'label_2', 'label_3']

DatasetItem(id='item_not_to_drop', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0], label=0, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 2.0], label=2, z_order=0)], attributes={})
```
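If shapely is not available and the rule only needs to look at individual polygons of certain classes (without merging overlaps), the area of each polygon can be computed with the shoelace formula directly on the flat `[x1, y1, x2, y2, ...]` point list that Datumaro prints for `Polygon`. The sketch below is hypothetical: `polygon_area`, `should_drop`, `target_labels`, and `max_fraction` are illustrative names, and it works on plain `(label, points)` pairs so it runs without Datumaro. It assumes simple (non-self-intersecting) polygons.

```python
# Per-polygon variant of the filter: drop an image if any single polygon whose
# label is in `target_labels` covers more than `max_fraction` of the image.
# Unlike the shapely script, overlapping polygons are NOT merged here.

def polygon_area(points: list[float]) -> float:
    """Shoelace area of a simple polygon given as a flat [x1, y1, x2, y2, ...] list."""
    xs, ys = points[0::2], points[1::2]
    n = len(xs)
    twice_area = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice_area) / 2.0


def should_drop(annotations, width, height, target_labels, max_fraction=0.25) -> bool:
    """annotations: iterable of (label, flat_points) pairs for one image."""
    image_area = width * height
    return any(
        label in target_labels and polygon_area(points) > max_fraction * image_area
        for label, points in annotations
    )


item_not_to_drop = [(0, [0, 0, 0, 1, 1, 1, 1, 0]), (1, [1, 1, 1, 2, 2, 2, 2, 1])]
item_big_polygon = [(0, [0, 0, 0, 8, 8, 8, 8, 0])]

print(polygon_area([0, 0, 0, 8, 8, 8, 8, 0]))                    # 64.0
print(should_drop(item_not_to_drop, 10, 10, target_labels={0}))  # False
print(should_drop(item_big_polygon, 10, 10, target_labels={0}))  # True
print(should_drop(item_big_polygon, 10, 10, target_labels={1}))  # False: other class
```

To apply this to a real dataset, the pairs could presumably be built from `(ann.label, ann.points)` for each `dm.Polygon` annotation, with the image size taken from `item.media_as(dm.Image).size` as in the script above; that mapping is an untested assumption.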

I hope this is helpful for your work.

@vinnamkim vinnamkim added the FEATURE New feature & functionality label Nov 28, 2023
vinnamkim added a commit that referenced this issue Dec 19, 2023

### Summary

- Ticket no. 127146
- Same as title
- Updated the Jupyter notebook example as well.
- It was prompted by this user request: #1203

### How to test
Added some unit tests as well.

### Checklist
- [x] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [x] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly.

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [x] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Kim, Vinnam <[email protected]>
@wonjuleee
Contributor

Thanks @vinnamkim for providing the workaround. @DP1701, I hope the above works for you. I will close this issue.
