Delete annotation and image when the sum of the annotations reaches a certain size #1203

DP1701 · 2023-11-27T07:55:36Z

Hello everyone,

I have a coco_instance dataset that contains several polygons. I would like to filter it as follows: if one or more polygons of a certain class take up more than 1/4 of the image area (image resolution), I would like to delete the image and all of its annotations. Can this be achieved with Datumaro, or should I write my own Python script for it?

vinnamkim · 2023-11-28T04:53:54Z

Hi @DP1701,
Thanks for your interest in our project!

Unfortunately, Datumaro has no built-in functionality that does exactly what you want. I'll add it to our development backlog. In the meantime, you will need to implement your own Python script. Below is a sketch I wrote on my side showing how it can be done with Datumaro.

Prepare toy synthetic dataset (can be skipped)

import numpy as np
import datumaro as dm

### Create example dataset ###

def create_example_dataset() -> dm.Dataset:
    blank_img = dm.Image.from_numpy(np.zeros([10, 10, 3], dtype=np.uint8))
    categories = ["label_1", "label_2", "label_3"]

    points_of_1x1_box = np.array([0, 0, 0, 1, 1, 1, 1, 0])

    item_not_to_drop = dm.DatasetItem(
        id="item_not_to_drop",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=label + points_of_1x1_box,
                label=label,
            )
            for label in range(len(categories))
        ],
    )

    item_drop_by_big_polygon = dm.DatasetItem(
        id="item_drop_by_big_polygon",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=8 * points_of_1x1_box,  # 8x8 box
                label=0,
            )
        ],
    )

    item_drop_by_polygon_union = dm.DatasetItem(
        id="item_drop_by_polygon_union",
        media=blank_img,
        annotations=[
            dm.Polygon(
                points=offset + 4 * points_of_1x1_box,  # ten 4x4 boxes placed along the diagonal
                label=1,
            )
            for offset in range(10)
        ],
    )

    return dm.Dataset.from_iterable(
        iterable=[
            item_not_to_drop,
            item_drop_by_big_polygon,
            item_drop_by_polygon_union,
        ],
        categories=categories,
    )

dataset = create_example_dataset()

### Print result

print(dataset)
for item in dataset:
    print(item)

Dataset
	size=3
	source_path=None
	media_type=<class 'datumaro.components.media.Image'>
	annotated_items_count=3
	annotations_count=14
subsets
	default: # of items=3, # of annotated items=3, # of annotations=14, annotation types=['polygon']
infos
	categories
	label: ['label_1', 'label_2', 'label_3']

DatasetItem(id='item_not_to_drop', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0], label=0, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 2.0], label=2, z_order=0)], attributes={})
DatasetItem(id='item_drop_by_big_polygon', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0, 0.0], label=0, z_order=0)], attributes={})
DatasetItem(id='item_drop_by_polygon_union', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 0.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 2.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[3.0, 3.0, 3.0, 7.0, 7.0, 7.0, 7.0, 3.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[4.0, 4.0, 4.0, 8.0, 8.0, 8.0, 8.0, 4.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 5.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[6.0, 6.0, 6.0, 10.0, 10.0, 10.0, 10.0, 6.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[7.0, 7.0, 7.0, 11.0, 11.0, 11.0, 11.0, 7.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[8.0, 8.0, 8.0, 12.0, 12.0, 12.0, 12.0, 8.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[9.0, 9.0, 9.0, 13.0, 13.0, 13.0, 13.0, 9.0], label=1, z_order=0)], attributes={})
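In the printed items, each polygon's `points` is a flat list of interleaved coordinates, `[x1, y1, x2, y2, ...]`. The following is a minimal pure-Python sketch of how such a list maps to (x, y) pairs and how a single polygon's area follows from the shoelace formula. The helper names `to_xy_pairs` and `shoelace_area` are hypothetical and not part of Datumaro; the sketch covers one simple polygon only, not a union of several.

```python
def to_xy_pairs(flat_points):
    """Split a flat [x1, y1, x2, y2, ...] list into (x, y) tuples."""
    return list(zip(flat_points[0::2], flat_points[1::2]))


def shoelace_area(flat_points):
    """Area of one simple polygon given as a flat coordinate list."""
    pairs = to_xy_pairs(flat_points)
    twice_area = 0.0
    # Pair every vertex with its successor, wrapping around to the first one.
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:] + pairs[:1]):
        twice_area += x0 * y1 - x1 * y0
    # The sign only encodes the vertex orientation, so take the absolute value.
    return abs(twice_area) / 2.0


# Points of the single polygon in 'item_drop_by_big_polygon' as printed above.
big_box = [0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0, 0.0]
big_box_pairs = to_xy_pairs(big_box)
big_box_area = shoelace_area(big_box)
print(big_box_pairs, big_box_area)
```

For the 8x8 box this gives an area of 64, which is 64% of the 10x10 toy image and therefore well above a 1/4 threshold.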

Filtering Python script

### Remove if the maximum union area of polygons > 1 / 4 image size ###

from collections import defaultdict
import shapely.geometry as sg

def get_max_polygon_area(polygon_group_by_label: dict[int, list[sg.Polygon]]) -> float:
    max_area = 0.0
    for polygons in polygon_group_by_label.values():
        union = sg.Polygon()
        for polygon in polygons:
            union = union.union(polygon)
        max_area = max(max_area, union.area)
    return max_area

### Gather item id and subset to remove

items_to_remove = []

for item in dataset:
    height, width = item.media_as(dm.Image).size
    image_size = height * width

    polygon_group_by_label = defaultdict(list)
    for ann in item.annotations:
        if not isinstance(ann, dm.Polygon):
            continue

        polygon_group_by_label[ann.label] += [sg.Polygon(ann.get_points())]

    max_polygon_area = get_max_polygon_area(polygon_group_by_label)

    if max_polygon_area > 1 / 4 * image_size:
        print(
            f"item_id: {item.id}, max_polygon_area: {max_polygon_area}, image_size: {image_size}, "
            f"Remove this item: {item.id}"
        )
        items_to_remove += [(item.id, item.subset)]
    else:
        print(f"item_id: {item.id}, max_polygon_area: {max_polygon_area}, image_size: {image_size}")

### Remove from the dataset

for item_id, subset in items_to_remove:
    dataset.remove(id=item_id, subset=subset)

### Print result

print(dataset)
for item in dataset:
    print(item)

item_id: item_not_to_drop, max_polygon_area: 1.0, image_size: 100
item_id: item_drop_by_big_polygon, max_polygon_area: 64.0, image_size: 100, Remove this item: item_drop_by_big_polygon
item_id: item_drop_by_polygon_union, max_polygon_area: 79.0, image_size: 100, Remove this item: item_drop_by_polygon_union
Dataset
	size=1
	source_path=None
	media_type=<class 'datumaro.components.media.Image'>
	annotated_items_count=1
	annotations_count=3
subsets
	default: # of items=1, # of annotated items=1, # of annotations=3, annotation types=['polygon']
infos
	categories
	label: ['label_1', 'label_2', 'label_3']

DatasetItem(id='item_not_to_drop', subset='default', media=ImageFromNumpy(data=array([[[0, 0, 0], ...), annotations=[Polygon(id=0, attributes={}, group=0, object_id=-1, points=[0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0], label=0, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0], label=1, z_order=0), Polygon(id=0, attributes={}, group=0, object_id=-1, points=[2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 2.0], label=2, z_order=0)], attributes={})
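The areas reported in the log above can be cross-checked by hand. Each of the ten 4x4 boxes overlaps its predecessor in a 3x3 square, so every box after the first contributes 16 - 9 = 7 new unit cells, and the union is 16 + 9 * 7 = 79. The sketch below confirms this by counting covered unit cells. It works only because the toy boxes are axis-aligned with integer corners; `covered_cells` is a hypothetical helper for illustration, not a replacement for the shapely-based union.

```python
def covered_cells(boxes, limit=None):
    """Count unit cells covered by at least one axis-aligned integer box.

    boxes: iterable of (x0, y0, x1, y1) tuples with integer corners.
    limit: optional (width, height); cells outside that rectangle are ignored.
    """
    cells = set()
    for x0, y0, x1, y1 in boxes:
        for x in range(x0, x1):
            for y in range(y0, y1):
                if limit is None or (0 <= x < limit[0] and 0 <= y < limit[1]):
                    cells.add((x, y))
    return len(cells)


# Same layout as 'item_drop_by_polygon_union': ten 4x4 boxes shifted by 1 each.
diagonal_boxes = [(k, k, k + 4, k + 4) for k in range(10)]

union_area = covered_cells(diagonal_boxes)
inside_image = covered_cells(diagonal_boxes, limit=(10, 10))
print(union_area, inside_image)
```

Note that the last boxes of the toy item reach coordinate 13 and so extend past the 10x10 image. The unclipped union is 79 cells, while only 58 of them lie inside the image; both exceed the threshold of 25, so the item is dropped either way.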

I hope this is helpful for your work.
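The script above takes the maximum union area over all labels, whereas the original question concerns polygons of one particular class. A minimal sketch of that variant follows, assuming annotations are available as plain `(label, flat_points)` tuples rather than Datumaro objects. The function name `class_area_ratio` and its parameters are hypothetical; only `shapely.geometry.Polygon` and `shapely.ops.unary_union` are real library calls.

```python
from shapely.geometry import Polygon
from shapely.ops import unary_union


def class_area_ratio(annotations, target_label, image_height, image_width):
    """Fraction of the image area covered by the union of one label's polygons.

    annotations: iterable of (label, flat_points) tuples, where flat_points
    is [x1, y1, x2, y2, ...]. This input layout is an assumption of the sketch.
    """
    shapes = [
        Polygon(list(zip(points[0::2], points[1::2])))
        for label, points in annotations
        if label == target_label
    ]
    if not shapes:
        return 0.0
    # unary_union merges overlapping polygons so shared area is counted once.
    return unary_union(shapes).area / (image_height * image_width)


annotations = [
    (0, [0, 0, 0, 8, 8, 8, 8, 0]),  # 8x8 box of label 0
    (1, [0, 0, 0, 1, 1, 1, 1, 0]),  # 1x1 box of label 1
]
ratio_label_0 = class_area_ratio(annotations, 0, 10, 10)
ratio_label_1 = class_area_ratio(annotations, 1, 10, 10)
drop_for_label_0 = ratio_label_0 > 1 / 4
drop_for_label_1 = ratio_label_1 > 1 / 4
print(ratio_label_0, drop_for_label_0, ratio_label_1, drop_for_label_1)
```

With this layout, the same image would be dropped when filtering on label 0 but kept when filtering on label 1, which is the per-class behaviour the question describes.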

### Summary

- Ticket no. 127146
- Same as title
- Updated the Jupyter notebook example as well.
- It is raised by this user requirement, #1203

### How to test

Added some unit tests as well.

### Checklist

- [x] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [x] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [x] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Signed-off-by: Kim, Vinnam <[email protected]>

wonjuleee · 2024-02-01T00:38:50Z

Thanks @vinnamkim for providing the workaround. @DP1701, I hope the above works for you. I will close this issue.

github-actions bot assigned vinnamkim Nov 27, 2023

vinnamkim added the FEATURE New feature & functionality label Nov 28, 2023

vinnamkim mentioned this issue Dec 18, 2023

Add Filtering via User-Provided Python Functions #1230

Merged

6 tasks

wonjuleee closed this as completed Feb 1, 2024
