Filter out items without annotations #1208
Comments
Hi @CourchesneA,
First of all, the […] To keep only the items that have annotations, you can use the following command:
datum filter -e '/item[annotation]'
The […] I hope my response will be helpful.
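For readers unfamiliar with XPath predicates, here is a minimal sketch of what the `[annotation]` part of the expression selects: items that have at least one `annotation` child. It uses Python's standard `xml.etree.ElementTree` on a hypothetical, simplified XML layout (the `dataset` root, the `id` and `label` elements, and the item names are assumptions made for illustration). It is not Datumaro's implementation and does not call Datumaro.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML view of three dataset items.
# Only img_002 has no <annotation> child.
DATASET_XML = """
<dataset>
  <item><id>img_001</id><annotation><label>cat</label></annotation></item>
  <item><id>img_002</id></item>
  <item>
    <id>img_003</id>
    <annotation><label>dog</label></annotation>
    <annotation><label>cat</label></annotation>
  </item>
</dataset>
"""


def annotated_item_ids(xml_text):
    """Return the ids of items that have at least one <annotation> child."""
    root = ET.fromstring(xml_text.strip())
    # "item[annotation]" selects <item> children of the root that contain
    # an <annotation> child; each matching item is returned once.
    return [item.findtext("id") for item in root.findall("item[annotation]")]


if __name__ == "__main__":
    print(annotated_item_ids(DATASET_XML))
```

The predicate tests for the presence of a child element, so an item with several annotations is still selected only once, and an item with none is dropped.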
That is what I was looking for. Thanks!!
### Summary

1. Correct the documentation so that it uses the 'datum project import' command correctly.
2. Add a filtering example that keeps only the items containing annotations. (openvinotoolkit#1208)

### How to test

### Checklist

- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [x] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
I am trying to remove items for which I do not have annotations in a dataset. From the documentation, I have seen that the following could help me:
datum filter -m i+a ./my-dataset
It looks like a filter expression is required, so I added a generic filter that matches everything:
datum filter -m i+a -e '/item/image[has_data=1]' -o ./my-output-dataset --dry-run ./my-dataset
However, it always results in an empty output (the dry run returns nothing). Using `datum stats`, I can see that this test dataset has 13 images, 10 of which have annotations. By removing the `--mode` flag, I get my whole dataset printed as XML, and I can see that most of the items have annotations.
I would expect `-m i+a` to output a valid dataset containing only the items for which annotations are available.