Prioritize using PIL to get image size (#1259)
### Summary

Accelerate loading of image file-based datasets.

I found that printing the YOLO dataset information for the first time was slow. After some digging, I found that `datumaro` was reading through the entire dataset to get the size of each image.

```python
ds = Dataset.import_from("/yolo-ultralytics", "yolo")
print(ds)  # <-- wait a long time
```

```python
# from class Image
@property
def size(self) -> Optional[Tuple[int, int]]:
    """Returns (H, W)"""
    if self._size is None:
        try:
            data = self.data  # <-- load the whole media into memory
        except _image_loading_errors:
            return None
        if data is not None:
            self._size = tuple(map(int, data.shape[:2]))
    return self._size
```

This makes interactive work with datasets stored on an HDD slow. So I added an override of the `size` property in the `ImageFromFile` class, which first tries to get the image size using `PIL`. The `PIL` library is about 8 times faster than `OpenCV` at getting the image size. All dataset classes that use the `size` property of `ImageFromFile` can benefit from this modification.

### How to test

### Checklist

- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [ ] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License

- [x] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project.
  Feel free to contact the maintainers if that's a concern.
- [x] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```

---------

Co-authored-by: Vinnam Kim <[email protected]>
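
### Illustrative sketch

For readers who want to see the shape of the idea from the Summary, here is a minimal sketch. It is not the code from this commit. `read_size_from_header` and `ImageFromFileSketch` are hypothetical names invented for illustration, and `_size_from_full_decode` is only a placeholder for the slow path that decodes the whole image. The sketch assumes Pillow 7.0 or newer and relies on `PIL.Image.open` being lazy, so reading `img.size` does not decode pixel data.

```python
from typing import Optional, Tuple

from PIL import Image as PILImage
from PIL import UnidentifiedImageError


def read_size_from_header(path: str) -> Optional[Tuple[int, int]]:
    """Return (H, W) without decoding pixel data, or None if PIL cannot read the file."""
    try:
        # PIL.Image.open identifies the file and parses its header only;
        # pixel data is not decoded until it is actually needed.
        with PILImage.open(path) as img:
            width, height = img.size  # PIL reports (width, height)
    except (OSError, UnidentifiedImageError):
        return None
    return (height, width)


class ImageFromFileSketch:
    """Hypothetical stand-in for a file-backed image class (not datumaro's class)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._size: Optional[Tuple[int, int]] = None

    def _size_from_full_decode(self) -> Optional[Tuple[int, int]]:
        # Placeholder for the slow path quoted in the Summary, which loads the
        # whole image and reads data.shape[:2]. Left unimplemented here.
        return None

    @property
    def size(self) -> Optional[Tuple[int, int]]:
        """Returns (H, W)"""
        if self._size is None:
            # Fast path first: header-only read through PIL.
            self._size = read_size_from_header(self.path)
        if self._size is None:
            # Fall back to the full decode only when the fast path fails.
            self._size = self._size_from_full_decode()
        return self._size
```

Note that `img.size` reports the dimensions stored in the file and does not apply EXIF orientation, so it may be worth checking that the fast path and the full-decode path agree on rotated images.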