Release v0.2: Dataset versioning
This release adds dataset versioning capabilities and significantly changes the command line.
It also improves CLI and API documentation, and extends the transformations library.
A Datumaro project can contain and manage multiple datasets instead of a single one.
CLI operations can be applied to the whole project, or to separate datasets.
Datasets are now modified in place by default. The project layout has changed. To update
an old project to the new layout, use `datum project migrate`.
Added
- A new installation target: `pip install datumaro[default]`, which should be
  used in most cases by default. The plain `datumaro` target is intended for library users (#238)
- Dataset and project versioning capabilities (Git-like) (#238)
- [CLI] "dataset revpath" concept, which allows passing a dataset path together with
  the dataset format in the `diff`, `merge`, `explain` and `info` CLI commands (#238)
- [CLI] `import`, `remove`, `commit`, `checkout`, `log`, `status`, `info` CLI commands (#238)
- [CLI] `patch` CLI command to patch one dataset from another (#401)
- [CLI, API] `ProjectLabels` transform to change dataset labels for merging etc. (#401, #478)
- [API] Type annotations and docs for `Annotation` classes (#493)
- [formats] Support for custom labels in the KITTI detection format (#481)
- [formats] `Coco*Extractor` classes now have an option to preserve label IDs from the
  original annotation file (#453)
- [formats] Options to control label loading behavior in `imagenet_txt` import (#434, #489)
- Data collection by telemetry. See the telemetry notice for details (#495)
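
The "dataset revpath" entry above describes passing a dataset path together with its format. As a rough illustration only, the sketch below assumes the two parts are joined as `<path>:<format>`; that separator, the `DatasetRef` type and the `split_dataset_ref` helper are hypothetical and are not part of Datumaro.

```python
from typing import NamedTuple, Optional


class DatasetRef(NamedTuple):
    """Hypothetical container for a dataset path and an optional format name."""
    path: str
    format: Optional[str]


def split_dataset_ref(ref: str) -> DatasetRef:
    """Split 'path:format' into its parts (assumed syntax, not Datumaro's parser)."""
    # Split on the last colon so that colons earlier in the path are preserved.
    path, sep, fmt = ref.rpartition(":")
    # No colon, an empty path, or a "format" that looks like a path fragment
    # (e.g. the tail of a Windows drive path) means no format was given.
    if not sep or not path or not fmt or "/" in fmt or "\\" in fmt:
        return DatasetRef(ref, None)
    return DatasetRef(path, fmt)
```

Splitting on the last colon keeps Windows-style drive prefixes such as `C:\data` inside the path rather than mistaking them for a format.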
Changed
- A project can contain and manage multiple datasets instead of a single one.
  CLI operations can be applied to the whole project, or to separate datasets.
  Datasets are modified in place by default (#328)
- [CLI] The `import` command copies datasets by default. Use `add` to add datasets
  without copying (#508)
- [CLI] Projects use a new file layout, incompatible with old projects.
  An old project can be updated with `datum project migrate` (#238)
- [CLI] `diff` and `ediff` are joined into a single `diff` CLI command (#238)
- [CLI] CLI help for builtin plugins doesn't require a project (#328)
- [API] The `Project` class from `datumaro.components` is changed completely (#238)
- [API] Inheriting `CliPlugin` is not required in plugin classes (#238)
- [API] `Importer`s do not create `Project`s anymore and just return a list of
  extractor configurations (#238)
- [API] Annotation-related classes were moved into a new module,
  `datumaro.components.annotation` (#439)
- [API] Rollback utilities replaced with Scope utilities (#444)
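
Because annotation-related classes moved to `datumaro.components.annotation`, code that must run against both old and new releases may want a fallback import. The following is a minimal sketch, not an official migration recipe: the helper name `load_annotation_module` is hypothetical, and the older location `datumaro.components.extractor` is an assumption that this changelog does not state.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence

# Module names in lookup order.
CANDIDATES = (
    "datumaro.components.annotation",  # new location named in this changelog
    "datumaro.components.extractor",   # assumption: location in earlier releases
)


def load_annotation_module(
    candidates: Sequence[str] = CANDIDATES,
) -> Optional[ModuleType]:
    """Return the first importable module from `candidates`, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module (or its parent package) is missing; try the next candidate.
            continue
    return None
```

A caller could then look up classes with `getattr(module, "Bbox", None)` and handle a `None` result explicitly, instead of failing at import time.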
Removed
- [CLI] `project merge` CLI command (#238)
- Support for project hierarchies. A project cannot be a source anymore (#238)
- A project cannot have an independent internal dataset anymore. All the project
  data must be stored in the project data sources (#238)
- `datumaro_project` format (#238)
- [API] Unused `path` field of `DatasetItem` (#455)
Fixed
- Deprecation warning in `open_images_format.py` (#440)
- `lazy_image` returning unrelated data sometimes (#409)
- Invalid call to `pycocotools.mask.iou` (#450)
- Importing of Open Images datasets without image data (#463)
- Return value type in `Dataset.is_modified` (#401)
- Incorrect remapping of secondary categories in `RemapLabels` (#401)
- VOC dataset patching for classification and segmentation tasks (#478)
- Exported mask label ids in KITTI segmentation (#481)
- Missing `label` for `Points` read in the LFW format (#494)