diff --git a/docs/model-configuration.md b/docs/model-configuration.md
index 89d16d2c..2be1e3ba 100644
--- a/docs/model-configuration.md
+++ b/docs/model-configuration.md
@@ -78,6 +78,17 @@ The list features only model wrappers which intoduce new configuration values in
 1. `blur_strength`: int - blurring kernel size. -1 value means no blurring and no soft_threshold
 1. `soft_threshold`: float - probability threshold value for bounding box filtering. inf value means no blurring and no soft_threshold
 1. `return_soft_prediction`: bool - return raw resized model prediction in addition to processed one
+### `ActionClassificationModel`
+1. `labels`: List - list of class labels
+1. `path_to_labels`: str - path to a file with labels. If set, it overrides the labels passed via the `labels` parameter
+1. `mean_values`: List - normalization values to be subtracted from the image channels for the image-input layer during preprocessing
+1. `pad_value`: int - pad value used by the resize_image_letterbox operation embedded into the model
+1. `resize_type`: str - crop, standard, fit_to_window or fit_to_window_letterbox
+1. `reverse_input_channels`: bool - reverse the input channel order
+1. `scale_values`: List - normalization values by which the image channels are divided for the image-input layer
+
+> **NOTE** `ActionClassificationModel` isn't a subclass of `ImageModel`.
+
 ### `Bert` and its subclasses
 1. `vocab`: Dict - mapping from string token to int
 1. `input_names`: str - comma-separated names of input layers
diff --git a/model_api/python/README.md b/model_api/python/README.md
index f45aef89..b7bdcbcf 100644
--- a/model_api/python/README.md
+++ b/model_api/python/README.md
@@ -70,6 +70,7 @@ The following tasks can be solved with wrappers usage:
 | Question Answering | |
 | Salient Object Detection | |
 | Semantic Segmentation | |
+| Action Classification | |
 
 ## Model API Adapters
 
diff --git a/model_api/python/docs/ActionClassification.md b/model_api/python/docs/ActionClassification.md
new file mode 100644
index 00000000..a3b0eb71
--- /dev/null
+++ b/model_api/python/docs/ActionClassification.md
@@ -0,0 +1,79 @@
+# ActionClassification Wrapper
+
+## Use Case and High-Level Description
+
+The `ActionClassificationModel` is a wrapper class designed for action classification models.
+Like the other model wrapper classes, it provides support for data preprocessing and postprocessing.
+Note that it isn't a subclass of `ImageModel`: it takes a video clip as input rather than a single image.
+
+## How to use
+
+Using the `ActionClassificationModel` is similar to using the other model wrappers, with the primary difference being the preparation of video clip inputs instead of single images.
+
+Below is an example demonstrating how to initialize the model with OpenVINO™ IR files and classify actions in a video clip.
+
+```python
+import cv2
+import numpy as np
+
+# import model wrapper class
+from model_api.models import ActionClassificationModel
+
+# import inference adapter and helper for runtime setup
+from model_api.adapters import OpenvinoAdapter, create_core
+
+# load the video and stack 8 frames into a clip to use as input
+cap = cv2.VideoCapture("sample.mp4")
+input_data = np.stack([cap.read()[1] for _ in range(8)])
+
+# define the path to the action classification model in IR format
+model_path = "action_classification.xml"
+
+# create adapter for OpenVINO™ runtime, pass the model path
+inference_adapter = OpenvinoAdapter(create_core(), model_path, device="CPU")
+
+# instantiate the ActionClassificationModel wrapper
+# setting preload=True loads the model onto the CPU within the adapter
+action_cls_model = ActionClassificationModel(inference_adapter, preload=True)
+
+# perform preprocessing, inference, and postprocessing
+results = action_cls_model(input_data)
+```
+
+As illustrated, initializing the model and performing inference can be achieved with minimal code.
+The wrapper class takes care of input processing, layout adjustments, and output processing automatically.
+
+## Arguments
+
+- `labels` (`list[str]`): List of class labels
+- `path_to_labels` (`str`): Path to a file with labels. If set, it overrides the `labels` parameter.
+- `mean_values` (`list[int | float]`): Normalization values to be subtracted from the image channels during preprocessing.
+- `pad_value` (`int`): Pad value used by the `resize_image_letterbox` operation embedded within the model.
+- `resize_type` (`str`): The method of resizing the input image. Valid options include `crop`, `standard`, `fit_to_window`, and `fit_to_window_letterbox`.
+- `reverse_input_channels` (`bool`): Whether to reverse the order of input channels.
+- `scale_values` (`list[int | float]`): Normalization values by which the image channels are divided during preprocessing.
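To make the `mean_values` and `scale_values` semantics concrete, here is a minimal numpy sketch of the normalization arithmetic they control. This is an illustration only, not the wrapper's actual implementation; the constants below are ImageNet-style examples, not defaults of the wrapper.

```python
import numpy as np

# a dummy 8-frame RGB clip in THWC layout with pixel values in [0, 255]
clip = np.full((8, 224, 224, 3), 127.5, dtype=np.float32)

# illustrative per-channel normalization constants (not wrapper defaults)
mean_values = np.array([123.675, 116.28, 103.53], dtype=np.float32)
scale_values = np.array([58.395, 57.12, 57.375], dtype=np.float32)

# mean_values are subtracted from, and scale_values divide, each channel;
# broadcasting applies them along the trailing channel axis
normalized = (clip - mean_values) / scale_values

print(normalized.shape)  # (8, 224, 224, 3)
```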
+
+## Input format
+
+The input format for action classification tasks differs from other vision tasks due to the nature of video data.
+The input tensor includes additional dimensions to accommodate the video format.
+Each dimension is commonly denoted by a single letter, with the following meanings:
+
+- N : Batch size
+- S : Number of clips x Number of crops
+- C : Number of channels
+- T : Time
+- H : Height
+- W : Width
+
+The input should be provided as a single clip in THWC format.
+Depending on the specified layout, the input will be transformed into either NSTHWC or NSCTHW format.
+Unlike other vision model wrappers, which utilize OpenVINO™'s PrePostProcessors (PPP) for preprocessing,
+the ActionClassificationModel performs its own preprocessing, due to the current lack of video format support in OpenVINO™ PPP.
+
+## Output format
+
+The output is encapsulated in a `ClassificationResult` object, which includes the indices, labels, and logits of the top predictions.
+At present, saliency maps, feature vectors, and raw scores are not provided.
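The THWC-to-NSCTHW transformation described above can be sketched with plain numpy. This illustrates only the dimension bookkeeping, not the wrapper's internal code, and assumes a single clip with one crop, so N = S = 1.

```python
import numpy as np

# a single clip of 8 RGB frames in THWC layout
clip = np.zeros((8, 224, 224, 3), dtype=np.float32)  # (T, H, W, C)

# THWC -> CTHW: move the channel axis in front of the time axis
cthw = clip.transpose(3, 0, 1, 2)  # (C, T, H, W)

# prepend the batch (N) and clips-x-crops (S) dimensions
nscthw = cthw[np.newaxis, np.newaxis]  # (N, S, C, T, H, W)

print(nscthw.shape)  # (1, 1, 3, 8, 224, 224)
```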