This project parses Azure Video Indexer output files (JSON format) into small chunks of searchable content, so you can run queries like the following against the Azure Search index to find the right moments of a video:
- "Find the moments where 'Jennifer Aniston' in the Friends TV show is happy"

  ```json
  {
    "search": "Jennifer Aniston",
    "searchFields": "faces/face, name",
    "filter": "search.ismatchscoring('friends') and (emotions/any(em: em/emotion eq 'Joy'))"
  }
  ```
- "Find the moments in any show where the transcript is 'getting old'"

  ```json
  {
    "search": "getting old",
    "searchFields": "transcripts/transcript"
  }
  ```
- "Find the moments in the Friends show where 'Jean Claude' appears"

  ```json
  {
    "search": "Jean Claude",
    "searchFields": "faces/face, namedPeople/namedPerson",
    "filter": "search.ismatchscoring('friends')"
  }
  ```
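As a minimal sketch of how such a query could be issued from Python, the snippet below posts the first example to the Azure Cognitive Search REST API. The service name, index name, and API key are placeholders (they correspond to the search settings in `config.yml`, described later in this README):

```python
import requests

# Placeholders -- substitute your own service details.
SERVICE_NAME = "YOUR-SEARCH-SERVICE-NAME"
INDEX_NAME = "YOUR-INDEX-NAME"
API_KEY = "YOUR-SEARCH-API-KEY"
API_VERSION = "2020-06-30"

url = (
    f"https://{SERVICE_NAME}.search.windows.net/indexes/"
    f"{INDEX_NAME}/docs/search?api-version={API_VERSION}"
)
query = {
    "search": "Jennifer Aniston",
    "searchFields": "faces/face, name",
    "filter": "search.ismatchscoring('friends') and (emotions/any(em: em/emotion eq 'Joy'))",
}

# The api-key header is how Azure Cognitive Search authenticates REST calls.
response = requests.post(url, json=query, headers={"api-key": API_KEY})
response.raise_for_status()
for moment in response.json()["value"]:
    print(moment)
```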
With the vast growth of multimedia on the Internet, particularly video, text-based retrieval plays an important role in making videos searchable.
With Azure Video Analyzer, developers can quickly build an AI-powered video analytics solution to extract actionable insights from videos, whether stored or streaming. Organizations can enhance workplace safety, in-store experiences, digital asset management, media archives, and more by understanding audio and visual video content without video or ML expertise.
Azure Video Analyzer produces one flat JSON file per video that covers the full video duration. How can we come up with a search index schema capable of retrieving the data most relevant to a particular moment of a video?
In this repository we process the output of the Video Analyzer into moments, where each moment covers n seconds of the video. We create a Search index containing all the features extracted by the Video Analyzer, and we upload the extracted moments into it. Finally, given a search query, we can retrieve the related moments of a video from the Azure Search index. The search index has the flexibility to take more fields to accommodate custom features (e.g., features that don't exist in Azure Video Analyzer and are created for a custom use case).
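As an illustration of the moment-splitting idea (this is not the repository's actual `Parser` class, and the insights shape here is simplified), the sketch below buckets transcript entries from a Video Indexer JSON file into fixed-length intervals:

```python
import json

MILLISECONDS_INTERVAL = 10_000  # matches the parser setting shown later


def to_ms(timestamp: str) -> int:
    """Convert an 'H:MM:SS.fff' timestamp into milliseconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def split_into_moments(insights_path: str) -> dict:
    """Group transcript lines into n-second moments, keyed by interval start (ms)."""
    with open(insights_path) as f:
        insights = json.load(f)

    moments: dict = {}
    # Assumed (simplified) shape: videos[0].insights.transcript[*].instances[*]
    for entry in insights["videos"][0]["insights"].get("transcript", []):
        for instance in entry["instances"]:
            start = to_ms(instance["start"])
            bucket = (start // MILLISECONDS_INTERVAL) * MILLISECONDS_INTERVAL
            moments.setdefault(bucket, []).append(entry["text"])
    return moments
```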
- Products & Technologies:
- Azure Video Analyzer
- Azure Search
- Azure Storage Account (optional)
- Docker (optional)
- Languages:
- Python
You can find the architecture of the Azure Video Indexer searchable moments in this documentation page.
The project can be run like any other Python code, or packaged into a Docker image and run accordingly.
Below we detail both approaches.
For both local development and running the code without Docker, you'll need to do the following.
Create a virtual environment.

On macOS and Linux:

```bash
python3 -m venv env
```

On Windows:

```bash
py -m venv env
```

Activate your virtual environment.

On macOS and Linux:

```bash
source env/bin/activate
```

On Windows:

```bash
.\env\Scripts\activate
```

Install the required libraries:

```bash
pip install -r src/requirements.txt
```
Copy the contents of `config-dev.yml` to a new gitignored file `config.yml`, and add the required details. NOTE: These include secrets and so should never be pushed to version control.
```yaml
# this is used when you want to parse files from a storage account
storage:
  connection-string: "YOUR STORAGE ACCOUNT CONNECTION STRING"
  # the container in which you store JSON files processed by Azure Video Indexer
  container: "YOUR CONTAINER NAME"
search:
  service-name: "SEARCH SERVICE NAME"
  api-version: "SEARCH SERVICE API VERSION"
  api-key: "SEARCH SERVICE API KEY"
  index-name: "NAME OF THE SEARCH INDEX YOU WANT TO CREATE"
  index-schema-path: "PATH TO SEARCH INDEX SCHEMA e.g. client/index-schema.json"
files:
  vi-output-directory: "PATH TO VI JSON FILES e.g. client/files/vi-files"
  processed-directory: "PATH TO PROCESSED FILE DIRECTORY e.g. client/files/processed"
  ingested-file: "NAME OF THE LOG FOR PROCESSED ASSETS e.g. ingested.txt"
  failed-to-ingest-file: "NAME OF THE LOG FOR ASSETS THAT FAILED PROCESSING e.g. failed-to-ingest.txt"
# interval duration you wish to create, in milliseconds
parser:
  milliseconds-interval: 10000
```
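As a minimal sketch of how the scripts could consume this file (assuming PyYAML, which `src/requirements.txt` would need to include; the repository's actual loading code may differ):

```python
import yaml

# Load the gitignored config.yml created from config-dev.yml above.
with open("config.yml") as f:
    config = yaml.safe_load(f)

search_config = config["search"]
print(search_config["service-name"], search_config["index-name"])
```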
Navigate into the `src` directory with `cd src`.
Now, using the command line, you can run the scripts in two ways:

- Process insights files from your specified storage account and container:

  ```bash
  python main.py
  ```

- Process insights files from the local filesystem:

  ```bash
  python main.py local
  ```
When using Docker, we build and run a Python image, passing in the config details as environment variables, and have the option of mounting our index schema and any other files needed from the host (local machine).
First ensure you have Docker installed and working.
As your `config.yml` will not be configurable once an image is built, remove it if created, or just leave the repo in its original state. With only `config-dev.yml`, the code will rely upon the relevant environment variables passed in at runtime to determine how it should run.
From the root of the repository, build the Docker image using:

```bash
docker build -t azure-vi-searchable-content .
```
See the `Dockerfile` for details on how the image is constructed.
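The repository's `Dockerfile` is authoritative; purely as a hypothetical sketch of what such an image typically looks like for this layout (a `src` directory containing `requirements.txt` and `main.py`), it might resemble:

```dockerfile
# Hypothetical sketch only -- see the repository's Dockerfile for the real build.
FROM python:3.9-slim

WORKDIR /src
COPY src/ .

RUN pip install --no-cache-dir -r requirements.txt

# Runs against the storage account by default; pass "local" to read local files.
CMD ["python", "main.py"]
```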
To run the image successfully, we need to pass in environment variables as follows:
```bash
docker run \
  -e INSIGHTS_CONTAINER_NAME="YOUR INSIGHTS CONTAINER NAME" \
  -e SEARCH_SERVICE_NAME="YOUR SEARCH SERVICE NAME" \
  -e SEARCH_API_VERSION="2020-06-30" \
  -e SEARCH_API_KEY="YOUR SEARCH API KEY" \
  -e SEARCH_INDEX_NAME="YOUR DESIRED SEARCH INDEX NAME" \
  -e INDEX_SCHEMA_PATH="client/index-schema.json" \
  -e VI_OUTPUT_DIRECTORY="client/files/vi-files" \
  -e FILE_PROCESSING_LOGS_DIR="client/files/processed" \
  -e INGEST_LOG_FILENAME="ingested.txt" \
  -e INGEST_FAILURE_LOG_FILENAME="failed-to-ingest.txt" \
  -e MILLISECONDS_INTERVAL=10000 \
  -e STORAGE_CONNECTION_STRING="YOUR CONNECTION STRING" \
  azure-vi-searchable-content:latest
```
Note that this will still retrieve your search index schema from, and write log files to, inside the container. This can be undesirable, so we can make all of these configurable on the host machine by volume mounting the `mount-files` directory and managing the files there.
```bash
docker run \
  -e INSIGHTS_CONTAINER_NAME="YOUR INSIGHTS CONTAINER NAME" \
  -e SEARCH_SERVICE_NAME="YOUR SEARCH SERVICE NAME" \
  -e SEARCH_API_VERSION="2020-06-30" \
  -e SEARCH_API_KEY="YOUR SEARCH API KEY" \
  -e SEARCH_INDEX_NAME="YOUR DESIRED SEARCH INDEX NAME" \
  -e INDEX_SCHEMA_PATH="mount-files/index-schema.json" \
  -e VI_OUTPUT_DIRECTORY="mount-files/logs" \
  -e FILE_PROCESSING_LOGS_DIR="mount-files/logs" \
  -e INGEST_LOG_FILENAME="ingested.txt" \
  -e INGEST_FAILURE_LOG_FILENAME="failed-to-ingest.txt" \
  -e MILLISECONDS_INTERVAL=10000 \
  -e STORAGE_CONNECTION_STRING="YOUR CONNECTION STRING" \
  -v "$(pwd)"/mount-files:/src/mount-files \
  azure-vi-searchable-content:latest
```
When using the command above, changes to your index schema will be picked up by a newly running container, without needing to rebuild the image.
This can be simplified further using a gitignored `.env` file (`.env-dev` is given as an example), running the image as follows:
```bash
docker run \
  --env-file "./.env" \
  -v "$(pwd)"/mount-files:/src/mount-files \
  azure-vi-searchable-content:latest
```
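As a sketch of what that gitignored `.env` could contain (the variable names are taken from the `docker run` examples above; see `.env-dev` in the repo for the authoritative template):

```
INSIGHTS_CONTAINER_NAME=YOUR INSIGHTS CONTAINER NAME
SEARCH_SERVICE_NAME=YOUR SEARCH SERVICE NAME
SEARCH_API_VERSION=2020-06-30
SEARCH_API_KEY=YOUR SEARCH API KEY
SEARCH_INDEX_NAME=YOUR DESIRED SEARCH INDEX NAME
INDEX_SCHEMA_PATH=mount-files/index-schema.json
VI_OUTPUT_DIRECTORY=mount-files/logs
FILE_PROCESSING_LOGS_DIR=mount-files/logs
INGEST_LOG_FILENAME=ingested.txt
INGEST_FAILURE_LOG_FILENAME=failed-to-ingest.txt
MILLISECONDS_INTERVAL=10000
STORAGE_CONNECTION_STRING=YOUR CONNECTION STRING
```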
1. Read data from the blob:

To read the JSON files from the blob:

- Make sure `config.yml` has the required values to connect to the storage account.
- Run the `storage-client.py` script.

This will read the data from a container onto your local machine.

2. Create/upload data into Azure Search: Now that you have the data on your local machine, you can create a search index and, after parsing the data, upload it into the search index.
Run `search-client.py`, which performs the following:

- Creates a search index using the file `index-schema.json` (at `client/index-schema.json` as standard, or under `mount-files` for Docker volume mounting)
- Reads JSON files from `files/blob-files`
- Parses each JSON file using the `Parser` class and creates a list of intervals
- Uploads the data into the Azure Search index
Note 1: By default (the filename is configurable), every successful upload is logged to `<your-specified-logs-directory>/ingested.txt`.
Note 2: By default (the filename is configurable), every unsuccessful upload is logged to `<your-specified-logs-directory>/failed-to-ingest.txt`.
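As a minimal sketch of the create-and-upload step (using the Azure Cognitive Search REST API directly; the repository's `search-client.py` may do this differently, and the moment fields below are illustrative assumptions, not the repo's actual schema):

```python
import json
import requests

# Placeholders -- mirror the search settings from config.yml.
SERVICE_NAME = "YOUR-SEARCH-SERVICE-NAME"
INDEX_NAME = "YOUR-INDEX-NAME"
API_KEY = "YOUR-SEARCH-API-KEY"
API_VERSION = "2020-06-30"

BASE_URL = f"https://{SERVICE_NAME}.search.windows.net/indexes"
HEADERS = {"api-key": API_KEY, "Content-Type": "application/json"}

# Create (or update) the index from the schema file shipped with the repo.
with open("client/index-schema.json") as f:
    schema = json.load(f)
schema["name"] = INDEX_NAME
resp = requests.put(
    f"{BASE_URL}/{INDEX_NAME}?api-version={API_VERSION}",
    headers=HEADERS,
    json=schema,
)
resp.raise_for_status()

# Upload parsed moments; "@search.action" tells the service how to apply each document.
documents = {
    "value": [
        {
            "@search.action": "mergeOrUpload",
            "id": "friends-s01e01-00010000",      # hypothetical document key
            "name": "friends",
            "transcripts": [{"transcript": "getting old"}],
        }
    ]
}
resp = requests.post(
    f"{BASE_URL}/{INDEX_NAME}/docs/index?api-version={API_VERSION}",
    headers=HEADERS,
    json=documents,
)
resp.raise_for_status()
print(resp.json())
```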