Image Finder is an image search service developed as part of the CompTech2022 winter school.
You can try fast search by text or image over 2 million professional photos in the demo app.
The goal of the product is to help users quickly find the right images.
According to a survey conducted as part of CompTech2022, users typically look for images on their devices manually. Automated search will be useful for people whose professional activities or hobbies are related to photography. In addition, we learned that this product can be useful to any user.
This product is aimed at all users.
- Image search;
- Search by text query (Russian/English);
- Video search.
- colabs: directory with research experiments in Colab notebooks;
- test: directory with tests;
- assets: directory with images;
- main.py: main file that includes all classes and functions for the user-friendly web service;
- faissindexer.py: contains the FAISS indexer, which stores image embeddings and searches for the nearest neighbors of a given text embedding;
- dummyindexer.py: contains a simple indexer, which stores image embeddings and searches for the nearest neighbors of a given text embedding by one-vs-all comparison;
- hnsw_indexer.py: contains the HNSW indexer, which stores image embeddings and searches for the nearest neighbors of a given text embedding by approximate nearest neighbor search;
- embedder.py: contains wrapper classes for different CLIP models;
- searchmodel.py: classes that connect indexers and CLIP embedders, and load and store indexed images and their paths;
- CLIP_attention_maps.py: attention maps for the CLIP model;
- ruCLIP_attention_maps.py: attention maps for the RuCLIP model;
- requirements.txt: list of dependencies.
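The one-vs-all comparison mentioned for the simple indexer can be illustrated with a minimal sketch. This is not the project's code: the class name `BruteForceIndexer`, its methods, and the toy 3-dimensional vectors are hypothetical, and cosine similarity is assumed as the comparison measure. The query embedding is compared against every stored image embedding and the best matches are returned.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


class BruteForceIndexer:
    """Hypothetical stand-in for a one-vs-all indexer (not the project's class)."""

    def __init__(self):
        self.paths = []
        self.embeddings = []

    def add(self, path, embedding):
        """Store one image path together with its embedding."""
        self.paths.append(path)
        self.embeddings.append(list(embedding))

    def search(self, query_embedding, k=5):
        """Compare the query with every stored embedding and return the top k."""
        scored = [
            (cosine_similarity(query_embedding, emb), path)
            for path, emb in zip(self.paths, self.embeddings)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(path, score) for score, path in scored[:k]]


# Toy vectors standing in for real CLIP embeddings (an assumption of this sketch).
index = BruteForceIndexer()
index.add("cat.jpg", [1.0, 0.0, 0.0])
index.add("dog.jpg", [0.0, 1.0, 0.0])
index.add("kitten.jpg", [0.9, 0.1, 0.0])
results = index.search([1.0, 0.0, 0.0], k=2)
print(results)
```

The cost of such a search grows linearly with the number of stored images, which is the usual motivation for approximate indexes such as FAISS or HNSW on large collections.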
For example
- Text query
- Image
This project was tested on Python 3.7.
- Clone repository
git clone https://github.com/comptech-winter-school/image-finder.git
- Install required dependencies from requirements.txt.
- Run command
streamlit run main.py --server.port {PORT}
- Copy IP-ADDRESS:PORT from the terminal and paste it into the browser
- Select preferred indexer
- Select text query or image method for processing
- Select output image count
- If you want to filter output results, you can use threshold slider
- The images will be displayed sorted by cosine distance.
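The last three steps (output count, threshold, ordering) can be sketched as a small post-processing function. The function name `select_results` and the sample data are hypothetical. The sketch assumes that cosine distance is defined as 1 minus cosine similarity, so smaller values mean closer matches, and that the threshold is an upper bound on the distance. The actual app may define these differently.

```python
def select_results(candidates, max_count, max_distance):
    """Filter and order search results.

    candidates: list of (image_path, cosine_distance) pairs.
    Assumption of this sketch: smaller distance means a closer match.
    """
    # Drop results that are farther away than the threshold allows.
    kept = [(path, dist) for path, dist in candidates if dist <= max_distance]
    # Show the closest images first.
    kept.sort(key=lambda item: item[1])
    # Respect the requested output image count.
    return kept[:max_count]


# Hypothetical search output: (path, cosine distance).
candidates = [("a.jpg", 0.42), ("b.jpg", 0.15), ("c.jpg", 0.80), ("d.jpg", 0.30)]
shown = select_results(candidates, max_count=2, max_distance=0.5)
print(shown)  # [('b.jpg', 0.15), ('d.jpg', 0.3)]
```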
For running in Docker run these commands:
docker build -t streamlitapp:latest .
docker run -p 8501:8501 streamlitapp:latest
App will be deployed at http://localhost:8501/
CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3.
RuCLIP (Russian Contrastive Language–Image Pretraining) is a multimodal model for obtaining images and text similarities and rearranging captions and pictures. RuCLIP builds on a large body of work on zero-shot transfer, computer vision, natural language processing and multimodal learning.
To solve the problem of image search by Russian-language queries, two models were considered: RuCLIP and RuCLIP-SB.
For the analysis, the CIFAR-100 dataset was selected. It contains 100 classes of 600 images each, 32x32 pixels in size, with 500 training images and 100 test images per class. This set is well suited for comparing models, as it has a wide variety of classes.
To evaluate the models, we solve a classification problem. Determining the class of an image can be treated as searching for an image by a query equal to the class label.
Precision, recall, accuracy, and top-5 accuracy were analyzed. RuCLIP outperformed RuCLIP-SB on all of these metrics, so RuCLIP was chosen to solve the problem of image search by Russian-language queries.
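As an illustration of how accuracy and top-k accuracy can be computed in this setting, here is a minimal sketch. It assumes each image has one similarity score per class label (for example, the similarity between the image embedding and the embedding of the label text). The function name `top_k_accuracy` and the toy scores are hypothetical and are not the project's evaluation code or results.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of images whose true class is among the k highest-scoring classes.

    scores: one row per image, one similarity score per class.
    true_labels: index of the correct class for each image.
    """
    hits = 0
    for row, true_label in zip(scores, true_labels):
        # Class indices ordered from most to least similar.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy example with 3 classes and 4 images (made-up numbers).
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.5, 0.6],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.4],
]
true_labels = [0, 1, 2, 1]

top1 = top_k_accuracy(scores, true_labels, k=1)  # plain accuracy
top2 = top_k_accuracy(scores, true_labels, k=2)
print(top1, top2)  # 0.5 0.75
```

For CIFAR-100 the same function would be called with 100 scores per image and `k=5` to obtain top-5 accuracy.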
Inference was compared for the original RuCLIP model and for RuCLIP optimized via ONNX. The optimization gives a visible increase in speed.
Processing results for queries in the singular and plural are almost the same
Processing results depending on the number of iterations
- pandas — software library in Python for data processing and analysis.
- numpy — software library in Python that adds support for large multidimensional arrays and matrices.
- faiss — library for efficient similarity search (nearest neighbor search) over dense vectors.
- nmslib — cross-platform similarity search library.
- streamlit — open-source app framework, the fastest way to build and share data apps.
- torch — open source deep learning framework.
- Cifar100; URL: https://www.cs.toronto.edu/%7Ekriz/cifar.html
- Unsplash; URL: https://unsplash.com/data
- Developers:
- Anna Glushkova,
- Vasiliy Dronov,
- Kirill Keller,
- Alexandr Minin,
- Maxim Mashtakov,
- Vladislav Kuznetsov,
- Dmitry Moskalev
- Team Lead:
- Dmitry Moskalev
- Mentors:
- Amir Uteuov,
- Vladimir Kilyazov.