ai-robotics-estonia/2024_AI_Powered_Quality_Control_Noo_Lihatoostus

Validation of a multi-purpose quality control system operated by AI

Summary

Company Name: Nõo Lihatööstus
Development Team Lead Name: Dr. Ardi Tampuu
Development Team Lead E-mail: [email protected]
Duration of the Demonstration Project: March 2024 - November 2024
Final Report: Final_report.pdf

Description

Objectives of the Demonstration Project

The demo project aimed to address multiple quality management tasks on the production line simultaneously. Since the company handles a wide variety of products, product and packaging changes occur frequently. With thousands of such changes, errors are inevitable. This project sought to enable early detection and prevention of these errors.

In the early project meetings, the following quality assurance tasks were listed as tasks the client was interested in and that could potentially be solved based on the camera feed:

  • date missing
  • date false
  • date in the wrong location
  • date low quality/incomprehensible for humans
  • one or both labels missing
  • product missing
  • wrong product in the package
  • product folded or misaligned

Activities and Results of the Demonstration Project

Challenge

The challenge was to perform a variety of quality assurance tasks with the same, relatively cheap sensor - the camera. Cameras are often installed anyway to monitor the work, and a camera feed saving and management system already existed at Nõo Lihatööstus. While camera inputs are readily available, camera-based solutions have until recently been sensitive to environmental conditions and difficult to make reliable.

Working with real data coming from a real environment is also always a challenge. Our camera feed contains scenes of workers manually re-positioning labels, different objects being placed on the production line, workers' hands and heads appearing in the scene, strong reflections, and light changes. The system must be smart enough to ignore any changes in the image not brought about by the movement of the production line (the next cycle) and to deal with the various image quality issues.

Among the possible quality issues to track discussed in March 2024, only the following two were left out:

  • Determining the quality of the expiry date printing/readability. In our setup, any drop in date reading performance is more likely due to image quality (reflections, blur, low contrast) than to printing quality. The two cannot be decoupled based on model outputs. We nevertheless verify that the date is correct (ddmmyyyy) and that the dots between day, month, and year are printed.
  • Evaluating whether the product is folded or placed wrongly inside the package. This evaluation is likely achievable at a useful level of performance using the same technology as we use for product identification. However, sufficient datasets of the “wrong appearance” and “acceptable appearance” of different products are needed to train such models. There was insufficient time to collect and label such datasets.

Data Sources

The following data sources, data collection steps and processing were used:

  • Cameras installed above production line #1 at Nõo Lihatööstus. A variety of cameras and camera setups were tested during the 9-month period. In the end, we converged on a setup of two 8 MP cameras, each seeing half of the production line width. The cameras stream their video feed via the RTSP protocol while also saving videos to the Milestone program. Videos downloaded from Milestone were either low quality or very large in size, due to each frame being 10 MB, but were used during the intermediate development phase.
  • The final dataset was collected from the RTSP stream. For the modules that needed labeled data, the data was labeled automatically based on the barcode reader result. The automatically labeled data was manually verified and sub-selected - e.g. only packages with products. Templates for the one-shot modules were also obtained from the same streams. Heuristic modules were also configured and tested against the live RTSP stream.
  • Information about the correspondence between barcode IDs (EAN13 codes) and the products, expiry durations, and label layout of each product was acquired from the enterprise and complemented manually with relevant information by the development team.

AI Technologies

In different modules, mainly three AI technologies were used:

  • Commercial Dynamsoft barcode reader SDK. Camera-based barcode reading probably uses AI technologies to localize barcodes. We were forced to use a commercial solution for barcode reading due to the small pixel-per-barcode resolution we could acquire. For all products to fit into the camera frames and for the cameras to be mounted high enough not to disturb workers on the production line, the image had to be large compared to the barcodes. Working at the borderline of what barcode reading can and cannot read (3 pixels per narrowest possible line), open-source solutions failed to offer stable performance.

  • ParSeq scene text recognition model. This neural-network-based text recognition model is not specifically designed to read expiry dates but performs sufficiently well for our purposes. A variety of optical character recognition models, scene text recognition models, and even models explicitly designed to read expiry dates were tested. ParSeq offered the best performance when tested around the mid-point of our project, and there was no reason to switch later. However, switching to any other tool is relatively simple in the code we created.

  • DinoV2-small image foundation model. This model was used to acquire semantic embeddings of key image areas. a) First, embeddings of the areas where the product labels should appear were compared, via Euclidean distance, to the embedding of the expected appearance from a template. This solution is many times more stable across label misplacements (gluing 1 cm to the left or right), reflections, and light changes than any template-matching, dominant-color-based, or saturation-value-based solution, many of which were tried thoroughly. It requires one template image, just like template matching, and involves no model training (a one-shot approach, so it is easy to add new products or change the expected label appearance). b) Second, the model was used to create image embeddings of the product window areas of the packages. Based on a collected dataset of such product appearance embeddings, an XGBoost model was trained to separate the products from each other and from empty packages. A sketch of the one-shot comparison follows after this list.

  • Additionally, the automatic detection of production line motion is data-driven and falls into the category of Computer Vision. It relies on measuring the horizontal optical flow between frames in the RTSP video feed.
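To make the embedding-based label check concrete, below is a minimal sketch of the one-shot comparison described above. It assumes the DinoV2-small (ViT-S/14) weights are loaded via torch.hub; the file paths and the distance threshold are illustrative placeholders, not the values used in this repository.

import torch
from torchvision import transforms
from PIL import Image

# Load DinoV2-small from torch.hub; calling the model returns the CLS-token embedding.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# ImageNet-style preprocessing; the input side length must be a multiple of the 14-px patch size.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(image_path: str) -> torch.Tensor:
    """Return the DinoV2 embedding of a label-area crop."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0)).squeeze(0)

# One-shot check: compare a live label crop against a single stored template image.
template_emb = embed("templates/product_1234_label.png")   # hypothetical path
live_emb = embed("current_cycle/package_3_label.png")      # hypothetical path

distance = torch.dist(template_emb, live_emb).item()       # Euclidean distance in embedding space
LABEL_DISTANCE_THRESHOLD = 20.0                             # illustrative value, tuned per setup
print("label present" if distance < LABEL_DISTANCE_THRESHOLD else "label missing or wrong")

The same embeddings feed the product-identification step: instead of a distance threshold, a collected dataset of product-window embeddings is used to train the XGBoost classifier that separates products and empty packages.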

Technological Results

Final performance:

  • Production cycle detection: >99% reliable. Among rare edge cases, we have encountered one false positive, where a worker’s hand movement above the line made our system believe there was line movement. The system also reliably detects movement of the production line when there are no labels (just semi-transparent plastic), so this does not cause false negatives.

  • Barcode detection: The Dynamsoft solution has 100% precision but limited recall. Out of the 12 barcodes in the image (the 8 packages we focus on plus another partial row), it detects, on average, two. Frequently no barcode is detected. This is one of the two best barcode readers we found; the problem is image resolution, not the software. However, as all barcodes in a frame are the same, and we usually find a barcode within a few cycles, we likely do not lose many cycles. The detection rate depends on the color scheme: dark red bars are much worse than black, and a yellow background is worse than white.

  • Label detection: >99% accurate. Missing labels and correctly glued labels are never misidentified. If a label is glued in the wrong direction or is heavily folded, the system mostly still detects it as a label. See a few examples of edge cases in this folder: link.

  • Date detection: 25-65% recall. Date detection performance is hindered by reflections, bad label locations (the date is not in the expected image area), low resolution, and low contrast between the background color and font color on some labels. For example, date areas in one cycle here: link. This performance is sufficient to know whether the ddmmyyyy value set on the machine is correct - the mode of the predictions must match the expected value (see the sketch after this list).

  • Product identification: Empty packages are identified very reliably. No false positives or false negatives were observed during multiple hours of work. The solution is sufficient to count empty packages.

  • Product identification: Products are reliably identified in the correct product class. E.g. salamis get mixed with other products only within the salami class, not with other products. If a defining characteristic exists, such as seasoning at the edge surface of a ham, the model identifies the product reliably. The solution is sufficient to detect visually different products (alert if label and product category do not match).
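As a small illustration of the date consistency check mentioned above, the sketch below takes the noisy per-package date readouts from a cycle and compares their mode against the expiry date expected from the product's shelf life. The function names, the dotted date format, and the example values are illustrative assumptions, not code from this repository.

from collections import Counter
from datetime import date, timedelta

def expected_expiry(production_day: date, shelf_life_days: int) -> str:
    """Expected expiry string as printed on the label (day.month.year with dots)."""
    return (production_day + timedelta(days=shelf_life_days)).strftime("%d.%m.%Y")

def date_setting_is_correct(readouts: list[str], expected: str) -> bool:
    """Readouts are noisy (reflections, blur), so trust the most frequent value."""
    readouts = [r for r in readouts if r]            # drop empty/failed reads
    if not readouts:
        return False                                  # no evidence this cycle; wait for more reads
    most_common, _ = Counter(readouts).most_common(1)[0]
    return most_common == expected

# Example: only some of the 8 packages per cycle yield a readable date.
readouts = ["15.04.2024", "15.04.2024", "", "", "16.04.2024", "", "15.04.2024", ""]
print(date_setting_is_correct(readouts, expected_expiry(date(2024, 4, 1), 14)))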

Technical Architecture

From left to right on the figure:

  • The two camera RTSP feeds are monitored, and production line movement (cycles) is detected. Only one frame per cycle is added to the frame queue for each camera.
  • Barcode detection is performed on the two frames originating from one cycle.
  • Based on the barcode found, the product ID is known. From the database, the system now knows where to cut out individual products from each frame.
  • Given product cutouts and product ID, we can cut out product windows, label areas, and date areas and process them with corresponding modules.
  • The results are collected and visualized in the command line and in the dashboard.

[Figure: system architecture]
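The left-to-right flow in the figure maps onto a producer/consumer structure: cycle detection pushes one frame pair per cycle into a queue, and the analysis modules consume it. Below is a minimal, runnable sketch of that structure; the two-process split is suggested by Pipeline/two_processes.py, but the function bodies here are dummy placeholders, not the project's actual code.

import multiprocessing as mp
import time

def capture_cycles(frame_queue: mp.Queue) -> None:
    """Producer: watch the RTSP feeds, detect line movement, push one frame pair per cycle."""
    for cycle_id in range(3):                        # dummy loop standing in for cycle detection
        time.sleep(1.0)                               # a real cycle takes several seconds
        frame_pair = (f"left_frame_{cycle_id}", f"right_frame_{cycle_id}")
        frame_queue.put((cycle_id, frame_pair))
    frame_queue.put(None)                             # sentinel: no more cycles

def analyse_cycles(frame_queue: mp.Queue) -> None:
    """Consumer: barcode -> layout lookup -> label/date/product modules -> report."""
    while (item := frame_queue.get()) is not None:
        cycle_id, frame_pair = item
        # Here the real pipeline reads barcodes, cuts out products and key areas,
        # and runs the label, date, and product-identification modules.
        print(f"cycle {cycle_id}: analysed {frame_pair}")

if __name__ == "__main__":
    queue: mp.Queue = mp.Queue()
    producer = mp.Process(target=capture_cycles, args=(queue,))
    consumer = mp.Process(target=analyse_cycles, args=(queue,))
    producer.start(); consumer.start()
    producer.join(); consumer.join()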

User Interface

Our solution yields results for every production cycle. These results can be stored in the database, but they are also sent to a dashboard and visualized in an organized manner in the command terminal.

The development of a dashboard was not a stated goal of this project. However, we created a simple dashboard showing the detection results (barcode, label existence, date readouts, and product identification) from the last 20 cycles (160 products). This allows the user to monitor the recent past of the production line’s work.

[Figure: dashboard view]

Future Potential of the Technical Solution

The solution can be used and adapted on any production line where images of sufficient quality of individual products can be acquired. Given a cutout of a single product in a standardized size and rotation, the key areas (labels, product window, date, or others) can be defined in image coordinates, and these areas can be analyzed by our modules.
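As a concrete picture of what "defining key areas in image coordinates" means, below is a hypothetical per-product layout entry and a crop helper; the EAN13 key, the coordinate values, and the field names are illustrative assumptions, not the project's stored format.

import numpy as np

# Hypothetical layout, keyed by EAN13 barcode; each area is (x, y, width, height)
# in pixels of the standardized product cutout. Values are illustrative.
PRODUCT_LAYOUTS = {
    "4740123456789": {
        "label_area":  (10, 10, 200, 80),
        "window_area": (40, 120, 300, 180),
        "date_area":   (20, 320, 160, 40),
    },
}

def crop(cutout: np.ndarray, area: tuple[int, int, int, int]) -> np.ndarray:
    """Cut a key area out of a standardized product cutout (H x W x 3 image array)."""
    x, y, w, h = area
    return cutout[y:y + h, x:x + w]

# Usage: once the product ID is known from the barcode, each key area is extracted
# the same way and handed to the corresponding analysis module.
cutout = np.zeros((400, 400, 3), dtype=np.uint8)      # stand-in for a real product cutout
layout = PRODUCT_LAYOUTS["4740123456789"]
label_crop = crop(cutout, layout["label_area"])
print(label_crop.shape)                                # (80, 200, 3)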

The modules that are somewhat specific to the production line in Nõo Lihatööstus (harder to generalize to other cases) are:

  • Production cycle detection to capture one frame per cycle, based on optical flow. This will not work for constantly moving lines. In that case, the frame capture times must be specified in some other way.
  • Individual product cutout. On Nõo Lihatööstus’ line, the products always appear at the same locations in the camera feed, although their layout and size can differ from product to product. It suffices for the user to annotate only once where the product bounding boxes are on the video frames and where the product window and product expiry date are on the product cutout. In all subsequent encounters with this product, identified by its barcode ID, the stored bounding box locations can be used. In the case of random product locations in the captured frames, one would need another method of localizing the products. OpenCV template matching performed extremely badly in our case. YOLO-type models might be suitable to fine-tune and use in this case. This, however, assumes the existence of a labeled dataset.
  • Finding the expiry date. In our case, the date always appears at a specific location with respect to the bounding box of the product. We believe this to be the case in 95% of products.

In short, localizing products and key areas of the product will inevitably be a product and production-line-specific task. Once a future user can localize these key areas on their products, our modules can be applied.

Lessons Learned

  • About basic data science. Data is the most common issue, and higher-quality data is the most common cure. We struggled with performance as long as the camera feeds were of low quality. We converged on the final camera setup only 2 months before the deadline, which is obviously too late.

  • About video streams. OpenCV processes video files and RTSP streams differently. Videos are processed frame by frame. RTSP is processed by receiving the "oldest frame in the buffer", with the buffer rapidly filling with our 10 MB frames. This means we skip a significant number of frames if we do computations between frame captures (relevant for optical flow). It also means we cannot count time in frames, but must use wall time. A minimal sketch of handling this is given at the end of this section.

  • OpenCV template matching, no matter which tricks are applied, is either very sensitive to lateral shifts (correlation-based measures) or to light conditions (absolute-difference-based measures), or incurs high compute cost (if trying to compensate for possible shifts). Additionally, template matching could not localize the product labels in an image containing products placed next to each other - it often found the highest match halfway between the products. In all, it was not reliable for any purpose we wished to use it for.

  • In contrast, the DinoV2 vision foundation model proved to be robust to translations, reflections, and light condition changes. It returns the semantic meaning of the image, and visual nuisances (a shift to the left or right, light conditions) are only a small part of the semantic meaning vector, whereas they influence the original RGB tensor globally. Consequently, in this embedding space, the distances between the template images of labels and the labels from the video feed were reliably low, while the distance between the template and the bare package (label missing) was reliably high. Beyond label detection, DinoV2 embeddings of different product appearances allow for the separation of products via a relatively lightweight machine-learning model.

Using the DinoV2 image-to-semantics model allowed us to overcome the usual model sensitivity to inputs. We no longer need to collect a dataset covering all possible light conditions and appearance diversity to train a functioning computer vision model. DinoV2 has already learned the basics of how to see and how to recognize that two images carry the same meaning.
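To illustrate the video-stream lesson above, below is a minimal sketch of reading an RTSP stream while keeping buffering lag low, timing by wall time rather than frame counts, and measuring horizontal motion between processed frames. The stream URL is a placeholder, the buffer-size property is honored only by some OpenCV backends, and Farneback dense optical flow is one possible choice for the horizontal-flow measurement, not necessarily the one used in this repository.

import time
import cv2

# Placeholder RTSP address; the real addresses live in config.json.
cap = cv2.VideoCapture("rtsp://user:pass@camera-address/stream")

# Keep the internal buffer as small as the backend allows, so reads lag less behind real time.
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)

prev_gray = None
while True:
    ok, frame = cap.read()                 # returns the oldest buffered frame, not "now"
    if not ok:
        break
    now = time.time()                      # wall time; frame indices are meaningless on a stream

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (0, 0), fx=0.25, fy=0.25)   # downscale large frames before flow

    if prev_gray is not None:
        # Dense optical flow between consecutive processed frames; the mean horizontal
        # component (flow[..., 0]) rises when the production line moves sideways.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        horizontal_motion = float(flow[..., 0].mean())
        print(f"t={now:.1f}s horizontal flow={horizontal_motion:.2f}")

    prev_gray = gray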

How to

Installation and setup

  1. git clone the repository
  2. (Install conda,) make a conda environment (conda create --name envname) and activate it (conda activate envname)
  3. Navigate to the "dates/parseq/" folder. The installation instructions in that folder's README will fail because of an outdated huggingface-hub version and an incorrect package name in the GPU case. Instead, follow these steps:
# Define whether to use the GPU
platform=gpu # or cpu
# Skip the "make" command from the original install instructions; we have provided core.cpu.txt and core.gpu.txt.
# Install the project and core + test dependencies. No need to waste space installing the training code.
pip install -r requirements/core.${platform}.txt -e .[test]
  • If this fails, fall back to the parseq installation instructions, but note that the huggingface-hub version should likely be updated and there is no need for the "+gpu" tag after the torch packages when installing with GPU.
  4. Navigate back to the root folder of this repository.
  5. If on Linux or Mac, remove the line "pywin32==306" from requirements.txt.
  6. Install all other requirements with "pip install -r requirements.txt".
  7. Fill in the config.json file with the RTSP addresses and the Dynamsoft barcode reader key (make an account and request one via https://www.dynamsoft.com/ ).

Run the solution

  1. Activate the conda environment.
  2. Launch the dashboard:
  • navigate to the frontend/ folder
  • run "python3 app.py"
  • check the terminal output for the address and port of the dashboard in the browser; it should be http://127.0.0.1:5000/dashboard
  3. Launch the main body of the code:
  • open another terminal or a new tab in the terminal
  • activate the conda environment
  • navigate back to the root folder
  • launch "python3 Pipeline/two_processes.py"

You will see printouts in the terminal and summarized results on the dashboard.
