fix: update paths for EAN8 and short barcodes (padding with zeroes) - DO NOT MERGE (#10472)

I closed the very old PR #3915 and I am opening this new one to put it on the first page.

Original issue: #3818

We will need to move the file structures on all flavors (OFF, OBF, OPF,
OPFF) roughly at the same time, as otherwise we will have issues if some
products are moved from one flavor to another.

So the migration script is using "old" conventions so that it can run on
the old obf / opf / opff code (without needing newer modules).

The migration script is first used to assess the situation (how many
products would be moved, how many products have conflicts etc.) on all
flavors.

PR in infrastructure repo to detail the migration:
openfoodfacts/openfoodfacts-infrastructure#363

---------

Co-authored-by: hangy <[email protected]>
Co-authored-by: Alex Garel <[email protected]>
Co-authored-by: OFF <[email protected]>
4 people authored Oct 8, 2024
1 parent 34ae5e4 commit 3c18781
Showing 64 changed files with 1,430 additions and 326 deletions.
2 changes: 1 addition & 1 deletion cgi/product_multilingual.pl
@@ -338,7 +338,7 @@ ($product_ref)
display_error_and_exit($request_ref, lang("no_owner_defined"), 200);
}
$product_id = product_id_for_owner($Owner_id, $code);
-$product_ref = retrieve_product_or_deleted_product($product_id, $User{moderator});
+$product_ref = retrieve_product($product_id, $User{moderator});
if (not defined $product_ref) {
display_error_and_exit($request_ref, sprintf(lang("no_product_for_barcode"), $code), 404);
}
11 changes: 9 additions & 2 deletions conf/nginx/sites-available/obf
@@ -46,6 +46,15 @@ server {
index index.html index.htm index.nginx-debian.html;

location ~ ^/images/products/ {
+# 2024/10/03 - temporary redirects as we changed the path of images
+# for barcodes that are 8 digits or less
+rewrite ^/images/products/(....)/([^/]*)$ /images/products/000/000/000/$1/$2 break;
+rewrite ^/images/products/(.)(....)/([^/]*)$ /images/products/000/000/00$1/$2/$3 break;
+rewrite ^/images/products/(..)(....)/([^/]*)$ /images/products/000/000/0$1/$2/$3 break;
+rewrite ^/images/products/(...)(....)/([^/]*)$ /images/products/000/000/$1/$2/$3 break;
+rewrite ^/images/products/(.)(...)(....)/([^/]*)$ /images/products/000/00$1/$2/$3/$4 break;
+rewrite ^/images/products/(..)(...)(....)/([^/]*)$ /images/products/000/0$1/$2/$3/$4 break;
+
include snippets/off.cors-headers.include;
include snippets/expiry-headers.include;
add_header Link "<http://creativecommons.org/licenses/by-sa/3.0/>; rel='license'; title='CC-BY-SA 3.0'";
@@ -54,8 +63,6 @@ server {
gunzip on;
}

-if ($http_referer ~* (jobothoniel.com) ) { return 403; } # blocked since 2021-07-13
-
# the app requests /1.json to get the product count...
# the commented code below is to serve a static copy
# if there is a spike of installs
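
To make the effect of the six temporary `rewrite` rules easier to check, here is a minimal Python sketch that applies the same patterns in the same order and stops at the first match, as the `break` flag does. The function name `rewrite_image_path` and the `RULES` table are hypothetical and exist only for illustration; this is not code from the repository, and it only emulates the path substitution, not nginx itself.

```python
import re

# (pattern, replacement) pairs copied from the six rewrite rules, in order.
# \g<N> is Python's unambiguous spelling of nginx's $N capture reference.
RULES = [
    (r"^/images/products/(....)/([^/]*)$",
     r"/images/products/000/000/000/\g<1>/\g<2>"),
    (r"^/images/products/(.)(....)/([^/]*)$",
     r"/images/products/000/000/00\g<1>/\g<2>/\g<3>"),
    (r"^/images/products/(..)(....)/([^/]*)$",
     r"/images/products/000/000/0\g<1>/\g<2>/\g<3>"),
    (r"^/images/products/(...)(....)/([^/]*)$",
     r"/images/products/000/000/\g<1>/\g<2>/\g<3>"),
    (r"^/images/products/(.)(...)(....)/([^/]*)$",
     r"/images/products/000/00\g<1>/\g<2>/\g<3>/\g<4>"),
    (r"^/images/products/(..)(...)(....)/([^/]*)$",
     r"/images/products/000/0\g<1>/\g<2>/\g<3>/\g<4>"),
]


def rewrite_image_path(path: str) -> str:
    """Return the rewritten path, or the path unchanged if no rule matches."""
    for pattern, replacement in RULES:
        new_path, count = re.subn(pattern, replacement, path)
        if count:
            # Mirrors the "break" flag: stop after the first matching rule.
            return new_path
    return path


if __name__ == "__main__":
    print(rewrite_image_path("/images/products/12345678/front.jpg"))
    print(rewrite_image_path("/images/products/343/566/076/8163/front.jpg"))
```

Old-style paths with a single 4 to 9 character folder are mapped to the zero-padded four-folder layout, while paths that are already split into four folders match none of the rules and pass through unchanged.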
11 changes: 9 additions & 2 deletions conf/nginx/sites-available/off
@@ -45,6 +45,15 @@ server {
index index.html index.htm index.nginx-debian.html;

location ~ ^/images/products/ {
+# 2024/10/03 - temporary redirects as we changed the path of images
+# for barcodes that are 8 digits or less
+rewrite ^/images/products/(....)/([^/]*)$ /images/products/000/000/000/$1/$2 break;
+rewrite ^/images/products/(.)(....)/([^/]*)$ /images/products/000/000/00$1/$2/$3 break;
+rewrite ^/images/products/(..)(....)/([^/]*)$ /images/products/000/000/0$1/$2/$3 break;
+rewrite ^/images/products/(...)(....)/([^/]*)$ /images/products/000/000/$1/$2/$3 break;
+rewrite ^/images/products/(.)(...)(....)/([^/]*)$ /images/products/000/00$1/$2/$3/$4 break;
+rewrite ^/images/products/(..)(...)(....)/([^/]*)$ /images/products/000/0$1/$2/$3/$4 break;
+
include snippets/off.cors-headers.include;
include snippets/expiry-headers.include;
add_header Link "<http://creativecommons.org/licenses/by-sa/3.0/>; rel='license'; title='CC-BY-SA 3.0'";
@@ -76,8 +85,6 @@ server {
return 302 https://openfoodfacts-ds.s3.eu-west-3.amazonaws.com/fr.openfoodfacts.org.products.csv.gz;
}

-if ($http_referer ~* (jobothoniel.com) ) { return 403; } # blocked since 2021-07-13
-
# the app requests /1.json to get the product count...
# the commented code below is to serve a static copy
# if there is a spike of installs
11 changes: 9 additions & 2 deletions conf/nginx/sites-available/opf
@@ -45,6 +45,15 @@ server {
index index.html index.htm index.nginx-debian.html;

location ~ ^/images/products/ {
+# 2024/10/03 - temporary redirects as we changed the path of images
+# for barcodes that are 8 digits or less
+rewrite ^/images/products/(....)/([^/]*)$ /images/products/000/000/000/$1/$2 break;
+rewrite ^/images/products/(.)(....)/([^/]*)$ /images/products/000/000/00$1/$2/$3 break;
+rewrite ^/images/products/(..)(....)/([^/]*)$ /images/products/000/000/0$1/$2/$3 break;
+rewrite ^/images/products/(...)(....)/([^/]*)$ /images/products/000/000/$1/$2/$3 break;
+rewrite ^/images/products/(.)(...)(....)/([^/]*)$ /images/products/000/00$1/$2/$3/$4 break;
+rewrite ^/images/products/(..)(...)(....)/([^/]*)$ /images/products/000/0$1/$2/$3/$4 break;
+
include snippets/off.cors-headers.include;
include snippets/expiry-headers.include;
add_header Link "<http://creativecommons.org/licenses/by-sa/3.0/>; rel='license'; title='CC-BY-SA 3.0'";
@@ -53,8 +62,6 @@ server {
gunzip on;
}

-if ($http_referer ~* (jobothoniel.com) ) { return 403; } # blocked since 2021-07-13
-
# the app requests /1.json to get the product count...
# the commented code below is to serve a static copy
# if there is a spike of installs
13 changes: 10 additions & 3 deletions conf/nginx/sites-available/opff
@@ -44,7 +44,16 @@ server {

index index.html index.htm index.nginx-debian.html;

-location ~ ^/images/petfood/ {
+location ~ ^/images/products/ {
+# 2024/10/03 - temporary redirects as we changed the path of images
+# for barcodes that are 8 digits or less
+rewrite ^/images/products/(....)/([^/]*)$ /images/products/000/000/000/$1/$2 break;
+rewrite ^/images/products/(.)(....)/([^/]*)$ /images/products/000/000/00$1/$2/$3 break;
+rewrite ^/images/products/(..)(....)/([^/]*)$ /images/products/000/000/0$1/$2/$3 break;
+rewrite ^/images/products/(...)(....)/([^/]*)$ /images/products/000/000/$1/$2/$3 break;
+rewrite ^/images/products/(.)(...)(....)/([^/]*)$ /images/products/000/00$1/$2/$3/$4 break;
+rewrite ^/images/products/(..)(...)(....)/([^/]*)$ /images/products/000/0$1/$2/$3/$4 break;
+
include snippets/off.cors-headers.include;
include snippets/expiry-headers.include;
add_header Link "<http://creativecommons.org/licenses/by-sa/3.0/>; rel='license'; title='CC-BY-SA 3.0'";
@@ -53,8 +62,6 @@ server {
gunzip on;
}

-if ($http_referer ~* (jobothoniel.com) ) { return 403; } # blocked since 2021-07-13
-
# the app requests /1.json to get the product count...
# the commented code below is to serve a static copy
# if there is a spike of installs
9 changes: 3 additions & 6 deletions docs/api/how-to-download-images.md
@@ -42,14 +42,11 @@ In get you want to get an image which url is not directly present in product dat

### Computing single product image folder

-Images of a product are stored in a single directory. The path of this
-directory can be inferred easily from the product barcode.
-There are two cases:
+Images of a product are stored in a single directory. The path of this directory can be inferred easily from the product barcode:

-1. If the product barcode is 8 digits long or shorter (ex: "22222222"), the directory path is
-simply the barcode: `https://images.openfoodfacts.org/images/products/{barcode}`.
+If the barcode is less than 13 digits long, it must be padded with leading 0s so that it has 13 digits.

-2. Otherwise, split the first 9 digits of the barcode into 3 groups of 3 digits to get the first 3 folder names, and use the rest of the barcode as the last folder name^[split-regexp].
+Then split the first 9 digits of the barcode into 3 groups of 3 digits to get the first 3 folder names, and use the rest of the barcode as the last folder name^[split-regexp].
For example, barcode `3435660768163` is split into: `343/566/076/8163`, thus product images will be in `https://images.openfoodfacts.org/images/products/343/566/076/8163`

^[split-regexp]: The following regex can be used to split the barcode into subfolders: `/^(...)(...)(...)(.*)$/`
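
The updated rule (pad to 13 digits, then split) can be sketched in Python as follows. This is an illustrative sketch only: the function name `product_image_folder` and the constant `IMAGES_BASE_URL` are hypothetical names chosen here, while the base URL and the splitting regex are the ones quoted in the documentation above.

```python
import re

# Base URL as quoted in the documentation; the constant name is an assumption.
IMAGES_BASE_URL = "https://images.openfoodfacts.org/images/products"


def product_image_folder(barcode: str) -> str:
    """Return the image folder URL for a barcode made of digits."""
    # Barcodes shorter than 13 digits are padded with leading 0s.
    padded = barcode.zfill(13)
    # Three groups of 3 digits, then the rest of the barcode.
    match = re.match(r"^(...)(...)(...)(.*)$", padded)
    return IMAGES_BASE_URL + "/" + "/".join(match.groups())


if __name__ == "__main__":
    print(product_image_folder("3435660768163"))
    print(product_image_folder("22222222"))
```

With this rule an 8 digit barcode such as `22222222` no longer maps to a folder named after the bare barcode but to `000/002/222/2222`.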
34 changes: 34 additions & 0 deletions docs/api/ref-barcode-normalization.md
@@ -0,0 +1,34 @@
# Reference: Barcode Normalization

This reference describes how barcodes are normalized in Open Food Facts.

## The problem: barcodes may be prefixed by a varying number of 0s

Different types of barcodes can be found on products. The most common are:

* EAN-13 / GTIN-13: 13 digit barcode
* EAN-8: 8 digit barcode, short version of EAN-13 barcodes that have 5 leading 0s
* UPC-A / UPC-12: 12 digit barcode, used mostly in the US and Canada. A leading 0 can be added to get the corresponding EAN-13.
* UPC-E: 7 digit barcode, short version of UPC-A
* EAN-14 / GTIN-14: used for non-consumer facing products (e.g. a case of individual products). A leading 0 can be added to an EAN-13 to get the corresponding EAN-14.

The same code could be printed on products with a different number of leading 0s.
Additionally, some barcode scanners may add or remove leading 0s.

As the barcode is used as the key in Open Food Facts, we can end up with duplicate products that just differ by the number of leading 0s.

## The solution: barcode normalization

In Open Food Facts, we choose to fix the number of leading 0s in this way:

All barcodes with 7 digits or fewer (after leading 0s are removed) are padded with leading 0s so that they have 8 digits.

All barcodes with 9 to 12 digits are padded with leading 0s so that they have 13 digits.

The "code" field in the product database, database dumps and exports is normalized in this way.

### Normalization of barcodes in the API

The Open Food Facts API automatically normalizes the barcode passed in the "code" field for both READ and WRITE requests.

So a request for the 12 digit barcode 034000470693 will return the product saved with "code" 0034000470693.
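
As a rough illustration of the two padding rules above, here is a minimal Python sketch. It is not the `normalize_code` function from the Perl code base, which may handle more cases; the name `normalize_barcode_sketch` is hypothetical, and leaving codes of exactly 8 digits or of 13 or more digits (after leading 0s are removed) unchanged is an assumption, since this reference does not state a rule for them.

```python
def normalize_barcode_sketch(code: str) -> str:
    """Apply the two padding rules described in this reference to a digit string."""
    # Leading 0s are removed first, as the first rule counts digits after removal.
    digits = code.lstrip("0")
    if len(digits) <= 7:
        # 7 digits or fewer: pad to 8 digits.
        return digits.zfill(8)
    if 9 <= len(digits) <= 12:
        # 9 to 12 digits: pad to 13 digits.
        return digits.zfill(13)
    # Assumption: exactly 8 digits, or 13 and more, are left as they are.
    return digits


if __name__ == "__main__":
    print(normalize_barcode_sketch("034000470693"))
```

For the 12 digit barcode from the example, `034000470693`, the sketch yields `0034000470693`, the same "code" the API is described as returning.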
11 changes: 6 additions & 5 deletions lib/ProductOpener/Display.pm
@@ -7897,12 +7897,13 @@ JS
}

# Old UPC-12 in url? Redirect to EAN-13 url
-if ($request_code ne $code) {
+# TODO - 2024/10/02 - Temporarily disabled so that we can migrate short barcodes with digits not equal to 8 or greater or equal to 13
+# Reenable after all products are migrated.
+if (0 and ($request_code ne $code)) {
$request_ref->{redirect} = $request_ref->{canon_url};
-$log->debug(
-"302 redirecting user because request_code does not match code",
-{redirect => $request_ref->{redirect}, lc => $lc, request_code => $code}
-) if $log->is_debug();
+$log->debug("302 redirecting user because request_code does not match code",
+{redirect => $request_ref->{redirect}, lc => $lc, code => $code, request_code => $request_code})
+if $log->is_debug();
redirect_to_url($request_ref, 302, $request_ref->{redirect});
}

2 changes: 2 additions & 0 deletions lib/ProductOpener/Import.pm
@@ -1492,6 +1492,7 @@ sub import_csv_file ($args_ref) {
# read code
my $code = $imported_product_ref->{code};
$code = normalize_code($code);
+$imported_product_ref->{code} = $code; # In case we added or removed leading 0s

my $modified = 0;

@@ -2918,6 +2919,7 @@ sub update_export_status_for_csv_file ($args_ref) {

my $code = $imported_product_ref->{code};
$code = normalize_code($code);
+$imported_product_ref->{code} = $code; # In case we added or removed leading 0s
my $product_id = product_id_for_owner($Owner_id, $code);

$log->debug("update export status for product", {i => $i, code => $code, product_id => $product_id})