IMAGEDAM-223: TIFF images are visible and useable in BBC Images #55

gribeiro · 2020-11-25T06:29:55Z

What does this change?

It changes how tiffs files are handled. Before this update, only the transformed JPEG was kept, now the system keeps the transformed JPEG for cropping and other optimization but also keeps the original TIFF file. The TIFF file information is kept on "originalSource". Only non supported files, i.e. not browser friendly, like TIFF, will have this field. The correct file size of TIFF files are now being shown.

Edit 26/11/2020: After the discussion below, it was understood what the original intent of the JPEG transformation, so the changes of "originalSource" were reverted and a tentative solution for guardian#3049 was added. The issues from guardian#3049 seems to be corrected.

Some of this was caused by a shadowing of variables on ImageUpload.scala:

createOptimisedFileFuture(uploadRequest, deps).flatMap(uploadRequest => {

Here the usage of uploadRequest twice resulted on the lost of the original TIFF file.

It adds a new filter for searching for specific file types (fileType). Now the user can select on the filter, "fileType" and put a
MIME or some alias (tiff, jpg, png).

It bumps the image_filter_buffer of the imgops nginx.conf file to 200M. This was needed becauses large TIFFs files are transformed into big JPEGs with the current setting of transformation(same resolution, quality set to 100).

How can success be measured?

Upload a TIFF with metadata
See if the same metadata appears on the screen
Search for TIFF files using the +fileType
Crop some TIFF files successfully
Download TIFF with fullsize (default download button)
Do the same tests with some large TIFFs (I have tested up to 150MB)
Test if the hash files of the upload file and the result on Grid are the same

Screenshots (if applicable)

Who should look at this?

Tested?

locally
on TEST

…her types)

…nctional

paperboyo · 2020-11-25T09:47:52Z

Hello! Just a tiny comment about +mimeType. Could I suggest using +fileType instead as a more user-friendly and understandable name? (great to have this as a search!) Will the types values be offered as suggestions, too, similarly to eg.

Regarding

now the system keeps the transformed JPEG for cropping and other optimization but also keeps the original TIFF file

in our implementation, cropping always uses an original file (preferable, I think). Prerenders are used only as smaller intermediaries to send to imgOps (for display). They always had problems: we created them as PNG8 for all TIFFs irrespective if they contained transparency or not. This was never ideal (should be JPEGs for non-transparent imagery). But I cannot understand code, so will ask some colleagues around to explain to me how this PR changes things.

Great to see this! 🎉

sihil · 2020-11-26T15:08:36Z

Hi @gribeiro - just to let you know that both @paperboyo and I have dug into the behaviour of the Grid. The expectation has always been that the TIFF would be preserved as the original file, and would indeed be supplied when downloaded.

Unfortunately we introduced an inadvertent change in April this year that broke this behaviour. We'll look to fix this ourselves in the coming days but suffice to say that there is no need to introduce a secondary location to store the original file.

sihil · 2020-11-26T15:43:11Z

I must also add that the ability to search by file type is awesome 🎉

gribeiro · 2020-11-26T15:59:08Z

Hi @gribeiro - just to let you know that both @paperboyo and I have dug into the behaviour of the Grid. The expectation has always been that the TIFF would be preserved as the original file, and would indeed be supplied when downloaded.

Unfortunately we introduced an inadvertent change in April this year that broke this behaviour. We'll look to fix this ourselves in the coming days but suffice to say that there is no need to introduce a secondary location to store the original file.

Hi @sihil, thank you for your input. I thought that was odd that the original was thrown out. I'm glad that it was not intended design. I agree that only one TIFF should be stored on the system. Having said that, having a setting for storing a intermediate representation could be useful for future formats that the conversion is (way) more expensive, so the cropping and other operations could benefit from the optimized storage, and this storage just happens for files types that we require on the configuration. So we can easily turn this off for TIFFs in the near future and use your solution.

paperboyo · 2020-11-26T16:53:22Z

storing a intermediate representation could be useful for future formats that the conversion is (way) more expensive, so the cropping and other operations could benefit from the optimized storage

Cropping should always be done from the original image IMHO, if only because creating an intermediary takes as much time as cropping anyway but only applies if you actually crop (most pictures are unused, so – uncropped; creating intermediaries for all of them would be costly).

There is already a concept of optimisedFile. It’s used mostly for display purposes where the imgOps request would otherwise take ages. Logic when and how to create it (and how it’s named) is a bit broken, but we could and should improve on it. For some notes, see here and further below there. We will possibly take a wee look at some of it too as part of the aforementioned fix 🤞.

gribeiro · 2020-11-26T17:44:19Z

creating intermediaries for all of them would be costly).

Yes, creating for all would be costly, knowing that most are unused. But this is already on the codebase, it was not introduced by this PR, the feature introduced in this PR regarding the conversion is: when there are some images that are being converted to some intermediate format — for some reason that I’m not aware of — then store the original. I have to dig the past commits to see why this was conversion to tiff->jpeg implemented in the first place, but it was not the current intended scope to remove this conversion, but rather have also access to the original file.

Now, knowing that a fix is coming we can expand the scope on a future task and remove this past tiff->jpeg conversion.

Thank you for all the enlightenment 😅

sihil · 2020-11-26T19:11:48Z

@gribeiro - The intention is that at ingest time we: store the original unmodified file, a thumbnail to facilitate browsing and an optimised file (where necessary, e.g. TIFFs) for improving performance for cropping in the display (note that the crop is still generated from the original TIFF, but the user sees the proxy).

We've agreed to prioritise fixing this and will follow up here when we have PRs. I expect that we will have an upstream fix (and some improved tests) for this by the end of next week.

This reverts commit d59bcf1.

This reverts commit 82d94b5.

This reverts commit a8441c8.

gribeiro · 2020-11-27T02:41:55Z

Hey @paperboyo, @sihil

Thank you guys for all the explaining on the intended behavior of this process. Now I can see more clearly.
I changed the PR with a solution for guardian#3049 and removed our previous solution.

wainaina

Approved.
Awaiting feedback from the Guardian team.

This reverts commit ad8e76b.

gribeiro added 7 commits November 19, 2020 15:34

Removing development println

1c74ba4

Display correct original mime type

82d94b5

Store both jpg (for png optimize) and tiff.

d59bcf1

Implementation of search by mimeType for better finding tiffs (and ot…

8220084

…her types)

Merge remote-tracking branch 'origin/bbc2' into feature/tiff-fully-fu…

2a1447f

…nctional

image_filter_buffer bump for displaying >100MB tiffs on DEV

411e699

display correct size of original (TIFF) file

a8441c8

gribeiro requested a review from wainaina November 25, 2020 12:16

gribeiro added 2 commits November 26, 2020 01:29

Change mimeType alias to fileType. mimeType search bug fix.

90e6068

Change mimeType to fileType in frontend

9deb6e1

gribeiro requested review from abdelrahmansd and removed request for wainaina November 26, 2020 14:27

sihil mentioned this pull request Nov 26, 2020

Grid TIFF support is fundamentally broken and dangerous guardian/grid#3049

Closed

gribeiro added 4 commits November 26, 2020 21:47

Revert "Store both jpg (for png optimize) and tiff."

4b79b6e

This reverts commit d59bcf1.

Revert "Display correct original mime type"

534537d

This reverts commit 82d94b5.

Revert "display correct size of original (TIFF) file"

1e885d6

This reverts commit a8441c8.

Fixing the TIFF issue. Related to : guardian#3049

ad8e76b

croppedBy was mistankenly removed

f376691

wainaina self-requested a review November 27, 2020 10:39

wainaina reviewed Nov 27, 2020

View reviewed changes

gribeiro added 2 commits December 8, 2020 13:55

Revert "Fixing the TIFF issue. Related to : guardian#3049"

b0cf626

This reverts commit ad8e76b.

Merge branch 'bbc2' into feature/tiff-fully-functional

19ea0c2

Fixing conflicts

ad41372

gribeiro merged commit 736675d into bbc2 Dec 8, 2020

gribeiro mentioned this pull request Jan 14, 2021

[BBC] Introducing search by file type (tiff, png, jpeg) guardian/grid#3095

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IMAGEDAM-223: TIFF images are visible and useable in BBC Images #55

IMAGEDAM-223: TIFF images are visible and useable in BBC Images #55

gribeiro commented Nov 25, 2020 •

edited

Loading

paperboyo commented Nov 25, 2020

sihil commented Nov 26, 2020

sihil commented Nov 26, 2020 •

edited

Loading

gribeiro commented Nov 26, 2020

paperboyo commented Nov 26, 2020

gribeiro commented Nov 26, 2020

sihil commented Nov 26, 2020 •

edited

Loading

gribeiro commented Nov 27, 2020

wainaina left a comment

IMAGEDAM-223: TIFF images are visible and useable in BBC Images #55

IMAGEDAM-223: TIFF images are visible and useable in BBC Images #55

Conversation

gribeiro commented Nov 25, 2020 • edited Loading

What does this change?

How can success be measured?

Screenshots (if applicable)

Who should look at this?

Tested?

paperboyo commented Nov 25, 2020

sihil commented Nov 26, 2020

sihil commented Nov 26, 2020 • edited Loading

gribeiro commented Nov 26, 2020

paperboyo commented Nov 26, 2020

gribeiro commented Nov 26, 2020

sihil commented Nov 26, 2020 • edited Loading

gribeiro commented Nov 27, 2020

wainaina left a comment

Choose a reason for hiding this comment

gribeiro commented Nov 25, 2020 •

edited

Loading

sihil commented Nov 26, 2020 •

edited

Loading

sihil commented Nov 26, 2020 •

edited

Loading