Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMAGEDAM-223: TIFF images are visible and useable in BBC Images #55

Merged
merged 17 commits into from
Dec 8, 2020

Conversation

gribeiro
Copy link

@gribeiro gribeiro commented Nov 25, 2020

What does this change?

It changes how tiffs files are handled. Before this update, only the transformed JPEG was kept, now the system keeps the transformed JPEG for cropping and other optimization but also keeps the original TIFF file. The TIFF file information is kept on "originalSource". Only non supported files, i.e. not browser friendly, like TIFF, will have this field. The correct file size of TIFF files are now being shown.

Edit 26/11/2020: After the discussion below, it was understood what the original intent of the JPEG transformation, so the changes of "originalSource" were reverted and a tentative solution for guardian#3049 was added. The issues from guardian#3049 seems to be corrected.

Some of this was caused by a shadowing of variables on ImageUpload.scala:

createOptimisedFileFuture(uploadRequest, deps).flatMap(uploadRequest => {

Here the usage of uploadRequest twice resulted on the lost of the original TIFF file.


It adds a new filter for searching for specific file types (fileType). Now the user can select on the filter, "fileType" and put a
MIME or some alias (tiff, jpg, png).

It bumps the image_filter_buffer of the imgops nginx.conf file to 200M. This was needed becauses large TIFFs files are transformed into big JPEGs with the current setting of transformation(same resolution, quality set to 100).

How can success be measured?

  1. Upload a TIFF with metadata
  2. See if the same metadata appears on the screen
  3. Search for TIFF files using the +fileType
  4. Crop some TIFF files successfully
  5. Download TIFF with fullsize (default download button)
  6. Do the same tests with some large TIFFs (I have tested up to 150MB)
  7. Test if the hash files of the upload file and the result on Grid are the same

Screenshots (if applicable)

screen1

Who should look at this?

Tested?

  • locally
  • on TEST

@paperboyo
Copy link

Hello! Just a tiny comment about +mimeType. Could I suggest using +fileType instead as a more user-friendly and understandable name? (great to have this as a search!) Will the types values be offered as suggestions, too, similarly to eg.

image

Regarding

now the system keeps the transformed JPEG for cropping and other optimization but also keeps the original TIFF file

in our implementation, cropping always uses an original file (preferable, I think). Prerenders are used only as smaller intermediaries to send to imgOps (for display). They always had problems: we created them as PNG8 for all TIFFs irrespective if they contained transparency or not. This was never ideal (should be JPEGs for non-transparent imagery). But I cannot understand code, so will ask some colleagues around to explain to me how this PR changes things.

Great to see this! 🎉

@gribeiro gribeiro requested a review from wainaina November 25, 2020 12:16
@gribeiro gribeiro requested review from abdelrahmansd and removed request for wainaina November 26, 2020 14:27
@sihil
Copy link

sihil commented Nov 26, 2020

Hi @gribeiro - just to let you know that both @paperboyo and I have dug into the behaviour of the Grid. The expectation has always been that the TIFF would be preserved as the original file, and would indeed be supplied when downloaded.

Unfortunately we introduced an inadvertent change in April this year that broke this behaviour. We'll look to fix this ourselves in the coming days but suffice to say that there is no need to introduce a secondary location to store the original file.

@sihil
Copy link

sihil commented Nov 26, 2020

I must also add that the ability to search by file type is awesome 🎉

@gribeiro
Copy link
Author

Hi @gribeiro - just to let you know that both @paperboyo and I have dug into the behaviour of the Grid. The expectation has always been that the TIFF would be preserved as the original file, and would indeed be supplied when downloaded.

Unfortunately we introduced an inadvertent change in April this year that broke this behaviour. We'll look to fix this ourselves in the coming days but suffice to say that there is no need to introduce a secondary location to store the original file.

Hi @sihil, thank you for your input. I thought that was odd that the original was thrown out. I'm glad that it was not intended design. I agree that only one TIFF should be stored on the system. Having said that, having a setting for storing a intermediate representation could be useful for future formats that the conversion is (way) more expensive, so the cropping and other operations could benefit from the optimized storage, and this storage just happens for files types that we require on the configuration. So we can easily turn this off for TIFFs in the near future and use your solution.

@paperboyo
Copy link

storing a intermediate representation could be useful for future formats that the conversion is (way) more expensive, so the cropping and other operations could benefit from the optimized storage

Cropping should always be done from the original image IMHO, if only because creating an intermediary takes as much time as cropping anyway but only applies if you actually crop (most pictures are unused, so – uncropped; creating intermediaries for all of them would be costly).

There is already a concept of optimisedFile. It’s used mostly for display purposes where the imgOps request would otherwise take ages. Logic when and how to create it (and how it’s named) is a bit broken, but we could and should improve on it. For some notes, see here and further below there. We will possibly take a wee look at some of it too as part of the aforementioned fix 🤞.

@gribeiro
Copy link
Author

creating intermediaries for all of them would be costly).

Yes, creating for all would be costly, knowing that most are unused. But this is already on the codebase, it was not introduced by this PR, the feature introduced in this PR regarding the conversion is: when there are some images that are being converted to some intermediate format — for some reason that I’m not aware of — then store the original. I have to dig the past commits to see why this was conversion to tiff->jpeg implemented in the first place, but it was not the current intended scope to remove this conversion, but rather have also access to the original file.

Now, knowing that a fix is coming we can expand the scope on a future task and remove this past tiff->jpeg conversion.

Thank you for all the enlightenment 😅

@sihil
Copy link

sihil commented Nov 26, 2020

@gribeiro - The intention is that at ingest time we: store the original unmodified file, a thumbnail to facilitate browsing and an optimised file (where necessary, e.g. TIFFs) for improving performance for cropping in the display (note that the crop is still generated from the original TIFF, but the user sees the proxy).

We've agreed to prioritise fixing this and will follow up here when we have PRs. I expect that we will have an upstream fix (and some improved tests) for this by the end of next week.

@gribeiro
Copy link
Author

Hey @paperboyo, @sihil

Thank you guys for all the explaining on the intended behavior of this process. Now I can see more clearly.
I changed the PR with a solution for guardian#3049 and removed our previous solution.

@wainaina wainaina self-requested a review November 27, 2020 10:39
Copy link

@wainaina wainaina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.
Awaiting feedback from the Guardian team.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants