-
Notifications
You must be signed in to change notification settings - Fork 337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rudimentary ICC color profile support to colorhash #110
base: master
Are you sure you want to change the base?
Conversation
Color profiles can significantly alter the appearance of an image's color. This change modifies the colorhash algorithm to take that into account by normalizing any embedded color profiles into the standard sRGB color space before analysis. It also provides an option to ignore embedded color profiles and instead analyze the image's raw color data (i.e. the default behaviour before this change), allowing images to be compared without their profiles.
This seems like a good idea, but it is currently failing in the CI (see the logs there). Do these color profiles also alter the luminescence? If yes, they could be relevant for the other hashes as well. |
Actually, if I read correctly, these ICC profiles are meant to adjust to the particular viewing device. Wouldn't it be more objective to ignore them? Probably I am misunderstanding this. |
They can have an effect on luminosity, yes, since they can essentially map any internal colour value to any other colour. Which means you could use a color profile to completely invert an image's colors, if you so desired. There are a few kinds of profile, not just display profiles. Display profiles do describe what colors a display device is capable of displaying, but usually aren't embedded in files because they're specific to a given piece of hardware. There are also input profiles, however, which are usually the ones you'd find embedded (such as by a scanner), and which describe precisely how the color values stored in an image file map to a standard device-independent color space, such as L*a*b*. This is the profile we care about, because it tells us how to correctly interpret the stored values. When coupled with a display profile, such as the one that comes with your OS or monitor, both profiles are used to accurately map colour from the input profile to the display profile using a device-independent color space as a connection space: Input --> L*a*b* --> Display It's even more complicated than all that gibberish above, but all we need to care about is that it works, and it's built-in to most image rendering software, so if we compare two images that are identical except one has an accurate input profile embedded, and the other doesn't, they're going to look different perceptually though they share the same underlying RGB values. I hope that made some sense. |
The CI error appears to be due to a dependency missing from Pillow's conda package. Pillow needs to be built with LittleCMS2 to support its ImageCms module. I developed using pip, and PyPI's Pillow package seems to come with everything included. Comparing the two packages (conda, pip), I can see it missing from conda. I'm not familiar with conda, but one of the possible solutions seems somewhat involved: https://stackoverflow.com/questions/52741538/anaconda3-python-3-6-pillow-cannot-import-imagecms |
The relevant conda package issue is here. |
Two suggestions:
|
I went with not allowing ImageCms to be optional due to the impact color profiles can have on an image's appearance, and I believe you were right earlier in thinking that they could be relevant to the other hash functions, so the code should probably be refactored to share this feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, I am impressed you got travis working with a single commit. It usually takes me a few.
Zooming out a bit, if we want this to be a feature for all hash functions, perhaps it should be applied outside this functions. It appears the correction can be applied such that the image object is modified.
Then perhaps we should instead of including this in imagehash, document for users the optimal way to pass an PIL image to imagehash. This should include how to correct for the icc profile, but also applying:
from PIL import ImageOps
image = ImageOps.exif_transpose(image)
Do you think this would be the right approach?
@@ -38,12 +38,17 @@ install: | |||
- conda install --quiet --file conda-requirements.txt | |||
- pip install coveralls | |||
|
|||
# Work around conda's Pillow package lacking ImageCms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in line 9 add:
env:
- REINSTALL_PILLOW=no
- REINSTALL_PILLOW=yes
@@ -38,12 +38,17 @@ install: | |||
- conda install --quiet --file conda-requirements.txt | |||
- pip install coveralls | |||
|
|||
# Work around conda's Pillow package lacking ImageCms | |||
- conda uninstall pillow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then here prefix the two commands here with [[ "$REINSTALL_PILLOW" == "yes" ]] &&
@@ -320,7 +321,7 @@ def whash(image, hash_size = 8, image_scale = None, mode = 'haar', remove_max_ha | |||
|
|||
|
|||
|
|||
def colorhash(image, binbits=3): | |||
def colorhash(image, binbits=3, ignore_icc=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a positive variable name here to avoid the double negative. For example: apply_icc_profile=True
Yeah, I agree. Any necessary profile conversion should happen on the Image object before it being passed to any of the hash functions. As for where that's done: ICC profiles aren't particularly simple to handle, and there are a number of ways to screw up the conversion between profiles if one is not familiar with the concepts or specifics, such as the pros and cons between different rendering intents and color spaces. For instance, I chose to normalize on the sRGB color space here because it's the only one supported by PIL well enough to allow conversion into HSV, as the current colorhash algorithm requires. It's also usually the default color space in which software renders images without profiles, so it should be generally analogous to what most users would see on their screens... but ideally both images with and without color profiles would first be converted to one of the larger, human vision-based color spaces, such as LAB, before hashing, to better account for colors that lie outside the sRGB space. Since such decisions require a fair bit of knowledge, and are, currently at least, fairly tightly coupled to the hashing implementations, I think we should keep profile conversion internal to imagehash. More generally, I think we should abstract away the need for users to import PIL at all, allowing them to pass in file-type objects or file paths instead. The use of PIL internally seems more like an implementation detail to me, which we needn't bother the user with. |
For this PR, do I also need to update |
I agree with what you say. I think there is a benefit though of passing PIL objects however, as one can apply additional image modifications beforehand. If I understand you correctly, we would need one function that prepares an image for 'L' mode use, and one that prepare it for 'HSV' mode use. These can be used within the hash functions, just like the current color space transformations. |
If you use them only for testing, no, if you want them to be installed in the site-package folder, yes. I think no. |
There's little harm in supporting all three, I suppose :)
Since the current colorhash algorithm converts images into both "L" and "HSV", Pillow seems to support conversion from RGB to either of them just fine, so we should be able to use a single function for the current implementation. It would only get more complicated if we wanted to go from LAB to certain other modes. |
I would be happy to include this as a single preprocessing helper function. A section in the README should explain why and how to use it. This will make it easy for users to plug it in. Then we can also go with dependencies being optional, as it is not a core functionality. |
Color profiles can significantly alter the appearance of an image's underlying color values on modern display equipment. This change modifies the colorhash algorithm to take that into account by normalizing (and converting) any embedded color profiles into the standard sRGB color space prior to hashing. This allows colorhash to detect color differences between images which may only differ by the inclusion, absence, or change of a color profile.
It also provides an option to ignore embedded color profiles and instead only analyze an image's raw color data (i.e. the default behaviour before this change), allowing images to be compared with or without their profiles.