Draft of feat: allow "negative" text queries #19


@ramayer (Contributor) commented Sep 28, 2021

This is a draft of a patch for something similar to issue #18 -- but it was not implemented exactly according to the requirements described in that issue.

Instead of a single string containing both the positive and negative clauses, I think it would be cleaner if the additive and subtractive phrases used separate command line parameters, like:

rclip zebra --minus="black and white" --plus="red and blue"

More details are in a comment under #18.

If you think this is a good direction, I could clean it up further (add examples to the docs and remove a no-longer-used method) and resubmit it.

@ramayer force-pushed the allow_negative_text_queries_using_command_line_arguments branch from f1c1aef to 8af4280 (September 28, 2021 08:13)
@yurijmikhalevich self-requested a review September 28, 2021 16:00
@yurijmikhalevich linked an issue Sep 28, 2021 that may be closed by this pull request
@yurijmikhalevich (Owner)

Thank you! I like your suggestion and responded in a comment on #18.

@ramayer force-pushed the allow_negative_text_queries_using_command_line_arguments branch 2 times, most recently from 373ee7b to 6d26173 (September 29, 2021 03:02)
@ramayer (Contributor, Author) commented Sep 29, 2021

I updated the branch and rebased the changes into a single commit.

I think I incorporated most of the feedback, but pycodestyle was pretty picky about long lines, so a couple of lines are wrapped uncomfortably. Let me know if more cleanup is needed.

@yurijmikhalevich (Owner)

Looks great! Thank you 😄 Can you please address a few nits and change "phrase" to "query"? Then we will ship it.

@yurijmikhalevich (Owner) commented Sep 29, 2021

Do you have time to add some tests? It would be great to have one that checks that we don't break the search when introducing changes like this one. It would make sense to submit this test as a separate PR.

@ramayer force-pushed the allow_negative_text_queries_using_command_line_arguments branch from 1959d1f to af08bd8 (September 30, 2021 05:18)
@ramayer (Contributor, Author) commented Sep 30, 2021

I'd be happy to add some tests if there were a template/framework for me to add them to, even if the initial template is just the equivalent of assert(1>0). If the project already has such a framework, sorry I didn't notice it. From googling, there seem to be enough different ways of adding tests to Python packages that I'm not sure which you'd prefer. I'm also not sure whether you'd want to add test images to the project or download them from somewhere at test time.

I think I incorporated the recent feedback in the pull request; but I do agree with the idea of adding tests first before merging.

@yurijmikhalevich (Owner)

@ramayer, OK, let's leave tests aside for now. I'll take a look at them when I have time. It makes sense to me to store a dozen test images in the git repo; let's just make sure they are not heavy.
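As a possible starting point for the "assert(1>0)" template discussed above, here is a minimal sketch assuming pytest as the test runner and a hypothetical tests/images/ directory of committed sample images. The helper find_heavy_images and the 200 KB budget are illustrative assumptions, not part of rclip.

```python
# tests/test_images.py (hypothetical layout): pytest collects test_* functions
# from test_*.py files automatically, so plain asserts are enough to start with.
from pathlib import Path
from typing import List

# Assumed size budget for a committed test image; adjust to taste.
MAX_IMAGE_BYTES = 200 * 1024
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def find_heavy_images(directory: Path, max_bytes: int = MAX_IMAGE_BYTES) -> List[Path]:
    """Return image files under `directory` that are larger than `max_bytes`."""
    return sorted(
        path
        for path in directory.rglob("*")
        if path.is_file()
        and path.suffix.lower() in IMAGE_SUFFIXES
        and path.stat().st_size > max_bytes
    )


def test_sanity():
    # The "assert(1 > 0)" starting point: proves the test runner is wired up.
    assert 1 > 0


def test_committed_images_are_light():
    images_dir = Path(__file__).parent / "images"  # hypothetical location
    if images_dir.is_dir():
        assert find_heavy_images(images_dir) == []
```

Running pytest from the repository root would pick this file up; a search regression test could later be added next to it once the sample images exist.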

@yurijmikhalevich (Owner)

Thank you! A few more comments regarding the computation of the features.

@ramayer (Contributor, Author) commented Oct 1, 2021

I agree with all your suggestions, and thanks for teaching me better numpy tricks. Too busy with work to get to them today; but I should have a cleaner pull request on the weekend.

@ramayer (Contributor, Author) commented Oct 1, 2021

> It makes sense for me to store a dozen test images in git repo; just let's make sure that they are not heavy.

CLIP's preprocess transform seems to scale everything to 224x224 (at least with the settings rclip uses), so it should work with pretty light test images....
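To make the downscaling concrete, this sketch reproduces only the geometry of CLIP's preprocessing, assuming the ViT-B/32 model (input resolution 224) and torchvision's Resize(224) followed by CenterCrop(224): the shorter side is scaled to 224 and the centered square is kept. No pixels are touched, and rounding may differ by a pixel between torchvision versions.

```python
from typing import Tuple


def clip_preprocess_geometry(
    width: int, height: int, n_px: int = 224
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Approximate CLIP's Resize(n_px) + CenterCrop(n_px) geometry.

    Returns the (width, height) after the shorter side is resized to n_px and
    the (left, top, right, bottom) box that the center crop then keeps.
    """
    short, long = min(width, height), max(width, height)
    new_long = int(n_px * long / short)  # aspect ratio preserved, truncated
    new_w, new_h = (n_px, new_long) if width <= height else (new_long, n_px)
    left = int(round((new_w - n_px) / 2.0))
    top = int(round((new_h - n_px) / 2.0))
    return (new_w, new_h), (left, top, left + n_px, top + n_px)


def downscale_factor(width: int, height: int, n_px: int = 224) -> float:
    """Original pixels per model-input pixel, along each side."""
    return min(width, height) / n_px


# A 4032x3024 phone photo: each model-input pixel summarizes a 13.5x13.5 patch,
# and the center crop also drops a quarter of the photo's width.
print(clip_preprocess_geometry(4032, 3024), downscale_factor(4032, 3024))
```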

..... but hmm..... that means CLIP is missing small details from high resolution pictures.... which makes me want to consider another possible feature request. Maybe I want to index multiple different CLIP vectors from different crops of my larger pictures......
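One way the multi-crop idea could look, sketched under assumptions: split a large image into 224x224 tiles, embed each tile separately (not shown), and score the image by its best-matching tile. The helpers tile_boxes and max_crop_similarity are hypothetical; rclip stores one vector per image, so this would multiply index size by the number of tiles.

```python
from typing import List, Tuple

import numpy as np


def tile_boxes(
    width: int, height: int, tile: int = 224, stride: int = 224
) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper: (left, top, right, bottom) boxes tiling the image.

    The last row/column is shifted inward so every box stays inside the image;
    an image smaller than one tile yields a single box covering all of it.
    """
    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile + 1, stride))
        if positions[-1] != size - tile:
            positions.append(size - tile)  # cover the remaining edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def max_crop_similarity(crop_features: np.ndarray, text_feature: np.ndarray) -> float:
    """Score an image by its best crop; crop_features has shape (n_crops, dim)."""
    return float(np.max(crop_features @ text_feature))
```

Taking the maximum favors small details that appear in any single tile; averaging would instead favor content spread across the whole picture.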

@yurijmikhalevich (Owner)

> Too busy with work to get to them today; but I should have a cleaner pull request on the weekend.

Sure. No rush. Thank you for doing this! :-)

@yurijmikhalevich (Owner)

> CLIP is missing small details from high resolution pictures.... which makes me want to consider another possible feature request. Maybe I want to index multiple different CLIP vectors from different crops of my larger pictures......

You'll be surprised how well 224x224 works. It would be interesting to experiment with different resolutions or crops, but I suspect that 224x224 is good enough for rclip, and it is much faster than using multiple crops or a higher resolution. Researchers usually go with the smallest resolution that produces decent results.

@ramayer force-pushed the allow_negative_text_queries_using_command_line_arguments branch 2 times, most recently from 66da726 to f79e2de (October 4, 2021 05:32)
@ramayer (Contributor, Author) commented Oct 4, 2021

Updated with the most recent feedback. Now using np.add.reduce(...) instead of functools.reduce(...), and passing lists to CLIP's encode_text.
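For readers comparing the two reductions: on a 2-D array of per-query features, functools.reduce(np.add, ...) adds rows one Python call at a time, while np.add.reduce performs a single vectorized reduction over axis 0, the same as features.sum(axis=0). Passing a list of strings to clip.tokenize yields one token row per string, so encode_text returns one feature row per query in a single batch. The random array below merely stands in for real CLIP text features.

```python
import functools

import numpy as np

# Stand-in for a batch of CLIP text features: one 512-d row per query.
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 512))

# Python-level pairwise reduction: one np.add call per row.
via_functools = functools.reduce(np.add, features)
# Single vectorized reduction over axis 0 (the default axis).
via_ufunc = np.add.reduce(features)
# Equivalent, and probably the most common spelling.
via_sum = features.sum(axis=0)

print(via_ufunc.shape, np.allclose(via_functools, via_ufunc), np.allclose(via_ufunc, via_sum))
```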

Review thread on rclip/model.py (outdated):

    similarities = (text_features @ item_features.T).squeeze(0).tolist()
    positive_features = self.compute_text_features(positive_queries)
    negative_features = self.compute_text_features(negative_queries)
    text_features = np.add.reduce(positive_features) - np.add.reduce(negative_features)
@yurijmikhalevich (Owner)

This is good enough for now 👍, but we could have done something like np.add.reduce(self.compute_text_features(all_queries) * np.array([[1], [1], [1], [-1], [-1]])) to compute all of the features at once. The array of ones and negative ones should be pre-created.
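A sketch of the single-call variant suggested above. Here encode is a hypothetical stand-in for a compute_text_features-like function mapping n strings to an (n, dim) array. One detail matters: the sign vector has to be a column of shape (n, 1) so that it scales whole rows; a flat (n,) array would be broadcast against the feature axis instead and fail (or silently misbehave when n happens to equal dim).

```python
from typing import Callable, List

import numpy as np


def combined_text_features(
    encode: Callable[[List[str]], np.ndarray],
    positive_queries: List[str],
    negative_queries: List[str],
) -> np.ndarray:
    """Sum of positive query features minus sum of negative ones, in ONE encode call.

    Assumes at least one positive query; `encode` maps n strings to (n, dim).
    """
    features = encode(positive_queries + negative_queries)  # (n_pos + n_neg, dim)
    signs = np.array([1.0] * len(positive_queries) + [-1.0] * len(negative_queries))
    # Reshape signs to a column (n, 1) so each ROW is scaled by its sign.
    # Equivalent alternative: signs @ features
    return np.add.reduce(features * signs[:, np.newaxis])
```

The trade-off discussed in this thread still applies: one batched encoder call instead of two, at the cost of a less obvious sign array.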

@ramayer (Contributor, Author)

I thought about that, but I think the multiplication is more expensive. If this times it correctly:

    import numpy as np
    import timeit
    
    npos = nneg = 1000
    pos = np.random.rand(npos,512)
    neg = np.random.rand(nneg,512)
    
    timeit.timeit(lambda: np.add.reduce(pos) - np.add.reduce(neg),number=1000)
    
    all = np.random.rand(npos + nneg,512)
    signs = np.array([1] * npos + [-1] * nneg)
    
    timeit.timeit(lambda: np.add.reduce(signs * all.T),number=1000)

I'm getting

    >>> timeit.timeit(lambda: np.add.reduce(signs * all.T),number=1000)
    2.642763003001164
    >>> timeit.timeit(lambda: np.add.reduce(pos) - np.add.reduce(neg),number=1000)
    0.7372207119988161

I also don't find the array of signs as easy to read (but maybe that's just me).
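A hedged observation on the benchmark above: signs * all.T has shape (512, 2000), and np.add.reduce reduces over axis 0 by default, so that expression produces 2000 per-query sums rather than the combined 512-dimensional vector. The two timings therefore measure different computations. The sketch below makes all variants compute the same vector (checked with asserts); it prints timings but makes no claim about which wins, since that depends on hardware and the BLAS build.

```python
import timeit

import numpy as np

npos = nneg = 1000
rng = np.random.default_rng(0)
pos = rng.random((npos, 512))
neg = rng.random((nneg, 512))
all_features = np.concatenate([pos, neg])       # (2000, 512); avoids shadowing all()
signs = np.array([1.0] * npos + [-1.0] * nneg)  # (2000,)

separate = np.add.reduce(pos) - np.add.reduce(neg)              # (512,)
broadcast = np.add.reduce(all_features * signs[:, np.newaxis])  # (512,)
matmul = signs @ all_features                                    # (512,)

# All three variants now compute the same 512-d vector.
assert separate.shape == broadcast.shape == matmul.shape == (512,)
assert np.allclose(separate, broadcast) and np.allclose(separate, matmul)

for name, fn in [
    ("separate reduces", lambda: np.add.reduce(pos) - np.add.reduce(neg)),
    ("broadcast + reduce", lambda: np.add.reduce(all_features * signs[:, np.newaxis])),
    ("signs @ features", lambda: signs @ all_features),
]:
    print(name, timeit.timeit(fn, number=100))
```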

@yurijmikhalevich (Owner)

@ramayer, thank you for the benchmark! I agree that readability suffers with the "sign array" approach. My main concern was the performance difference between calling compute_text_features once vs. twice.

@yurijmikhalevich (Owner) commented Oct 4, 2021

Hi! Thank you for the update. One nit.

@yurijmikhalevich (Owner)

Can you, please, also rebase on top of master?

@ramayer force-pushed the allow_negative_text_queries_using_command_line_arguments branch 2 times, most recently from 27b5e5d to 4874780 (October 4, 2021 09:28)
@talpay commented Oct 4, 2021

Two questions about the latest version of the PR:

  1. Isn't --add redundant as a command-line parameter, since it does the same thing as the query? What content is supposed to go into --add that is not already in the query (which is required)?

e.g.

    rclip "zebra black and white" -s "red and blue"
    rclip "zebra" -a "black and white" -s "red and blue"

are identical, right?

Wouldn't it make more sense to get rid of the redundant -a option to simplify things for users?

  2. Queries without --subtract (e.g. rclip "query") now throw:

    Traceback (most recent call last):
      File "main.py", line 58, in <module>
        main()
      File "main.py", line 22, in main
        result = rclip.search(args.query, current_directory, args.top, args.add, args.subtract)
      File "/rclip/search.py", line 131, in search
        sorted_similarities = self._model.compute_similarities_to_text(features, positive_queries, negative_queries)
      File "/rclip/model.py", line 46, in compute_similarities_to_text
        negative_features = self.compute_text_features(negative_queries)
      File "/rclip/model.py", line 35, in compute_text_features
        text_encoded = self._model.encode_text(clip.tokenize(text).to(self._device))
      File "/anaconda3/lib/python3.8/site-packages/clip/model.py", line 344, in encode_text
        x = self.transformer(x)
      File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/anaconda3/lib/python3.8/site-packages/clip/model.py", line 199, in forward
        return self.resblocks(x)
      File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
        input = module(input)
      File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/anaconda3/lib/python3.8/site-packages/clip/model.py", line 186, in forward
        x = x + self.attention(self.ln_1(x))
      File "/anaconda3/lib/python3.8/site-packages/clip/model.py", line 183, in attention
        return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
      File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/anaconda3/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 978, in forward
        return F.multi_head_attention_forward(
      File "/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 4265, in multi_head_attention_forward
        k = k.contiguous().view(-1, bsz * num_heads, head_dim).transpose(0, 1)
    RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0, 64] because the unspecified dimension size -1 can be any value and is ambiguous

I think the problem is that the current code assumes that there is always a negative query. One solution would be something like:

    def compute_similarities_to_text(
            self, item_features: np.ndarray,
            positive_queries: List[str], negative_queries: List[str]) -> List[Tuple[float, int]]:

        positive_features = self.compute_text_features(positive_queries)
        # Only encode negative queries when there are some: passing an empty
        # list to CLIP's text encoder triggers the RuntimeError above.
        if len(negative_queries) > 0:
            negative_features = self.compute_text_features(negative_queries)
            text_features = np.add.reduce(positive_features) - np.add.reduce(negative_features)
        else:
            text_features = np.add.reduce(positive_features)
        # ... the similarity computation and sorting continue as before
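For illustration, a standalone sketch of the whole method with that guard, written as a free function so it can run without CLIP. The compute_text_features parameter is a hypothetical stand-in for the model's text encoder (n strings to an (n, dim) array), and the descending (similarity, index) output mirrors the List[Tuple[float, int]] return type above; the real method may differ in details such as normalization.

```python
from typing import Callable, List, Tuple

import numpy as np


def compute_similarities_to_text(
    item_features: np.ndarray,
    positive_queries: List[str],
    negative_queries: List[str],
    compute_text_features: Callable[[List[str]], np.ndarray],
) -> List[Tuple[float, int]]:
    """Rank items by similarity to sum(positive) - sum(negative) text features.

    item_features: (n_items, dim) image embeddings.
    Returns (similarity, item_index) pairs, best match first.
    """
    text_features = np.add.reduce(compute_text_features(positive_queries))
    if negative_queries:  # never call the encoder with an empty list
        text_features = text_features - np.add.reduce(compute_text_features(negative_queries))
    similarities = item_features @ text_features  # (n_items,)
    return sorted(((float(s), i) for i, s in enumerate(similarities)), reverse=True)
```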

@ramayer (Contributor, Author) commented Oct 4, 2021

  1. Good point. I think the "--add" option is unnecessary. I'm happy to submit a version without it if you prefer.

  2. Thanks for catching that - fixing it now.

    Rebased with: updates based on github code review feedback

    Github issue yurijmikhalevich#18

    Co-authored-by: Yurij Mikhalevich
@ramayer force-pushed the allow_negative_text_queries_using_command_line_arguments branch from 4874780 to 33dd995 (October 4, 2021 10:14)
@ramayer closed this Oct 4, 2021
@ramayer deleted the allow_negative_text_queries_using_command_line_arguments branch October 4, 2021 10:27
@ramayer (Contributor, Author) commented Oct 4, 2021

I think I fixed the bug.

I also created a branch without the explicit --add parameter here:
https://github.com/ramayer/rclip/tree/without_explicit_add_param
I don't have a strong opinion either way on that one.

I'm not sure how to re-open the pull request, though :( It seems it was closed when I was updating the branch.

@ramayer (Contributor, Author) commented Oct 4, 2021

Attempting to reopen the PR using this technique: https://gist.github.com/guille-moe/cd41fdbc8969b15428a50af2543a5cfa (sorry about my trouble with GitHub).

@ramayer reopened this Oct 4, 2021
@ramayer force-pushed the allow_negative_text_queries_using_command_line_arguments branch from 0aee0f7 to 33dd995 (October 4, 2021 10:58)
@yurijmikhalevich (Owner) commented Oct 4, 2021

Hi @talpay, thank you for commenting.

That was my position initially, but I changed my mind recently.

I've checked, and rclip "zebra black and white" and rclip "zebra" -a "black and white" produce pretty similar, but different, vectors.

More importantly, the --add param will come in handy when we start using images instead of the textual "query". Then -a will make perfect sense, for example: rclip ./daylight-forest-picture.jpg --add "night".

I believe that you will almost always be able to replace --add with a properly phrased single query, but I think we should keep it anyway for 1) future similar-image lookup support and 2) flexibility. If you don't need additions, don't use them: the change is non-breaking, and rclip will continue to support the "original" interface.

@yurijmikhalevich merged commit acdecbe into yurijmikhalevich:main Oct 4, 2021
@yurijmikhalevich (Owner) commented Oct 4, 2021

@ramayer, merged! Thank you! Great job! 😄

@talpay commented Oct 4, 2021

@yurijmikhalevich Thanks for the reply (and the awesome project). I've had a closer look and you're right, they're quite different and it is actually a very important distinction that should be clearly communicated:

rclip "zebra black and white"
leads to embedding(zebra black and white)

rclip "zebra" -a "black and white":
leads to embedding(zebra) + embedding(black and white)

These two might still get similar results but if we shift one more word, it should become clear:

rclip "zebra black" -a "and white":
leads to embedding(zebra black) + embedding(and white)

The first two queries will also return grayscale ("black and white") images of zebras, whereas the last query most likely will not, because the phrase "black and white" is not part of either embedding (you can test this on @ramayer's web UI).

After checking the code, I agree that this syntax is better but I would suggest making this behavior clearer by adding some good examples to the README.md, e.g. like above but also showing how you can chain multiple subtractions together like this:

rclip "zebra" -a "black and white" -s "car" -s "animal":
leads to embedding(zebra) + embedding(black and white) - embedding(car) - embedding(animal)
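A sketch of how repeatable -a/-s options can map onto the embedding arithmetic above, using argparse's action="append". The option names follow the ones used in this thread (the traceback shows args.add and args.subtract), but the exact declarations in rclip are not shown here, so treat build_parser and describe as hypothetical illustrations.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rclip")
    parser.add_argument("query", help="main text query")
    # action="append" makes the options repeatable: -s car -s animal -> ["car", "animal"]
    parser.add_argument("-a", "--add", action="append", help="phrase whose embedding is added")
    parser.add_argument("-s", "--subtract", action="append", help="phrase whose embedding is subtracted")
    return parser


def describe(argv: Optional[List[str]] = None) -> str:
    """Render the embedding arithmetic implied by the parsed arguments."""
    args = build_parser().parse_args(argv)
    terms = [f"embedding({args.query})"]
    terms += [f"+ embedding({q})" for q in args.add or []]  # None when -a is absent
    terms += [f"- embedding({q})" for q in args.subtract or []]
    return " ".join(terms)


print(describe(["zebra", "-a", "black and white", "-s", "car", "-s", "animal"]))
```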

Maybe also mention that quotation marks can be used in queries, e.g. 'this is a "quoted" query', since this is quite important to CLIP and is also discussed in the paper.

On a final note, I could (subjectively) still imagine a single-string-syntax like rclip '+(positive word) +URL -word -("quoted" word)' to be more intuitive but implementing this parser is a bit trickier (e.g., to ensure correct grouping, that "-" can still function as a hyphen, and probably other stuff I'm overlooking that might turn out to be a headache).
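To gauge how tricky such a parser is, here is a minimal sketch of one possible single-string syntax. It is hypothetical: it uses shell-style quotes from Python's shlex for grouping instead of the parentheses above, treats a leading + or - on a whitespace-separated token as the operator, and leaves hyphens inside words alone.

```python
import shlex
from typing import List, Tuple


def parse_signed_query(text: str) -> Tuple[List[str], List[str]]:
    """Hypothetical syntax: +term / -term / bare term (positive).

    shlex.split handles quoting, so '-"black and white"' becomes one token and
    '-' only acts as an operator at the very start of a token.
    """
    positive: List[str] = []
    negative: List[str] = []
    for token in shlex.split(text):
        if token.startswith("-") and len(token) > 1:
            negative.append(token[1:])
        elif token.startswith("+") and len(token) > 1:
            positive.append(token[1:])
        else:
            positive.append(token)
    return positive, negative


print(parse_signed_query('zebra +"black and white" -car -\'"quoted" word\' red-blue'))
```

Even this small version shows the headaches mentioned above: literal double quotes need an outer layer of single quotes, a phrase that genuinely starts with "-" must be written as +-phrase, and unbalanced quotes make shlex.split raise ValueError.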

@yurijmikhalevich (Owner)

@talpay, thank you! I am happy to hear that you like rclip. And thanks for a more detailed explanation. This is exactly what I was talking about.

I'll definitely add examples to the README before releasing v2. And I agree with your outlook on a single-string syntax: I was considering something like that too and came to a similar conclusion. I also think that separate arguments just look cleaner.

@ramayer (Contributor, Author) commented Oct 4, 2021

Here's a great example of how different ways of processing phrases compare:

Also interestingly:

  • The phrase "horse fly" with quotation marks. CLIP seems to understand that the intent is not a literal horse fly.
  • The phrase 'horse fly' - it infers something different depending on the kind of quotation marks.

> On a final note, I could (subjectively) still imagine a single-string-syntax like rclip '+(positive word) +URL -word -("quoted" word)' to be more intuitive but implementing this parser is a bit trickier (e.g., to ensure correct grouping, that "-" can still function as a hyphen, and probably other stuff I'm overlooking that might turn out to be a headache).

I'm kinda forced to do that on the web UI (unless I want a multi-field advanced-search form), but it becomes hard to remember and excruciatingly painful to communicate. I even started going down the path of crazy math for embeddings, starting with +2.5(dragon) +1.5(castle) to give different weights to the different phrases, and started looking into other operators (rotate the vector for "dragon" 30% in the direction of the vector for "castle").
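The two operators mentioned above can be sketched with plain NumPy, under stated assumptions: weights simply scale each phrase's feature vector before summing, and "rotate 30% towards" is interpreted as spherical linear interpolation (slerp) that moves the normalized vector t = 0.3 of the angle towards the other one. Both helpers are hypothetical, and the feature vectors would come from CLIP's text encoder.

```python
from typing import List, Tuple

import numpy as np


def weighted_sum(terms: List[Tuple[float, np.ndarray]]) -> np.ndarray:
    """e.g. +2.5(dragon) +1.5(castle) -> 2.5 * f_dragon + 1.5 * f_castle."""
    return np.add.reduce([weight * feature for weight, feature in terms])


def rotate_towards(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Slerp: rotate unit(a) by a fraction t of the angle between a and b.

    Example: rotate_towards(f_dragon, f_castle, 0.3) for "30% towards castle".
    """
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    omega = float(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)))  # angle between them
    if np.isclose(omega, 0.0):
        return a  # (nearly) parallel: nothing to rotate
    if np.isclose(omega, np.pi):
        raise ValueError("opposite vectors: rotation direction is undefined")
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
```

Unlike a weighted sum, slerp keeps the result on the unit sphere, and t reads directly as the fraction of the angle traveled.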

And as we're discussing, so many symbols mean something to CLIP (like the phrase ❤️🌭) that it becomes hard to express all the strings you might want without a syntax that's really robust at quoting weird characters. That path leads to madness, or JSON, or Lisp.

I'm even thinking of just switching to a JSON syntax for all of my web UI's math operators (which I already do in places like this, where I didn't bother coming up with a clean syntax).
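A sketch of what such a JSON query could look like. The schema ("add"/"subtract" lists whose entries are either plain strings or objects with "text" and an optional "weight") is a hypothetical example, not the web UI's actual format. The point is that the JSON encoder, not a hand-written grammar, takes care of quotes and emoji.

```python
import json
from typing import List, Tuple


def parse_json_query(text: str) -> List[Tuple[float, str]]:
    """Hypothetical schema: {"add": [...], "subtract": [...]}.

    Each entry is a plain string or {"text": ..., "weight": ...}; subtracted
    entries get a negative weight. Returns (weight, text) pairs.
    """
    query = json.loads(text)
    terms: List[Tuple[float, str]] = []
    for key, sign in (("add", 1.0), ("subtract", -1.0)):
        for entry in query.get(key, []):
            if isinstance(entry, str):
                entry = {"text": entry}
            terms.append((sign * float(entry.get("weight", 1.0)), entry["text"]))
    return terms


# json.dumps escapes the embedded double quotes and non-ASCII characters (emoji).
example = json.dumps({
    "add": ['this is a "quoted" query', {"text": "❤️🌭", "weight": 2.5}],
    "subtract": ["horse fly"],
})
print(example)
print(parse_json_query(example))
```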

@yurijmikhalevich (Owner)

@ramayer, these are great examples indeed. It's also interesting to see how CLIP "distinguishes" between different kinds of quotes. I suspect that it's an artifact/bias learned from a dataset that inconsistently used both types of quotes.
