
Improve Key-Value Caching #383

Closed · wants to merge 3 commits

Conversation

@UFO-101 (Contributor) commented on Sep 18, 2023

Description

Commit 1:

  • Support passing in multiple tokens when using past_kv_cache.
  • Add tests for past_kv_cache.
  • Add documentation for past_kv_cache.
  • Fix type hints for some components that assume left_attention_mask has the same number of tokens as the input. This previously went unnoticed because there were no tests covering past_kv_cache.

Commit 2:

  • Support freezing key-value caches.

Commit 3:

  • Integrate past_left_attention_mask into HookedTransformerKeyValueCache so that it doesn't need to be managed manually. Remove it from HookedTransformer.forward().

Motivation for allowing multiple tokens to be run with the key-value cache

In ACDC we run the same prompt many times. Patching only affects token positions after the point where the clean and corrupt prompts differ, so we want to run the shared prefix once, freeze the cache, and then pass in only the tokens after the point of divergence for each patched run.
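As a rough illustration, the workflow this PR is meant to enable might look like the sketch below. The names follow the TransformerLens API, but freeze() is the behaviour commit 2 proposes, so treat this as a sketch of the intended interface rather than final code.

```python
from transformer_lens import HookedTransformer
from transformer_lens.past_key_value_caching import HookedTransformerKeyValueCache

model = HookedTransformer.from_pretrained("gpt2")

shared_prefix = "When John and Mary went to the store,"  # identical in clean and corrupt prompts
clean_suffix = " John gave a drink to"
corrupt_suffix = " Mary gave a drink to"

# 1. Run the shared prefix once, filling the cache.
cache = HookedTransformerKeyValueCache.init_cache(model.cfg, model.cfg.device, batch_size=1)
model(shared_prefix, past_kv_cache=cache)

# 2. Freeze the cache so repeated patched runs don't keep appending to it (commit 2).
cache.freeze()

# 3. Pass in only the tokens after the point of divergence, several at a time (commit 1).
clean_logits = model(clean_suffix, past_kv_cache=cache, prepend_bos=False)
corrupt_logits = model(corrupt_suffix, past_kv_cache=cache, prepend_bos=False)
```

Without the frozen cache, each patched run would either have to re-run the shared prefix from scratch or would append its suffix on top of whatever the previous run left in the cache.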

Breaking change

Commit 3 removes past_left_attention_mask from HookedTransformer.forward(), because the left_attention_mask of previous inputs is now stored automatically by HookedTransformerKeyValueCache. I don't expect this to break many people's code, as the argument was only added two weeks ago in #344, and the fix is trivial: just delete the argument (see the changes to the tests in commit 3 for an example).

Overall I think the benefits are worth the cost, as this makes caching easier to use and generally reduces complexity. I can't think of a case where someone would want to pass in a past_left_attention_mask that doesn't match past_kv_cache.
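For anyone who did adopt the #344 interface, the migration would look roughly like the snippet below (continuing the sketch above; prefix_mask is a hypothetical placeholder for the mask a caller previously tracked by hand):

```python
import torch

# Before commit 3 (interface from #344): the caller tracked the attention mask of
# the cached prefix themselves and passed it back in on every continuation.
prefix_mask = torch.ones(1, 8, dtype=torch.int64)  # hypothetical placeholder
logits = model(
    clean_suffix,
    past_kv_cache=cache,
    prepend_bos=False,
    past_left_attention_mask=prefix_mask,  # argument removed by commit 3
)

# After commit 3: the cache records the attention mask of everything previously
# run through it, so the extra argument is simply deleted.
logits = model(clean_suffix, past_kv_cache=cache, prepend_bos=False)
```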

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@UFO-101 changed the title from "Support running multiple tokens with KV cache. Support freezing KV cache." to "Improve Key-Value Caching" on Sep 18, 2023
@UFO-101 (Contributor, Author) commented on Sep 19, 2023

Sorry, actually this isn't ready. I just realized there's an edge case when passing in a left-padded input while using the cache.
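For context, a left-padded batch run against a cache (the scenario this comment refers to) might look something like the sketch below, continuing the earlier example; the prompts and the framing of the problem are my own illustration, not taken from the PR.

```python
# Two prompts of different lengths: with padding_side="left" the shorter one is
# padded on the left when the shared run populates the cache.
prompts = ["The Eiffel Tower is in", "It is in"]
cache = HookedTransformerKeyValueCache.init_cache(model.cfg, model.cfg.device, batch_size=2)
model(prompts, past_kv_cache=cache, padding_side="left")

# The awkward part: any later call that reuses this cache has to combine the
# padding positions hidden inside the cached prefix with the attention mask of
# the newly supplied tokens.
model([" the city of Paris", " the city of Paris"], past_kv_cache=cache, prepend_bos=False)
```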

@UFO-101 closed this on Sep 26, 2023
@UFO-101 (Contributor, Author) commented on Sep 26, 2023

Superseded by #386
