Query replay #45
base: main
Conversation
How about making this part of the language model (engine), e.g. as a wrapper?
Also, it would be nice to have a verification that only cached queries are made, and in the right order.
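For illustration, a minimal sketch of what such a verification could look like, assuming a cache of recorded (prompt, result) pairs; the class and method names below are hypothetical and not part of this PR:

```python
# Hypothetical sketch: serve only recorded queries, in the recorded order.
class StrictReplayCache:
    def __init__(self, recorded: list):
        # recorded: (prompt, result) pairs in the order they were made
        self._queue = list(recorded)

    def query(self, prompt: str) -> str:
        if not self._queue:
            raise RuntimeError("No cached queries left to replay")
        expected_prompt, result = self._queue.pop(0)
        if prompt != expected_prompt:
            raise RuntimeError(
                f"Unexpected or out-of-order query: {prompt!r} "
                f"(expected {expected_prompt!r})"
            )
        return result
```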
Checks that queries match are already there. I wanted to create a ReplayModel, but it has some caveats:
Do I understand correctly that
are two independent things, @spirali?
    prompt: str | FormatStr,
    kwargs: dict = None,
    with_context=True,
    replay: Replay = None,
typing should be Optional[Replay] or Replay | None
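A small sketch of the suggested annotation; the function name below is hypothetical, only the replay parameter mirrors the diff:

```python
from typing import Optional

class Replay:  # placeholder standing in for the PR's Replay class
    ...

# Hypothetical function name, for illustration only:
def query_model(prompt: str, replay: Optional[Replay] = None):
    ...

# Equivalent annotation on Python 3.10+:
def query_model_310(prompt: str, replay: Replay | None = None):
    ...
```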
    class Replay:
        def __init__(self, context: Context):
            self.replays = {}
            for context in context.find_contexts(lambda ctx: ctx.kind == "query"):
might be good to extract string constants like "query", "conf", "prompt" etc. into a constants module. It can reduce the risk of bugs from typos. It can also signal to future devs that these are keys which more than one module relies on.
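For example, a hypothetical constants module (the module path and constant names below are illustrative, taken from the string literals mentioned above):

```python
# interlab/context/consts.py (hypothetical path)
# Context kinds / keys shared across modules; import these instead of
# retyping bare string literals at each call site.
KIND_QUERY = "query"
KIND_CONF = "conf"
KIND_PROMPT = "prompt"
```

Call sites would then compare against KIND_QUERY rather than the bare string.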
                self.replays.setdefault(key, [])
                self.replays[key].append(context.result)
            for replay in self.replays.values():
                replay.reverse()
The logic here assumes that future implementations of StorageBase.list adhere to a certain order of listing contexts. This should be specified in the docstring of StorageBase.list and tested in a unit test (if not there already).
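A hedged sketch of how that contract could be stated in the docstring; the exact signature and return type of StorageBase.list are assumptions here:

```python
class StorageBase:
    def list(self):
        """List stored contexts.

        Assumed contract: implementations return items in insertion
        (chronological) order, since Replay relies on replaying queries
        in the order they were originally made.
        """
        raise NotImplementedError
```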
@patch("interlab.lang_models.openai._make_openai_chat_query") | ||
def test_replay_model(_make_openai_chat_query): |
nice test!
Returning to this point by @gavento: what are your thoughts about making replay easily available consistently across all methods which rely on LLM calls, @spirali? PS: I also wanted to say that I find the current implementation very clean and readable! I'm just not sure what the best design will be in the longer run.
Major thought why not: separation of concerns.
I am against adding more functionality to it - I think it is bad design, the functionality does not seem core to me (or, at least, I am not convinced it is the right way to do it), and bloating one function for it seems like bad design for the future.
Prevalence of
So how about adding it as a wrapper around LLMs individually?
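A rough sketch of the wrapper idea, assuming a model object exposing a query(prompt) method; the ReplayModel name and the cache structure are illustrative, not the actual interlab API:

```python
# Sketch only: wrap an individual LLM with replay behaviour.
class ReplayModel:
    def __init__(self, model, cached_results: dict):
        self.model = model
        # prompt -> list of recorded results, ordered so that pop() yields
        # the oldest remaining answer first (as in the PR's Replay class,
        # which reverses each list for this reason)
        self.cached = cached_results

    def query(self, prompt: str):
        results = self.cached.get(prompt)
        if results:
            return results.pop()         # replay a recorded answer
        return self.model.query(prompt)  # fall back to the real model
```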
I am rewriting the replay functionality from scratch. I want to make this functionality completely generic (any function / method call can be replayed, not necessarily only LLMs). I have a PoC, but I would like a sanity check of the idea before finishing it. Here is how it should look in the end:

    def my_function(x, y):
        return x + y

    replay = Replay("my-replay-stream-name")

    # Storing replay
    with Context("root") as root:
        replay(my_function)(1, 2)

    # Using replay
    replay = Replay("my-replay-stream-name", context=root)
    with Context("root") as root:
        replay(my_function)(2, 3)  # Calls my_function as it is not in the cache
        replay(my_function)(1, 2)  # Recalled from the cache
        replay(my_function)(1, 2)  # Calls my_function as it is not in the cache (2nd call)

    # Wrapping any object; the replay log also stores "self"
    from langchain.llms import OpenAI
    llm = replay(OpenAI(...))
    llm.generate(["Tell me a joke."])

    # Partial matching
    replay(my_function, ignore_args=["y"])(2, 3)  # Ignores argument "y" when looking for a matching call in the cache

Any comments?
Thanks! Three comments:
I also want to pitch games/situations with storable state (pickle is fine) and meaningful steps where the state can be stored/restored ;-)
Simple query caching
Fixes #40