perf: cut down cpu time of aevaluate by 30% on 1-4MB examples with this one trick (#1217)

A huge percentage of `aevaluate` CPU time was spent serializing part of each example to send in the evaluator trace (where we used to show the first 10k characters). Let's just not do that.

## Experiment view before and after
<img width="1457" alt="Screenshot 2024-11-14 at 3 22 21 PM"
src="https://github.com/user-attachments/assets/5003ffe4-58ce-4e60-8854-a6616d2b8d36">
<img width="1366" alt="Screenshot 2024-11-14 at 3 23 06 PM"
src="https://github.com/user-attachments/assets/65fc0ef3-2c77-4f5e-b4df-ca543408a03e">


## flame graphs before and after
<img width="1509" alt="Screenshot 2024-11-14 at 2 43 36 PM"
src="https://github.com/user-attachments/assets/01eb0a2b-fb14-49ba-ae0c-90f5eb516fd9">

<img width="1510" alt="Screenshot 2024-11-14 at 3 32 50 PM"
src="https://github.com/user-attachments/assets/dea47939-ac99-4cdc-bd18-e698511f8bad">


## Benchmarking code
```python
import asyncio
from random import randint, uniform

from langsmith import Client
from langsmith.evaluation import aevaluate

# Batch size for example uploads; the value is assumed, not shown in the original.
UPLOAD_BATCH_SIZE = 50

ls_client = Client()


async def run_abenchmark(
    n_examples=200,
    min_size=1000000,
    max_size=4000000,
    n_evaluators=5,
    min_llm_time=0.2,
    max_llm_time=1.2,
    n_concurrency=None
):
    # setup dataset (randint requires ints, so use floor division)
    inputs = [
        {"key": "a" * randint(min_size // 2, max_size // 2)}
        for _ in range(n_examples)
    ]
    outputs = [
        {"key": "b" * randint(min_size // 2, max_size // 2)}
        for _ in range(n_examples)
    ]

    if ls_client.has_dataset(dataset_name="jake_benchmarking"):
        ls_client.delete_dataset(dataset_name="jake_benchmarking")

    print("Creating dataset...")
    dataset = ls_client.create_dataset("jake_benchmarking")

    print("Uploading examples...")
    for i in range(0, n_examples, UPLOAD_BATCH_SIZE):
        ls_client.create_examples(
            dataset_id=dataset.id,
            inputs=inputs[i:i+UPLOAD_BATCH_SIZE],
            outputs=outputs[i:i+UPLOAD_BATCH_SIZE]
        )

    # setup evaluators
    evaluators = []
    for i in range(n_evaluators):
        evaluators.append(create_aevaluator(f"jake_benchmarking_{i}", uniform(min_llm_time, max_llm_time)))

    async def target(input):
        await asyncio.sleep(uniform(min_llm_time, max_llm_time))
        return {"value": "b" * len(input["key"])}

    print("Running evaluation...")
    await aevaluate(
        target,
        data=dataset.id,
        evaluators=evaluators,
        max_concurrency=n_concurrency,
        client=ls_client
    )
```
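The benchmark relies on a `create_aevaluator` helper that isn't shown above. A minimal sketch of what it might look like, assuming each evaluator simulates LLM latency with a sleep and reports a fixed score under the given feedback key (the name `sleep_time` and the returned score are assumptions):

```python
import asyncio


def create_aevaluator(key: str, sleep_time: float):
    """Hypothetical helper: build an async evaluator for benchmarking."""

    async def aevaluator(run, example):
        # Simulate an LLM call with a sleep, then return a constant score.
        await asyncio.sleep(sleep_time)
        return {"key": key, "score": 1}

    aevaluator.__name__ = key  # give each evaluator a distinct name
    return aevaluator
```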
jakerachleff authored Nov 15, 2024
1 parent d8adcde commit 2df6407
Showing 1 changed file with 4 additions and 0 deletions: python/langsmith/schemas.py
```diff
@@ -124,6 +124,10 @@ def url(self) -> Optional[str]:
             return f"{self._host_url}{path}"
         return None
 
+    def __repr__(self):
+        """Return a string representation of the RunBase object."""
+        return f"{self.__class__}(id={self.id}, dataset_id={self.dataset_id}, link='{self.url}')"
+
 
 class ExampleSearch(ExampleBase):
     """Example returned via search."""
```
