[Feature Proposal] Multi-needle in a haystack #41

Open
jsharf opened this issue Mar 27, 2024 · 2 comments

jsharf commented Mar 27, 2024

I really like this kind of benchmark. It would be interesting to make generalized versions of it, where a variable number of needles is inserted. These could be unrelated, independent needles, or they could be related. For example, you could imagine 4 needles:

A implies B
B implies C, D
D implies E.
B is true

Then you could test the "related" needles, to ensure that all of them were detected and the relationship is understood. (What might A be? What about D?)
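One way to derive the expected answers for such related needles is simple forward chaining over the implications. The sketch below is only an illustration under assumptions: the `RULES` and `FACTS` encoding and the helper names are hypothetical and not part of this repo. A symbol that cannot be derived is reported as "unknown" rather than false, since "A implies B" together with "B is true" says nothing about A.

```python
from collections import deque

# Hypothetical encoding of the four needles above:
# each premise maps to the symbols it implies.
RULES = {"A": ["B"], "B": ["C", "D"], "D": ["E"]}
FACTS = {"B"}  # "B is true"


def forward_chain(rules, facts):
    """Return every symbol that must be true given the rules and starting facts."""
    known = set(facts)
    queue = deque(facts)
    while queue:
        premise = queue.popleft()
        for consequence in rules.get(premise, []):
            if consequence not in known:
                known.add(consequence)
                queue.append(consequence)
    return known


def expected_answer(symbol, rules, facts):
    """Ground-truth label for a question such as 'What about D?'."""
    return "true" if symbol in forward_chain(rules, facts) else "unknown"
```

With the needles from the example, `expected_answer("D", RULES, FACTS)` gives "true" and `expected_answer("A", RULES, FACTS)` gives "unknown", which is the distinction a model would need to get right to show it understood the relationship.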

Curious what you think about this. If you're interested in a feature like this and willing to accept a pull request, I could find the time to try implementing it. If you have a style guide preference or anything like that, please let me know.

@gkamradt
Owner

Hey! Awesome post and request

Couple things:

  1. I totally agree that reasoning should be a part of the next set of tests. As an aside, I've been wondering what the "unit test" of reasoning is - what is the minimal amount of reasoning we can start with? It may be the transitive reasoning you're referring to here. I like this because you can easily append additional chains, and even put forks in the logic.
  2. Lance from LangChain added multi-needle recall, but it didn't have reasoning in there.
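As a rough sketch of how chains could be appended and forked, the snippet below generates needle sentences for a chain of configurable length with an optional fork, then spreads them at evenly spaced depths through a haystack. Everything here is an assumption for illustration: the function names, the `S0`, `S1`, ... symbol naming, and the list-of-sentences haystack are hypothetical and do not reflect the repo's existing code.

```python
import random


def make_chain_needles(length, fork_at=None, seed=0):
    """Build needle sentences for a chain S0 -> S1 -> ... plus an optional fork."""
    names = [f"S{i}" for i in range(length + 1)]
    needles = [f"{a} implies {b}." for a, b in zip(names, names[1:])]
    if fork_at is not None:
        # A fork adds one extra consequence to an existing link in the chain.
        needles.append(f"{names[fork_at]} implies F{fork_at}.")
    needles.append(f"{names[0]} is true.")
    # Shuffle so the needles do not appear in reasoning order.
    random.Random(seed).shuffle(needles)
    return needles


def insert_needles(haystack_sentences, needles):
    """Spread needles at evenly spaced depths through a list of haystack sentences."""
    result = list(haystack_sentences)
    n = len(needles)
    # Insert from the back so earlier insertion indices stay valid.
    for i in reversed(range(n)):
        position = (i + 1) * len(haystack_sentences) // (n + 1)
        result.insert(position, needles[i])
    return result
```

Increasing `length` appends more links to the chain, and `fork_at` adds a branch at the chosen link, so the reasoning depth can be varied independently of the context length.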

We are trying to keep tests, providers, and evaluators separate in the repo. Other than that, no style guide.

Contributions are very welcome, and we'll be quick with feedback.
