Add a skill to check if information is enough to answer a yes-or-no question #732
Conversation
Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉 I support the following commands:
Note: Results or errors of these commands will be posted as a pull request check in the Checks section below.
Note: Currently only maintainers belonging to the taxonomy-triagers, taxonomy-approvers, taxonomy-maintainers, labrador-org-maintainers, and instruct-lab-bot-maintainers teams are allowed to run these commands.
@instructlab-bot precheck
Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 189. The results will be presented below in the pull request status box. This may take several minutes...
Results for job ID: 189 using the model merlinite-7b! Results can be found here.
Force-pushed from cc3c751 to 3de729b.
@mingxzhao Here is the
@instructlab-bot precheck
Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 257. The results will be presented below in the pull request status box. This may take several minutes...
Results for job ID: 257 using the model merlinite-7b! Results can be found here.
Thank you for the ping; I will take a look. For future reference, the ilab train is only a quantized version, so I typically do need to run against the full model to confirm, but thank you for running the ilab precheck!
@instructlab-bot precheck
Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 259. The results will be presented below in the pull request status box. This may take several minutes...
Results for job ID: 259 using the model merlinite-7b! Results can be found here.
@instructlab-bot generate
Beep, boop 🤖, Generating test data for your PR with the job type: sdg-svc. Your Job ID is 263. The results will be presented below in the pull request status box. This may take several minutes...
Results for job ID: 263 using the model sdg service backend! Results can be found here.
Everything looks good, marking as approved, thank you for your patience!
@instructlab-bot generate
Beep, boop 🤖, Generating test data for your PR with the job type: sdg-svc. Your Job ID is 422. The results will be presented below in the pull request status box. This may take several minutes...
Hi @jjasghar, could someone please review this PR? I made formatting changes as required.
Results for job ID: 422 using the model sdg service backend! Results can be found here.
Hi! 👋 We are thankful for your patience and ask that you please keep this PR open. As soon as we finish all our behind-the-scenes work, we'll test the full model against your submissions and, ideally, accept your amazing contribution(s)! Your Community Maintainer Team. P.S. If you have any specific questions or thoughts, don't hesitate to comment on this pull request or email [email protected] and [email protected], and we'll get back to you as soon as possible.
@jjasghar Here is a proposed tree location, although it is pretty deep (6 layers under grounded...):
I also considered other placements, and I think this might be worth a broader discussion with others.
I don't see this as a technology skill or about AI or machine learning. I see it as more under philosophy/logic or language/linguistics.
Based on our triage discussion, we recommend this location in the tree:
Force-pushed from 985e57b to a8be8ef.
@Mergifyio squash first-commit
✅ Pull request squashed successfully
Force-pushed from a8be8ef to 38a052a.
Hot damn, that looked like it worked.
It squashed but the commit fails the DCO check :-( It used my GitHub noreply email as the commit author and the DCO tool cannot match that email to the
So I am not sure this is a success. We can override the DCO check, but that does not seem like something we should do on a regular basis.
Ah yep, I just looked at the number of commits and didn't see anything wrong; I overlooked the DCO check. GitHub on the phone makes it so easy to miss things. OK, so back to "we need everyone to squash down to one commit, and really enforce it." I guess we can add a how-to in the taxonomy docs, and if someone really has trouble after passing everything else, we'll have to step in. Thoughts?
Signed-off-by: Alexandre Evfimievski <[email protected]>
Signed-off-by: BJ Hargrave <[email protected]>
Force-pushed from 38a052a to 3bb807d.
Agree. I squashed the commits manually for this PR.
After a long journey, we are running this PR through the cmb now. I'm so sorry about how long this took; ideally, by the end of today (US time) we'll have a real eval number and know whether or not to merge this into the tree.
This is a re-submission of PR 691.
Describe the contribution to the taxonomy
Given a context document and a user's question, we want the LLM to determine whether it has enough information to provide a yes-or-no answer to the user's question. There are three typical reasons for wrong answers:
This skill falls under the umbrella of "answerability": determining whether a user's inquiry has been answered, whether it can be answered, and what extra information is needed to answer it. An extra folder is added in anticipation of more skills under this umbrella. Answerability skills are important for multi-turn grounded response generation, since they help the LLM agent decide what else needs to be asked.
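For readers unfamiliar with the taxonomy layout, here is a minimal sketch of what such a contribution might look like as a grounded compositional-skill qna.yaml, assuming the schema with created_by, task_description, and seed_examples entries carrying context, question, and answer fields. The path, username, and example text below are placeholders for illustration, not the actual files in this PR.

```yaml
# Minimal, hypothetical sketch of an answerability qna.yaml.
# Assumed placeholder path: compositional_skills/grounded/.../answerability/yes_or_no/qna.yaml
created_by: github-username-here  # placeholder contributor handle
task_description: >-
  Given a context document and a user's question, decide whether the
  document contains enough information to give a yes-or-no answer.
seed_examples:
  # Illustrative example only; a real contribution needs at least 5.
  - context: |
      Library cards are issued only to city residents who are at least
      14 years old.
    question: |
      I am 12 years old and live in the city. Do you have enough
      information to answer yes or no to whether I can get a library card?
    answer: |
      Yes, there is enough information. The document states the minimum
      age, so the answer to your question is "no", you cannot get a
      library card yet.
```

Each seed example pairs a grounding document with the answerability meta-question rather than the underlying question itself.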
Input given at the prompt
Here is an example for yes-or-no question answerability. Merlinite 7b is provided the following context document:
Then, Merlinite 7b is asked the following question:
The correct answer is "Yes", there is enough information to provide a yes-or-no answer to the user's question (namely, the answer to the user's question is "no", they do not need to file Form XYZ). However, Merlinite 7b gives wrong answers of two types, as shown below.
Response from the original model
Wrong answer of type "introspection error":
Wrong answer of type "meta-question error":
When asked the user's question directly, Merlinite 7b provides a correct answer, so it understands the document and the user's question well enough. It is the answerability question that the model struggles with.
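To make that distinction concrete, the snippet below contrasts the direct question with the answerability meta-question for a scenario of the same shape as the Form XYZ example. It is a hypothetical reconstruction: the original context document and question were not reproduced above, so the wording and the field names (direct_question, answerability_question) are illustrative only.

```yaml
# Hypothetical illustration of the two question types for one context.
# The text mirrors the shape of the Form XYZ example but is not the
# actual document used in this PR.
context: |
  Form XYZ must be filed only by residents whose annual income exceeds
  $50,000. Residents earning less than that are exempt.

# The model reportedly answers the direct question correctly ("No").
direct_question: |
  My annual income is $30,000. Do I need to file Form XYZ?

# The answerability meta-question is where the model struggles; the
# correct response is "Yes, there is enough information" (and the
# underlying answer is "no, you do not need to file Form XYZ").
answerability_question: |
  My annual income is $30,000. Does the document contain enough
  information to answer yes or no to whether I need to file Form XYZ?
```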
Response from the fine-tuned model
(We did not have an opportunity to fine-tune the model.)
Contribution checklist
- The contribution was tested with ilab generate
- The qna.yaml file contains at least 5 seed_examples
- The qna.yaml file was linted and prettified (yaml-validator can do both)
- An attribution.txt file is included in the same folder as the qna.yaml file.