Add a skill to check if information is enough to answer a yes-or-no question #732

Open
wants to merge 1 commit into
base: main

@ae2015 ae2015 commented Apr 19, 2024

This is a re-submission of PR 691.

Describe the contribution to the taxonomy

Given a context document and a user's question, we want the LLM to determine whether it has enough information to provide a yes-or-no answer to the user's question. There are three typical reasons for wrong answers:

  • The LLM does not understand the document and/or the user's question, in which case asking the user's question directly will also produce a wrong answer;
  • The LLM has enough information to answer the user's question, but fails to recognize this when asked the answerability question, and replies "No" (introspection error);
  • The LLM answers the user's question and ignores the answerability question (meta-question error).

This skill falls under the umbrella of "answerability": determining whether a user's inquiry has been answered, whether it can be answered, and what extra information is needed to answer it. An extra folder is added in anticipation of more skills under this umbrella. Answerability skills are important for multi-turn grounded response generation, since they help the LLM agent decide what else needs to be asked.

Input given at the prompt

Here is an example of yes-or-no question answerability. Merlinite 7b is given the following context document:

All regular employees except those on a leave of absence (LOA) must file Form XYZ.
If you are a regular employee on a LOA, you do not need to file the form. If you
are a supplemental employee (type B1, type B2, or type B3), your employee type
determines which form you must file, as follows:

* If you are a type B1 employee, you must follow the same rules that apply to the
regular employees.

* If you are a type B2 employee, the rules that apply to you are different from
those that apply to the regular employees. See Pub. 31416 to find out which forms
you should file.

* If you are a supplemental employee of neither type B1 nor type B2, you are a
type B3 employee. Different rules apply for each part of the year. For information
on type B3 employees, see Pub. 31416.

Then, Merlinite 7b is asked the following question:

User asks the following question:

"I am a supplemental employee of type B1 on a LOA. Should I file Form XYZ?"

As the responding agent, do you have enough information to provide a yes-or-no answer
to the user's question? Answer Yes or No, then explain your answer.

The correct answer is Yes: there is enough information to provide a yes-or-no answer to the user's question (namely, the answer to the user's question is "no"; they don't need to file Form XYZ). However, Merlinite 7b gives wrong answers of two types, as shown below.
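
The reasoning behind the expected answer can be sketched in code. This is only an illustration and not part of the PR: the function names `must_file_form_xyz` and `is_answerable`, and the string labels used for the employee types, are hypothetical and simply encode the rules of the context document above.

```python
from typing import Optional

# Hypothetical labels for the employee types named in the context document.
SAME_RULES_AS_REGULAR = {"regular", "B1"}  # B1 follows the regular-employee rules
DEFERRED_TO_PUB_31416 = {"B2", "B3"}       # the document points these to Pub. 31416


def must_file_form_xyz(employee_type: str, on_loa: bool) -> Optional[bool]:
    """Return True/False if the document settles the question, None if it defers."""
    if employee_type in SAME_RULES_AS_REGULAR:
        # Regular employees must file unless they are on a leave of absence.
        return not on_loa
    if employee_type in DEFERRED_TO_PUB_31416:
        # The document alone does not say which forms apply to these types.
        return None
    raise ValueError(f"unknown employee type: {employee_type!r}")


def is_answerable(employee_type: str, on_loa: bool) -> bool:
    """The answerability question: can a yes-or-no answer be given at all?"""
    return must_file_form_xyz(employee_type, on_loa) is not None


# The user's case: supplemental employee of type B1, on a LOA.
print(is_answerable("B1", on_loa=True))       # True: enough information
print(must_file_form_xyz("B1", on_loa=True))  # False: no need to file
```

The two calls return different values for the same user: the answerability question gets "Yes" while the user's own question gets "no". Keeping these two questions apart is what the skill is meant to teach.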

Response from the original model

Wrong answer of type "introspection error":

No, I cannot provide a direct yes-or-no answer based solely on the given information. The user is
an employee of type B1 and is currently on a leave of absence (LOA). According to the provided
document, regular employees on LOA are not required to file Form XYZ. However, for supplemental
employees (type B1), the rules are the same as those that apply to regular employees. To determine
whether the user should file Form XYZ, it would be necessary to consult the specific guidelines for
type B1 employees or refer to Pub. 31416 for further clarification.

Wrong answer of type "meta-question error":

No, you should not file Form XYZ as a supplemental employee of type B1 on a leave of absence (LOA).
Regular employees who are not on an LOA must file Form XYZ. However, according to the document,
employees on an LOA do not need to file the form. Since you are both a type B1 employee and on an
LOA, you fall under the category of employees who are exempt from filing Form XYZ.

When asked the user's question directly, Merlinite 7b provides a correct answer, so it understands the document and the user's question well enough. It is the answerability question that the model struggles with.
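
The diagnosis above can be written down as a small triage routine. This is a hedged sketch, not something from the PR: the `Diagnosis` enum, the `diagnose` function, and the label "comprehension error" for the first failure reason are hypothetical names, and the three boolean inputs are assumed to come from a human or automated grader reading the model's replies. It covers only cases where the document does contain the answer.

```python
from enum import Enum


class Diagnosis(Enum):
    CORRECT = "correct"
    COMPREHENSION_ERROR = "comprehension error"    # hypothetical label
    INTROSPECTION_ERROR = "introspection error"
    META_QUESTION_ERROR = "meta-question error"


def diagnose(direct_answer_correct: bool,
             addressed_answerability: bool,
             said_enough_info: bool) -> Diagnosis:
    """Triage one reply for a case where the document does contain the answer."""
    if not direct_answer_correct:
        # The model fails even when asked the user's question directly.
        return Diagnosis.COMPREHENSION_ERROR
    if not addressed_answerability:
        # The reply answers the user's question instead of the meta-question.
        return Diagnosis.META_QUESTION_ERROR
    if not said_enough_info:
        # The reply addresses the meta-question but wrongly says "No".
        return Diagnosis.INTROSPECTION_ERROR
    return Diagnosis.CORRECT


# The two Merlinite 7b replies quoted in this PR, as graded by a reader:
first = diagnose(direct_answer_correct=True,
                 addressed_answerability=True,
                 said_enough_info=False)
second = diagnose(direct_answer_correct=True,
                  addressed_answerability=False,
                  said_enough_info=False)
print(first.value)   # introspection error
print(second.value)  # meta-question error
```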

Response from the fine-tuned model

(I did not have an opportunity to fine-tune the model.)

Contribution checklist

  • The contribution was tested with ilab generate
  • No errors or warnings were produced by ilab generate
  • All commits are signed off (DCO)
  • The qna.yaml file contains at least 5 seed_examples
  • The qna.yaml file was linted and prettified (yaml-validator can do both)
  • An attribution.txt file is in the same folder as the qna.yaml file.

@ae2015 ae2015 requested a review from a team as a code owner April 19, 2024 22:10
@github-actions github-actions bot added triage-needed (Auto labeled) skill is ready to be triaged skill (Auto labeled) labels Apr 19, 2024
@vishnoianil vishnoianil added skill (Auto labeled) and removed skill (Auto labeled) labels Apr 19, 2024
Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉

I support the following commands:

  • @instructlab-bot precheck -- Check existing model behavior using the questions in this proposed change.
  • @instructlab-bot generate -- Generate a sample of synthetic data using the synthetic data generation backend infrastructure.
  • @instructlab-bot generate-local -- Generate a sample of synthetic data using a local model.

Note

Results or errors of these commands will be posted as a pull request check in the Checks section below.

Note

Currently, only maintainers belonging to the [taxonomy-triagers taxonomy-approvers taxonomy-maintainers labrador-org-maintainers instruct-lab-bot-maintainers] teams are allowed to run these commands.

@vishnoianil vishnoianil added skill (Auto labeled) and removed skill (Auto labeled) labels Apr 19, 2024

@jjasghar
Member

@instructlab-bot precheck

Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 189. The results will be presented below in the pull request status box. This may take several minutes...

Results for job ID: 189 using the model merlinite-7b!

Results can be found here.

@ae2015
Contributor Author

ae2015 commented May 3, 2024

@mingxzhao Here is the ilab train output I observed for this PR. Note the substantial improvement after training:
ilab_train.answerability.txt

@mingxzhao
Member

@instructlab-bot precheck

Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 257. The results will be presented below in the pull request status box. This may take several minutes...

Results for job ID: 257 using the model merlinite-7b!

Results can be found here.

@mingxzhao
Member

Thank you for the ping, I will take a look. For future reference, ilab train only covers a quantized version of the model, so I typically need to run against the full model to confirm. But thank you for running ilab as a precheck!

@mingxzhao
Member

@instructlab-bot precheck

Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 259. The results will be presented below in the pull request status box. This may take several minutes...

Results for job ID: 259 using the model merlinite-7b!

Results can be found here.

@mingxzhao
Member

@instructlab-bot generate

Beep, boop 🤖, Generating test data for your PR with the job type: sdg-svc. Your Job ID is 263. The results will be presented below in the pull request status box. This may take several minutes...

Results for job ID: 263 using the model sdg service backend!

Results can be found here.

@mingxzhao
Member

Everything looks good, marking as approved, thank you for your patience!

@mingxzhao mingxzhao added community-build-ready Triage Team has signed off for synthetic data generation and removed triage-needed (Auto labeled) skill is ready to be triaged labels May 6, 2024

@github-actions github-actions bot added the triage-needed (Auto labeled) skill is ready to be triaged label Jun 19, 2024

@jjasghar
Member

@instructlab-bot generate

Beep, boop 🤖, Generating test data for your PR with the job type: sdg-svc. Your Job ID is 422. The results will be presented below in the pull request status box. This may take several minutes...

@ae2015
Contributor Author

ae2015 commented Jun 19, 2024

Hi @jjasghar, could someone please review this PR? I made formatting changes as required.

Results for job ID: 422 using the model sdg service backend!

Results can be found here.

@jjasghar jjasghar removed the triage-needed (Auto labeled) skill is ready to be triaged label Jun 19, 2024
@jjasghar
Member

Hi! 👋
It’s been a while since you’ve seen any movement on this PR. We haven’t forgotten about you! We’ve run into some logistical issues, hence this delay. We absolutely want your PR, and being marked as e2e-ready is still the last stop before we get it into the upstream model.

We are thankful for your patience and ask that you please keep this PR open. As soon as we finish all our behind-the-scenes work, we’ll test the full model against your submissions and, ideally, accept your amazing contribution(s)! 

Your Community Maintainer Team.

P.S. If you have any specific questions or thoughts, don’t hesitate to comment on this pull request or email [email protected] and [email protected], and we’ll get back to you as soon as possible.

@mcorbin-ibm
Contributor

mcorbin-ibm commented Aug 27, 2024

@jjasghar Here is a proposed tree location, although it is pretty deep (6 layers under grounded...):

compositional_skills/grounded/technology/computer_science/ai/machine_learning/answerability/question_answering/yes_no

I also considered /ai/nlp/question_answering/yes_no based on this Wikipedia entry:
https://en.wikipedia.org/wiki/Question_answering
But the submitter specifically put it in the "LLM" camp, which aligns with machine learning. We might also want to work "extraction" in here.

I think this might be worth a broader discussion from others.

@bjhargrave
Contributor

technology/computer_science/ai/machine_learning

I don't see this as a technology skill or about AI or machine learning. I see it as more under philosophy/logic or language/linguistics.

@mcorbin-ibm
Contributor

Based on our triage discussion, we recommend this location in the tree:

compositional_skills/grounded/linguistics/info_extraction/question_answering/yes_no

@github-actions github-actions bot added the triage-needed (Auto labeled) skill is ready to be triaged label Aug 29, 2024
@bjhargrave
Contributor

@Mergifyio squash first-commit

mergify bot commented Aug 29, 2024

squash first-commit

✅ Pull request squashed successfully

@jjasghar
Member

Hot damn, that looked like it worked.

@bjhargrave
Contributor

Hot damn, that looked like it worked.

It squashed, but the commit fails the DCO check :-( It used my GitHub noreply email as the commit author, and the DCO tool cannot match that email to the Signed-off-by lines in the commit message.

Commit sha: 38a052a, Author: bjhargrave, Committer: bjhargrave; Can not find "bjhargrave [email protected]", in ["Alexandre Evfimievski [email protected]", "BJ Hargrave [email protected]"].

So I am not sure this is a success. We can override the DCO check, but that does not seem like something we should do on a regular basis.
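
The failure described here can be illustrated with a small sketch. It is not the DCO tool's actual implementation; it only assumes, based on the error message above, that the check looks for the commit author's name and email among the Signed-off-by trailers. The function names and the example.com addresses are hypothetical.

```python
import re
from typing import List, Tuple

# One "Signed-off-by: Name <email>" trailer per line of the commit message.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:[ \t]*(?P<name>.+?)[ \t]*<(?P<email>[^<>]+)>[ \t]*$",
    re.MULTILINE,
)


def signoffs(message: str) -> List[Tuple[str, str]]:
    """Extract (name, lower-cased email) pairs from the Signed-off-by trailers."""
    return [(m["name"], m["email"].lower()) for m in SIGNOFF_RE.finditer(message)]


def passes_dco(author_name: str, author_email: str, message: str) -> bool:
    """Assumed rule: the commit author must appear among the sign-offs."""
    return (author_name, author_email.lower()) in signoffs(message)


# Hypothetical commit message with two sign-offs, as after the squash.
message = (
    "Add answerability skill\n"
    "\n"
    "Signed-off-by: Alexandre Evfimievski <ae@example.com>\n"
    "Signed-off-by: BJ Hargrave <bj@example.com>\n"
)

# A bot-made squash commit authored under a noreply identity does not match.
print(passes_dco("bjhargrave", "bjhargrave@users.noreply.example.com", message))  # False
# A manual squash authored under the signed-off identity matches.
print(passes_dco("BJ Hargrave", "bj@example.com", message))  # True
```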

@jjasghar
Member

Ah yep, I just looked at the number of commits, didn't see anything wrong, and overlooked the DCO check. GitHub on the phone makes it so easy to miss things.

Ok, so back to "we need everyone to squash down to one commit, and really enforce it". I guess we can add a how-to in the taxonomy docs, and if someone really has trouble, we'll have to step in once they have passed everything else.

Thoughts?

Signed-off-by: Alexandre Evfimievski <[email protected]>
Signed-off-by: BJ Hargrave <[email protected]>
@bjhargrave
Contributor

Ok, so back to "we need everyone to squash down to one commit, and really enforce it". I guess we can add a how-to in the taxonomy docs, and if someone really has trouble, we'll have to step in once they have passed everything else.

Thoughts?

Agree. I squashed the commits manually for this PR.

@bjhargrave bjhargrave removed the triage-needed (Auto labeled) skill is ready to be triaged label Aug 30, 2024
@jjasghar
Member

After a long journey, we are running this PR into the cmb now. I'm so sorry about how long this took. Ideally, by the end of today (US time), we'll have a real eval number and know whether or not to merge this into the tree.

Labels
community-build-ready Triage Team has signed off for synthetic data generation skill (Auto labeled)
6 participants