Add a skill to check if information is enough to answer a yes-or-no question #732
Conversation
Beep, boop 🤖, Hi, I'm @instructlab-bot and I'm going to help you with your pull request. Thanks for your contribution! 🎉 I support the following commands:
Note: Results or errors of these commands will be posted as a pull request check in the Checks section below.
Note: Currently only maintainers belonging to the taxonomy-triagers, taxonomy-approvers, taxonomy-maintainers, labrador-org-maintainers, and instruct-lab-bot-maintainers teams are allowed to run these commands.
@instructlab-bot precheck
Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 189. The results will be presented below in the pull request status box. This may take several minutes...
Results for job ID: 189 using the model merlinite-7b! Results can be found here.
Force-pushed from cc3c751 to 3de729b.
@mingxzhao Here is the
@instructlab-bot precheck
Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 257. The results will be presented below in the pull request status box. This may take several minutes...
Results for job ID: 257 using the model merlinite-7b! Results can be found here.
Thank you for the ping; I will take a look. For future reference, the ilab train is only a quantized version, so I typically do need to run against the full model to confirm, but thank you for running the ilab precheck!
@instructlab-bot precheck
Beep, boop 🤖, Generating test data for your PR with the job type: precheck. Your Job ID is 259. The results will be presented below in the pull request status box. This may take several minutes...
Results for job ID: 259 using the model merlinite-7b! Results can be found here.
@instructlab-bot generate
Beep, boop 🤖, Generating test data for your PR with the job type: sdg-svc. Your Job ID is 263. The results will be presented below in the pull request status box. This may take several minutes...
Results for job ID: 263 using the model sdg service backend! Results can be found here.
Everything looks good, marking as approved, thank you for your patience!
@instructlab-bot generate
Beep, boop 🤖, Generating test data for your PR with the job type: sdg-svc. Your Job ID is 422. The results will be presented below in the pull request status box. This may take several minutes...
Hi @jjasghar, could someone please review this PR? I made formatting changes as required.
Results for job ID: 422 using the model sdg service backend! Results can be found here.
Hi! 👋 We are thankful for your patience and ask that you please keep this PR open. As soon as we finish all our behind-the-scenes work, we'll test the full model against your submissions and, ideally, accept your amazing contribution(s)! Your Community Maintainer Team. P.S. If you have any specific questions or thoughts, don't hesitate to comment on this pull request or email [email protected] and [email protected], and we'll get back to you as soon as possible.
@jjasghar Here is a proposed tree location, although it is pretty deep (6 layers under grounded...):
I also considered other placements, and I think this might be worth a broader discussion with others.
I don't see this as a technology skill or about AI or machine learning. I see it as more under philosophy/logic or language/linguistics.
Based on our triage discussion, we recommend this location in the tree:
Force-pushed from 985e57b to a8be8ef.
@Mergifyio squash first-commit
✅ Pull request squashed successfully
Force-pushed from a8be8ef to 38a052a.
Hot damn, that looked like it worked.
It squashed but the commit fails the DCO check :-( It used my GitHub noreply email as the commit author and the DCO tool cannot match that email to the
So I am not sure this is a success. We can override the DCO check, but that does not seem like something we should do on a regular basis.
Ah yep, I just looked at the number of commits and didn't see anything wrong; I overlooked the DCO check. GitHub on the phone makes it so easy to miss things. OK, so back to "we need everyone to squash down to one commit, and really enforce it." I guess we can add a how-to in the taxonomy docs, and if someone really has trouble after passing everything else, we'll have to step in. Thoughts?
Signed-off-by: Alexandre Evfimievski <[email protected]>
Signed-off-by: BJ Hargrave <[email protected]>
Force-pushed from 38a052a to 3bb807d.
Agree. I squashed the commits manually for this PR.
After a long journey, we are running this PR through the cmb now. I'm so sorry about how long this took; ideally, by the end of today (US time) we'll have a real eval number and know whether or not to merge this into the tree.
This is a re-submission of PR 691.
Describe the contribution to the taxonomy
Given a context document and a user's question, we want the LLM to determine whether it has enough information to provide a yes-or-no answer to the user's question. There are three typical reasons for wrong answers:
This skill falls under the umbrella of "answerability": determining whether a user's inquiry has been answered, whether it can be answered, and what extra information is needed to answer it. An extra folder is added in anticipation of more skills under this umbrella. Answerability skills are important for multi-turn grounded response generation, since they help the LLM agent decide what else needs to be asked.
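For readers unfamiliar with the taxonomy layout, here is a minimal sketch of what such a contribution might look like as a grounded compositional-skill qna.yaml, assuming the schema with created_by, task_description, and seed_examples entries carrying context, question, and answer fields. The path, username, and example text below are placeholders for illustration, not the actual files in this PR.

```yaml
# Minimal, hypothetical sketch of an answerability qna.yaml.
# Assumed placeholder path: compositional_skills/grounded/.../answerability/yes_or_no/qna.yaml
created_by: github-username-here  # placeholder contributor handle
task_description: >-
  Given a context document and a user's question, decide whether the
  document contains enough information to give a yes-or-no answer.
seed_examples:
  # Illustrative example only; a real contribution needs at least 5.
  - context: |
      Library cards are issued only to city residents who are at least
      14 years old.
    question: |
      I am 12 years old and live in the city. Do you have enough
      information to answer yes or no to whether I can get a library card?
    answer: |
      Yes, there is enough information. The document states the minimum
      age, so the answer to your question is "no", you cannot get a
      library card yet.
```

Each seed example pairs a grounding document with the answerability meta-question rather than the underlying question itself.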
Input given at the prompt
Here is an example for yes-or-no question answerability. Merlinite 7b is provided the following context document:
Then, Merlinite 7b is asked the following question:
The correct answer is "Yes", there is enough information to provide a yes-or-no answer to the user's question (namely, the answer to the user's question is "no", they do not need to file Form XYZ). However, Merlinite 7b gives wrong answers of two types, as shown below.
Response from the original model
Wrong answer of type "introspection error":
Wrong answer of type "meta-question error":
When asked the user's question directly, Merlinite 7b provides a correct answer, so it understands the document and the user's question well enough. It is the answerability question that the model struggles with.
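To make that distinction concrete, the snippet below contrasts the direct question with the answerability meta-question for a scenario of the same shape as the Form XYZ example. It is a hypothetical reconstruction: the original context document and question were not reproduced above, so the wording and the field names (direct_question, answerability_question) are illustrative only.

```yaml
# Hypothetical illustration of the two question types for one context.
# The text mirrors the shape of the Form XYZ example but is not the
# actual document used in this PR.
context: |
  Form XYZ must be filed only by residents whose annual income exceeds
  $50,000. Residents earning less than that are exempt.

# The model reportedly answers the direct question correctly ("No").
direct_question: |
  My annual income is $30,000. Do I need to file Form XYZ?

# The answerability meta-question is where the model struggles; the
# correct response is "Yes, there is enough information" (and the
# underlying answer is "no, you do not need to file Form XYZ").
answerability_question: |
  My annual income is $30,000. Does the document contain enough
  information to answer yes or no to whether I need to file Form XYZ?
```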
Response from the fine-tuned model
(We did not have an opportunity to fine-tune the model.)
Contribution checklist
- The contribution was tested with ilab generate
- The qna.yaml file contains at least 5 seed_examples
- The qna.yaml file was linted and prettified (yaml-validator can do both)
- An attribution.txt file is included in the same folder as the qna.yaml file.