Errors in huggingface dataset #1

SantiDianaClibrain · 2024-03-08T12:23:39Z

Hi! I would like to point out that I believe I found at least one error in the dataset. Is that possible? For the following question, the ground-truth answer is "None", but I would say the question can be answered.

Britany records 18 4-minute TikTok videos each week. She spends 2 hours a week writing amateur songs to sing on TikTok, and 15 minutes six days a week doing her makeup before filming herself for TikTok. How much time does Britany spend on TikTok in a month?

qtli · 2024-03-08T13:10:10Z

Thank you for your feedback!

In this case, "a month with four weeks" has been removed from the original GSM8K question. We are aware of potential ambiguities or errors in certain questions and are actively working on a new version of GSM8K-Plus, which will be released soon.

We greatly appreciate your assistance in identifying any issues you come across.

Additionally, we will release a concise test set for efficient evaluation.
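To illustrate the ambiguity discussed above, here is a minimal sketch of the arithmetic in the Britany question. The weekly total follows directly from the question text; the monthly total depends on how many weeks a month is taken to have, which is exactly the detail that was removed. The function names and the `weeks_per_month` parameter are hypothetical and not part of the dataset.

```python
# Minutes per week, taken from the question text.
VIDEOS_PER_WEEK = 18
MINUTES_PER_VIDEO = 4
SONGWRITING_HOURS_PER_WEEK = 2
MAKEUP_MINUTES_PER_DAY = 15
MAKEUP_DAYS_PER_WEEK = 6


def weekly_minutes() -> int:
    """Total minutes Britany spends on TikTok in one week."""
    filming = VIDEOS_PER_WEEK * MINUTES_PER_VIDEO                # 72
    songwriting = SONGWRITING_HOURS_PER_WEEK * 60                # 120
    makeup = MAKEUP_MINUTES_PER_DAY * MAKEUP_DAYS_PER_WEEK       # 90
    return filming + songwriting + makeup


def monthly_minutes(weeks_per_month: float) -> float:
    """Monthly total under an explicit assumption about month length."""
    return weekly_minutes() * weeks_per_month


if __name__ == "__main__":
    print(weekly_minutes())          # 282
    print(monthly_minutes(4))        # 1128, assuming a four-week month
    print(monthly_minutes(30 / 7))   # a different total for a 30-day month
```

With the four-week assumption restored, the question has a single answer; without it, the result changes with the assumed month length.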

SantiDianaClibrain · 2024-03-11T09:37:33Z

Okay. Do you have a date in mind? I have a model that may achieve strong results on your benchmark, and I would like to get results ASAP. Thanks!!!

qtli · 2024-03-12T11:24:44Z

We're delighted to learn that you've built a robust model. Feel free to share the developed model with us, and we can conduct a prompt evaluation using our dataset. The release of the finalized dataset may take some time, with our current aim being late March.

SantiDianaClibrain · 2024-03-20T14:51:01Z

Nice! I strongly encourage you to review the entire dataset. I found a lot of calculation errors. Looking forward to seeing your final version of the benchmark.

SantiDianaClibrain · 2024-03-20T15:00:08Z

Interesting question: Raymond and Samantha are cousins. Raymond was born 60 years before Samantha. Raymond had a son at the age of 230. If Samantha is now 310, how many years ago was Raymond's son born?
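A small sketch of why this question stands out: the arithmetic is internally consistent, but the ages are not plausible for people. The `MAX_PLAUSIBLE_AGE` threshold and both function names are hypothetical, chosen only for illustration; they are not part of the dataset or its tooling.

```python
# Hypothetical sanity threshold for a human age, for illustration only.
MAX_PLAUSIBLE_AGE = 120


def years_since_son_born(age_gap: int, age_at_birth: int, samantha_age: int) -> int:
    """Raymond's current age minus his age when his son was born."""
    raymond_age = samantha_age + age_gap
    return raymond_age - age_at_birth


def has_implausible_age(*ages: int) -> bool:
    """Flag any age above the hypothetical threshold."""
    return any(age > MAX_PLAUSIBLE_AGE for age in ages)


if __name__ == "__main__":
    # Numbers from the question above.
    print(years_since_son_born(60, 230, 310))   # 140
    print(has_implausible_age(230, 310))        # True
```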

ekmb · 2024-07-03T20:03:56Z

@qtli any updates on the new dataset version?

qtli · 2024-07-08T02:51:00Z

Hey SantiDianaClibrain, thanks a lot for your feedback! We've been working tirelessly to fix the unrealistic numbers you brought up and some perturbation failures. We've just released an updated version of GSM-Plus and a smaller subset called testmini. We invite you to try out the latest version of GSM-Plus at Huggingface Datasets and we welcome any feedback you may have.

qtli · 2024-07-08T02:54:03Z

Hi ekmb, thanks for reaching out! We just released an updated version of GSM-Plus (including a smaller subset, called testmini), available on Huggingface Datasets. It resolves issues with unrealistic numbers and non-compliant question contexts. Check it out and share your thoughts!

dgtm777 · 2024-07-09T13:00:33Z

Hi @qtli! I am really interested in this dataset, but I still find some errors, even in the gsm-plus mini subset. Here are two examples of the "adding operation" category with incorrect answer fields.

index in the dataset - 1220
Question: "A trader buys some bags of wheat from a farmer at a rate of $20 per bag. If it costs $2 to transport each bag from the farm to the warehouse, and after selling all the bags at a rate of $30 each and paid $50 for the storage fee, the trader made a total profit of $400 how many bags did he sell?"
Answer: 55
If we look at the solution, it states that 450/8 = 55, which is wrong.

index in the dataset - 2380
Question: "Ruby is 6 times older than Sam. In 21 years, Ruby will be 3 times as old as Sam. How old is Sam now?"
Answer: 30
I believe the correct answer here should be 14

Do you happen to have an estimate of the number of errors in each category, so I know how much of an accuracy gap to ignore? Or maybe you plan to release a new version; it would be great to have a clean one.
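The two examples above can be checked with a short sketch. It assumes the natural reading of each question: profit per bag is the selling price minus purchase and transport cost, with the storage fee paid once, and "6 times older" means Ruby's age is 6 times Sam's. The function names are hypothetical.

```python
from fractions import Fraction


def bags_sold(buy: int, transport: int, sell: int, storage: int, profit: int) -> Fraction:
    """Solve (sell - buy - transport) * n - storage = profit for n."""
    return Fraction(profit + storage, sell - buy - transport)


def sam_age_now(ratio_now: int, years: int, ratio_future: int) -> Fraction:
    """Solve ratio_now * s + years = ratio_future * (s + years) for s."""
    return Fraction(years * (ratio_future - 1), ratio_now - ratio_future)


if __name__ == "__main__":
    # Index 1220: 450 / 8 is not a whole number, so 55 cannot be exact.
    print(bags_sold(20, 2, 30, 50, 400))   # 225/4
    # Index 2380: 6s + 21 = 3(s + 21) gives 3s = 42.
    print(sam_age_now(6, 21, 3))           # 14
```

Under these assumptions the first question has no whole-number answer (56.25 bags), and the second gives 14 rather than 30.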

qtli · 2024-07-10T01:01:53Z

Hi @dgtm777, thank you very much for your valuable feedback! Please allow us a few days to thoroughly review the question-solution pairs in the testmini set. Once the check is complete, I will notify you of the results.
