Stripping whitespaces as default for QATask ICL eval (#3073)

* stripping whitespaces as default for QATask ICL eval
  If normalization is off (as in gsm8k), leading/trailing whitespace can cause a mismatch, which leads to low accuracy even when the model gets the answer right. Fixing that here.
* adding strip to labels as well
* changing list comprehension to set comprehension
* using a real set comprehension
mosaicml · Mar 1, 2024 · 6ce38ad
1 parent cee4523
Showing 1 changed file with 3 additions and 2 deletions.
diff --git a/composer/metrics/nlp.py b/composer/metrics/nlp.py
@@ -316,8 +316,9 @@ def update(self, outputs: List[str], labels: List[List[str]], batch: Dict[str, A
                 cleaned_final_answer = self.normalize_answer(final_answer)
                 cleaned_sample_labels = {self.normalize_answer(label) for label in sample_labels}
             else:
-                cleaned_final_answer = final_answer
-                cleaned_sample_labels = set(sample_labels)
+                # even if normalization is off, we should still strip leading/trailing whitespaces
+                cleaned_final_answer = final_answer.strip()
+                cleaned_sample_labels = {sample_label.strip() for sample_label in sample_labels}
 
             if any(cleaned_final_answer.startswith(label) for label in cleaned_sample_labels):
                 self.correct += torch.tensor(1.0)
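
To illustrate the effect of this change, here is a minimal standalone sketch of the comparison logic in the non-normalized branch. The function names `is_match` and `is_match_unstripped` and the `normalize` parameter are hypothetical and exist only for this illustration; they are not part of `composer/metrics/nlp.py`, and the sketch leaves out the surrounding metric state such as `self.correct`.

```python
from typing import Callable, Iterable, Optional


def is_match(
    final_answer: str,
    sample_labels: Iterable[str],
    normalize: Optional[Callable[[str], str]] = None,
) -> bool:
    """Mimic the post-change comparison (hypothetical helper, not Composer API)."""
    if normalize is not None:
        # Normalization path: unchanged by this commit.
        cleaned_answer = normalize(final_answer)
        cleaned_labels = {normalize(label) for label in sample_labels}
    else:
        # New behavior: strip leading/trailing whitespace on both sides.
        cleaned_answer = final_answer.strip()
        cleaned_labels = {label.strip() for label in sample_labels}
    return any(cleaned_answer.startswith(label) for label in cleaned_labels)


def is_match_unstripped(final_answer: str, sample_labels: Iterable[str]) -> bool:
    """Mimic the pre-change comparison when normalization is off."""
    cleaned_labels = set(sample_labels)
    return any(final_answer.startswith(label) for label in cleaned_labels)


if __name__ == "__main__":
    # A leading space in the model output defeats the old prefix check.
    print(is_match_unstripped(" 42", ["42"]))  # False
    print(is_match(" 42", ["42"]))             # True
    # A trailing space in the label does the same, hence stripping labels too.
    print(is_match_unstripped("42", ["42 "]))  # False
    print(is_match("42", ["42 "]))             # True
```

Because the check uses `startswith`, stripping only the answer would not be enough: a label with trailing whitespace would still fail to match, which is presumably why the follow-up commit strips the labels as well.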