hf-tuner
/

bert-mini-squadv2

@@ -27,19 +27,20 @@ This model is a fine-tuned version of [microsoft/MiniLM-L12-H384-uncased](https:
 It achieves the following results on the evaluation set:
 - Loss: 1.4653
-- Exact Match Accuracy: 60.94%
 ## Evaluation Notes
 #### Issues with Exact Match Evaluation
 Several correct predictions were incorrectly marked as false negatives due to strict exact-match criteria being sensitive to minor differences in tokenization, formatting, or span boundaries:
-- Predicted: `isaac bashevis` → Rejected (expected: `isaac bashevis singer`)
-- Predicted: `newtonian equations` → Rejected (expected: `newtonian`)
-- Predicted: `80,000` → Rejected (expected: `80, 000`)
 #### Overall Performance
-- Exact-match accuracy: **>60%**
 - The model frequently generates high-quality and semantically correct answer spans even when exact-match evaluation penalizes them.
 - Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's relatively small size and limited parameter capacity.

 It achieves the following results on the evaluation set:
 - Loss: 1.4653
+- Exact Match Accuracy: 62.95%
 ## Evaluation Notes
 #### Issues with Exact Match Evaluation
 Several correct predictions were incorrectly marked as false negatives due to strict exact-match criteria being sensitive to minor differences in tokenization, formatting, or span boundaries:
+- Predicted: `schrodinger equation` → Rejected (expected: `schrödinger equation`)
+- Predicted: `feynman diagrams` → Rejected (expected: `feynman`)
+- Predicted: `electromagnetic force` → Rejected (expected: `electromagnetic`)
+- Predicted: `45 000 pounds` → Rejected (expected: `45000 pounds`)
 #### Overall Performance
+- Exact-match accuracy: **>63%**
 - The model frequently generates high-quality and semantically correct answer spans even when exact-match evaluation penalizes them.
 - Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's relatively small size and limited parameter capacity.