hf-tuner committed on
Commit 5a67324 · verified · 1 Parent(s): 508c54b

update accuracy: normalized answer comparison result
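The commit message says the accuracy was updated using a normalized answer comparison, but the diff does not show which normalization was applied. The following is a minimal sketch under stated assumptions. It uses the SQuAD-style rules (lowercase, strip punctuation and the articles a/an/the, collapse whitespace) and adds two hypothetical steps aimed at the formatting mismatches listed in the README: accent stripping and removal of spaces or commas between digits. `normalize_answer` and `exact_match` are illustrative names, not functions from this repository or from any library.

```python
import re
import string
import unicodedata

# Punctuation set used by the SQuAD-style normalization step.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Hypothetical normalizer: SQuAD-style rules plus two extra steps."""
    # Extra step 1 (assumption): decompose accented letters (ö -> o + U+0308)
    # and drop the combining marks, so "schrödinger" matches "schrodinger".
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Extra step 2 (assumption): delete commas/whitespace between two digits,
    # so "45 000" -> "45000" and "80, 000" -> "80000".
    text = re.sub(r"(?<=\d)[,\s]+(?=\d)", "", text)
    # SQuAD-style steps: drop punctuation, drop articles, collapse whitespace.
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(reference)


if __name__ == "__main__":
    # Rejected pairs copied from the README diff.
    pairs = [
        ("schrodinger equation", "schrödinger equation"),
        ("45 000 pounds", "45000 pounds"),
        ("80,000", "80, 000"),
        ("feynman diagrams", "feynman"),
    ]
    for pred, ref in pairs:
        print(f"{pred!r} vs {ref!r}: raw={pred == ref}, normalized={exact_match(pred, ref)}")
```

Under these assumptions the first three pairs compare equal after normalization, while `feynman diagrams` vs `feynman` still fails because the two spans cover different tokens. The digit-joining rule is a trade-off: it would also merge unrelated adjacent numbers such as `1 2`, which may or may not matter for a given evaluation set.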

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -27,19 +27,20 @@ This model is a fine-tuned version of [microsoft/MiniLM-L12-H384-uncased](https:
 
  It achieves the following results on the evaluation set:
  - Loss: 1.4653
- - Exact Match Accuracy: 60.94%
+ - Exact Match Accuracy: 62.95%
 
  ## Evaluation Notes
 
  #### Issues with Exact Match Evaluation
  Several correct predictions were incorrectly marked as false negatives due to strict exact-match criteria being sensitive to minor differences in tokenization, formatting, or span boundaries:
 
- - Predicted: `isaac bashevis` → Rejected (expected: `isaac bashevis singer`)
- - Predicted: `newtonian equations` → Rejected (expected: `newtonian`)
- - Predicted: `80,000` → Rejected (expected: `80, 000`)
+ - Predicted: `schrodinger equation` → Rejected (expected: `schrödinger equation`)
+ - Predicted: `feynman diagrams` → Rejected (expected: `feynman`)
+ - Predicted: `electromagnetic force` → Rejected (expected: `electromagnetic`)
+ - Predicted: `45 000 pounds` → Rejected (expected: `45000 pounds`)
 
  #### Overall Performance
- - Exact-match accuracy: **>60%**
+ - Exact-match accuracy: **>63%**
  - The model frequently generates high-quality and semantically correct answer spans even when exact-match evaluation penalizes them.
  - Primary limitation: performance drops on questions requiring deep domain-specific knowledge, largely attributable to the model's relatively small size and limited parameter capacity.
 
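Normalization cannot help when the prediction and the reference cover different spans, as in the `isaac bashevis`, `newtonian equations`, `feynman diagrams`, and `electromagnetic force` examples in the diff. A common complement to exact match in extractive QA, not mentioned in this commit, is SQuAD-style token-level F1, which gives partial credit for overlapping tokens. The following is a minimal sketch, assuming whitespace tokenization and SQuAD-style normalization; `normalize_tokens` and `token_f1` are hypothetical names, not part of this repository.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_tokens(text: str) -> list[str]:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answer spans."""
    pred = normalize_tokens(prediction)
    ref = normalize_tokens(reference)
    # Multiset intersection: each shared token counts as often as it appears in both.
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        # No shared tokens (this also covers two empty spans) -> 0.0.
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Span-boundary pairs copied from the README diff.
    pairs = [
        ("isaac bashevis", "isaac bashevis singer"),
        ("newtonian equations", "newtonian"),
        ("feynman diagrams", "feynman"),
        ("electromagnetic force", "electromagnetic"),
    ]
    for pred, ref in pairs:
        print(f"{pred!r} vs {ref!r}: F1={token_f1(pred, ref):.2f}")
```

On these four pairs the sketch prints an F1 of 0.80 for the first and 0.67 for the other three, where exact match assigns 0.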