Upload folder using huggingface_hub

Files changed (1)

README.md CHANGED

@@ -46,10 +46,13 @@ Both datasets were converted to a unified `messages` format compatible with Qwen
 | Metric | Method | Few-shot | Score | Std Error |
 |--------|--------|----------|-------|-----------|
-| exact_match | flexible-extract | 5 | TBD | ±TBD |
-| exact_match | strict-match | 5 | TBD | ±TBD |
+| exact_match | flexible-extract | 5 | **34.12%** | ±1.31% |
+| exact_match | strict-match | 5 | **33.59%** | ±1.30% |
 - **Baseline** (Qwen2.5-0.5B-Instruct): 34.42% (flexible-extract), 31.69% (strict-match)
+- **Improvement**:
+  - Flexible-extract: Comparable performance (34.12% vs 34.42%)
+  - Strict-match: **+1.90 percentage points** (33.59% vs 31.69%)
 - **Note**: This model was fine-tuned on a curated dataset mixture of 47,473 samples to improve mathematical reasoning capabilities
 ### Evaluation Details
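
The deltas quoted in the added **Improvement** bullets can be re-derived from the numbers in this hunk. The following is a minimal sketch, not part of the commit: the dictionary layout and the `compare` helper are hypothetical names introduced for illustration, and because the diff gives no standard error for the baseline, the ratio uses only the fine-tuned model's reported standard error as a rough yardstick.

```python
# Scores in percent, copied from the table and the Baseline bullet in this diff.
# (score, reported standard error) for the fine-tuned model.
finetuned = {
    "flexible-extract": (34.12, 1.31),
    "strict-match": (33.59, 1.30),
}
# Baseline scores for Qwen2.5-0.5B-Instruct; no standard error is given for these.
baseline = {
    "flexible-extract": 34.42,
    "strict-match": 31.69,
}


def compare(method):
    """Return (delta in percentage points, |delta| / reported std error)."""
    score, stderr = finetuned[method]
    delta = round(score - baseline[method], 2)
    # Assumption: only the fine-tuned model's standard error is available,
    # so this ratio is a rough indicator, not a significance test.
    ratio = round(abs(delta) / stderr, 2)
    return delta, ratio


for method in finetuned:
    delta, ratio = compare(method)
    print(f"{method}: {delta:+.2f} points, {ratio:.2f}x the reported std error")
```

Run as written, this gives a difference of -0.30 points for flexible-extract and +1.90 points for strict-match, that is, roughly 0.23 and 1.46 times the respective reported standard errors.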