Commit f633415 (verified) by dongwookkwon · 1 parent: aa59b3f

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +5 -2
README.md CHANGED
@@ -46,10 +46,13 @@ Both datasets were converted to a unified `messages` format compatible with Qwen
 
  | Metric | Method | Few-shot | Score | Std Error |
  |--------|--------|----------|-------|-----------|
- | exact_match | flexible-extract | 5 | TBD | ±TBD |
- | exact_match | strict-match | 5 | TBD | ±TBD |
+ | exact_match | flexible-extract | 5 | **34.12%** | ±1.31% |
+ | exact_match | strict-match | 5 | **33.59%** | ±1.30% |
 
  - **Baseline** (Qwen2.5-0.5B-Instruct): 34.42% (flexible-extract), 31.69% (strict-match)
+ - **Improvement**:
+ - Flexible-extract: Comparable performance (34.12% vs 34.42%)
+ - Strict-match: **+1.90% improvement** (33.59% vs 31.69%)
  - **Note**: This model was fine-tuned on a curated dataset mixture of 47,473 samples to improve mathematical reasoning capabilities
 
  ### Evaluation Details
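
The comparison added in this commit can be checked with a few lines of arithmetic. The sketch below is illustrative only: it recomputes each score difference in percentage points and, as a rough yardstick, divides it by the fine-tuned model's reported standard error. The names `finetuned`, `baseline`, and `delta_report` are hypothetical helpers, not part of the README or any evaluation tool, and the README gives no standard error for the baseline, so none is assumed here.

```python
# Scores and standard errors (in percent) as reported in the updated README table.
finetuned = {
    "flexible-extract": (34.12, 1.31),
    "strict-match": (33.59, 1.30),
}
# Baseline scores (Qwen2.5-0.5B-Instruct); no baseline standard error is reported.
baseline = {"flexible-extract": 34.42, "strict-match": 31.69}


def delta_report(method):
    """Return (delta in percentage points, delta / fine-tuned std error)."""
    score, stderr = finetuned[method]
    delta = round(score - baseline[method], 2)
    return delta, round(delta / stderr, 2)


for method in finetuned:
    delta, ratio = delta_report(method)
    print(f"{method}: {delta:+.2f} pp ({ratio:+.2f} std errors)")
# prints:
# flexible-extract: -0.30 pp (-0.23 std errors)
# strict-match: +1.90 pp (+1.46 std errors)
```

Read this way, the strict-match gain is 1.90 percentage points (about 6% relative to the 31.69% baseline), roughly one and a half of the reported standard errors, while the flexible-extract difference is well inside one standard error.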