---
license: mit
datasets:
- neulab/SP3F-Training-Data
base_model:
- Qwen/Qwen2.5-7B
pipeline_tag: text-generation
language:
- ar
- bn
- de
- en
- es
- fr
- hi
- id
- it
- ja
- ko
- pt
- ru
- sw
- te
- th
- yo
- zh
---

# SP3F-7B

SP3F-7B is a multilingual model trained with Self-Play with Privileged Pairwise Feedback (SP3F), using Qwen2.5-7B as its base model.

| Model | Overall Acc | Overall Lang | MGSM Acc | MGSM Lang | MT Math100 Acc | MT Math100 Lang | Belebele Acc | Belebele Lang | Global MMLU Lite Acc | Global MMLU Lite Lang |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen2.5-7B | 14.79 | 78.78 | 22.15 | 90.67 | 21.16 | 58.22 | 7.52 | 80.39 | 8.34 | 85.85 |
| + SFT | 21.70 | 82.11 | 33.66 | 91.37 | 26.72 | 58.26 | 12.94 | 89.18 | 13.48 | 89.62 |
| + RLVR | 57.79 | 96.09 | 65.34 | 99.75 | 44.50 | 86.10 | 68.18 | 98.73 | 53.15 | 99.78 |
| SP3F-7B | 61.91 | 95.35 | 72.50 | 99.38 | 56.84 | 82.93 | 67.54 | 99.65 | 50.76 | 99.45 |
| Qwen2.5-7B-Instruct | 55.87 | 89.21 | 66.36 | 98.38 | 52.12 | 65.66 | 56.79 | 96.59 | 48.20 | 96.21 |
| + Translate Test | 57.01 | 85.98 | 66.15 | 95.81 | 60.08 | 59.34 | 48.09 | 92.27 | 53.73 | 96.49 |
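
The model card does not say how the Overall columns are computed. The numbers are consistent with an unweighted mean over the four benchmarks (MGSM, MT Math100, Belebele, Global MMLU Lite), and the sketch below checks that reading against every row. The aggregation rule is an assumption inferred from the reported figures, and the names `ROWS`, `mean_of_benchmarks`, and `max_deviation` are illustrative only.

```python
# Scores copied from the table above. Each row lists (Acc, Lang) pairs in the order:
# Overall, MGSM, MT Math100, Belebele, Global MMLU Lite.
ROWS = {
    "Qwen2.5-7B": [(14.79, 78.78), (22.15, 90.67), (21.16, 58.22), (7.52, 80.39), (8.34, 85.85)],
    "+ SFT": [(21.70, 82.11), (33.66, 91.37), (26.72, 58.26), (12.94, 89.18), (13.48, 89.62)],
    "+ RLVR": [(57.79, 96.09), (65.34, 99.75), (44.50, 86.10), (68.18, 98.73), (53.15, 99.78)],
    "SP3F-7B": [(61.91, 95.35), (72.50, 99.38), (56.84, 82.93), (67.54, 99.65), (50.76, 99.45)],
    "Qwen2.5-7B-Instruct": [(55.87, 89.21), (66.36, 98.38), (52.12, 65.66), (56.79, 96.59), (48.20, 96.21)],
    "+ Translate Test": [(57.01, 85.98), (66.15, 95.81), (60.08, 59.34), (48.09, 92.27), (53.73, 96.49)],
}


def mean_of_benchmarks(row):
    """Return the unweighted (Acc, Lang) mean over the four benchmark pairs."""
    _overall, *benchmarks = row  # drop the reported Overall pair
    n = len(benchmarks)
    acc = sum(a for a, _ in benchmarks) / n
    lang = sum(l for _, l in benchmarks) / n
    return acc, lang


def max_deviation(rows):
    """Largest absolute gap between reported Overall and the recomputed mean."""
    worst = 0.0
    for row in rows.values():
        reported_acc, reported_lang = row[0]
        acc, lang = mean_of_benchmarks(row)
        worst = max(worst, abs(acc - reported_acc), abs(lang - reported_lang))
    return worst


if __name__ == "__main__":
    for name, row in ROWS.items():
        acc, lang = mean_of_benchmarks(row)
        print(f"{name}: mean Acc={acc:.4f}, mean Lang={lang:.4f}, reported={row[0]}")
    # Assumption: a gap of at most 0.01 is attributed to two-decimal rounding.
    print("max deviation:", max_deviation(ROWS))
```

Under this reading, each benchmark contributes equally to Overall regardless of how many examples or languages it covers.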