Text Generation
Safetensors
qwen2
conversational
SP3F-7B / README.md
lintang's picture
Update README.md
6267ccb verified
metadata
license: mit
datasets:
  - neulab/SP3F-Training-Data
base_model:
  - Qwen/Qwen2.5-7B
pipeline_tag: text-generation
language:
  - ar
  - bn
  - de
  - en
  - es
  - fr
  - hi
  - id
  - it
  - ja
  - ko
  - pt
  - ru
  - sw
  - te
  - th
  - yo
  - zh

SP3F-7B

SP3F-7B is a multilingual model trained with Self-Play with Privileged Pairwise Feedback, we use Qwen2.5-7B as our base.

Model Overall MGSM MT Math100 Belebele Global MMLU Lite
Acc Lang Acc Lang Acc Lang Acc Lang Acc Lang
Qwen2.5-7B 14.79 78.78 22.15 90.67 21.16 58.22 7.52 80.39 8.34 85.85
    + SFT 21.70 82.11 33.66 91.37 26.72 58.26 12.94 89.18 13.48 89.62
        + RLVR 57.79 96.09 65.34 99.75 44.50 86.10 68.18 98.73 53.15 99.78
SP3F-7B 61.91 95.35 72.50 99.38 56.84 82.93 67.54 99.65 50.76 99.45
Qwen2.5-7B-Instruct 55.87 89.21 66.36 98.38 52.12 65.66 56.79 96.59 48.20 96.21
    + Translate Test 57.01 85.98 66.15 95.81 60.08 59.34 48.09 92.27 53.73 96.49

Citation

If you find this work helpful please use the following to cite our work.