metadata

license: mit
datasets:
  - neulab/SP3F-Training-Data
base_model:
  - Qwen/Qwen2.5-7B
pipeline_tag: text-generation
language:
  - ar
  - bn
  - de
  - en
  - es
  - fr
  - hi
  - id
  - it
  - ja
  - ko
  - pt
  - ru
  - sw
  - te
  - th
  - yo
  - zh

SP3F-7B

SP3F-7B is a multilingual model trained with Self-Play with Privileged Pairwise Feedback (SP3F), using Qwen2.5-7B as the base model.
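A minimal sketch of loading the model for text generation with the `transformers` library. The repository id `neulab/SP3F-7B` is an assumption inferred from the training dataset's namespace, and the plain-text prompt format and the `generate` helper are illustrative only; adjust them to match the actual checkpoint.

```python
# Assumption: this Hub repo id is a guess based on the dataset namespace
# (neulab/SP3F-Training-Data); replace it with the real id of this model.
MODEL_ID = "neulab/SP3F-7B"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint and return the model's continuation of `prompt`."""
    # Imported lazily so that importing this module does not pull in the
    # heavy dependencies until generation is actually requested.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires the `accelerate` package to be installed.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Berapakah 12 dikali 7? Jelaskan langkah demi langkah."))
```

The helper reloads the model on every call for brevity; for repeated use, load the tokenizer and model once and reuse them.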

| Model | Overall Acc | Overall Lang | MGSM Acc | MGSM Lang | MT Math100 Acc | MT Math100 Lang | Belebele Acc | Belebele Lang | Global MMLU Lite Acc | Global MMLU Lite Lang |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen2.5-7B | 14.79 | 78.78 | 22.15 | 90.67 | 21.16 | 58.22 | 7.52 | 80.39 | 8.34 | 85.85 |
| + SFT | 21.70 | 82.11 | 33.66 | 91.37 | 26.72 | 58.26 | 12.94 | 89.18 | 13.48 | 89.62 |
| + RLVR | 57.79 | 96.09 | 65.34 | 99.75 | 44.50 | 86.10 | 68.18 | 98.73 | 53.15 | 99.78 |
| SP3F-7B | 61.91 | 95.35 | 72.50 | 99.38 | 56.84 | 82.93 | 67.54 | 99.65 | 50.76 | 99.45 |
| Qwen2.5-7B-Instruct | 55.87 | 89.21 | 66.36 | 98.38 | 52.12 | 65.66 | 56.79 | 96.59 | 48.20 | 96.21 |
| + Translate Test | 57.01 | 85.98 | 66.15 | 95.81 | 60.08 | 59.34 | 48.09 | 92.27 | 53.73 | 96.49 |
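The Overall columns appear to be the unweighted mean of the four per-benchmark scores. This is an inference from the reported numbers, not something the card states. The sketch below recomputes the means from the table values; the names `PER_BENCHMARK`, `REPORTED_OVERALL`, and `overall` are illustrative.

```python
# (Acc, Lang) pairs copied from the results table, in the order
# MGSM, MT Math100, Belebele, Global MMLU Lite.
PER_BENCHMARK = {
    "Qwen2.5-7B": [(22.15, 90.67), (21.16, 58.22), (7.52, 80.39), (8.34, 85.85)],
    "+ SFT": [(33.66, 91.37), (26.72, 58.26), (12.94, 89.18), (13.48, 89.62)],
    "+ RLVR": [(65.34, 99.75), (44.50, 86.10), (68.18, 98.73), (53.15, 99.78)],
    "SP3F-7B": [(72.50, 99.38), (56.84, 82.93), (67.54, 99.65), (50.76, 99.45)],
    "Qwen2.5-7B-Instruct": [(66.36, 98.38), (52.12, 65.66), (56.79, 96.59), (48.20, 96.21)],
    "+ Translate Test": [(66.15, 95.81), (60.08, 59.34), (48.09, 92.27), (53.73, 96.49)],
}

# Overall (Acc, Lang) values as reported in the table.
REPORTED_OVERALL = {
    "Qwen2.5-7B": (14.79, 78.78),
    "+ SFT": (21.70, 82.11),
    "+ RLVR": (57.79, 96.09),
    "SP3F-7B": (61.91, 95.35),
    "Qwen2.5-7B-Instruct": (55.87, 89.21),
    "+ Translate Test": (57.01, 85.98),
}


def overall(pairs):
    """Return the unweighted mean (Acc, Lang) across benchmarks, rounded to 2 decimals."""
    accs, langs = zip(*pairs)
    return round(sum(accs) / len(accs), 2), round(sum(langs) / len(langs), 2)


for name, pairs in PER_BENCHMARK.items():
    acc, lang = overall(pairs)
    reported_acc, reported_lang = REPORTED_OVERALL[name]
    print(f"{name}: computed=({acc:.2f}, {lang:.2f}) reported=({reported_acc:.2f}, {reported_lang:.2f})")
```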

Citation

If you find this work helpful, please cite it as follows.