Text Generation
Safetensors
qwen2
conversational

SP3F-7B

SP3F-7B is a multilingual model trained with Self-Play with Privileged Pairwise Feedback, we use Qwen2.5-7B as our base.

Model Overall MGSM MT Math100 Belebele Global MMLU Lite
Acc Lang Acc Lang Acc Lang Acc Lang Acc Lang
Qwen2.5-7B 14.79 78.78 22.15 90.67 21.16 58.22 7.52 80.39 8.34 85.85
    + SFT 21.70 82.11 33.66 91.37 26.72 58.26 12.94 89.18 13.48 89.62
        + RLVR 57.79 96.09 65.34 99.75 44.50 86.10 68.18 98.73 53.15 99.78
SP3F-7B 61.91 95.35 72.50 99.38 56.84 82.93 67.54 99.65 50.76 99.45
Qwen2.5-7B-Instruct 55.87 89.21 66.36 98.38 52.12 65.66 56.79 96.59 48.20 96.21
    + Translate Test 57.01 85.98 66.15 95.81 60.08 59.34 48.09 92.27 53.73 96.49

Citation

If you find this work helpful please use the following to cite our work.


Downloads last month
18
Safetensors
Model size
8B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for neulab/SP3F-7B

Base model

Qwen/Qwen2.5-7B
Finetuned
(823)
this model
Quantizations
2 models

Dataset used to train neulab/SP3F-7B

Collection including neulab/SP3F-7B