---
license: mit
datasets:
- RLVR-SvS/Variational-DAPO
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-7B-Instruct
pipeline_tag: reinforcement-learning
---

# Model Card for SvS-Code-7B (from Qwen2.5-7B-Instruct)

<p align="left">
  <a href="https://mastervito.github.io/SvS.github.io/"><b>[🌐 Website]</b></a> •
  <a href="https://huggingface.co/datasets/RLVR-SvS/Variational-DAPO"><b>[🤗 Dataset]</b></a> •
  <a href="https://huggingface.co/datasets/RLVR-SvS/SvS-Qwen-32B"><b>[🤖 Models]</b></a> •
  <a href="https://arxiv.org/abs/2508.14029"><b>[📜 Paper]</b></a> •
  <a href="https://github.com/MasterVito/SvS"><b>[🐱 GitHub]</b></a> •
  <a href="https://huggingface.co/datasets/RLVR-SvS/Variational-DAPO"><b>[🐦 Twitter]</b></a> •
  <a href="https://huggingface.co/datasets/RLVR-SvS/Variational-DAPO"><b>[📕 Rednote]</b></a>
</p>

The official model checkpoint for <a href="https://arxiv.org/abs/2508.14029"><b>SvS</b></a>. The SvS model is trained on a subset of coding tasks from the PRIME-RL dataset (included in this repository as <code>12k_code_rl.parquet</code>).

# Inference

We recommend using our official inference template (the default chat template inherited from the Qwen2.5-Instruct models).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "RLVR-SvS/SvS-Qwen-Code-7B"
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

messages = [
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and tokenize it
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
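For reference, Qwen2.5-Instruct models use a ChatML-style chat template. The sketch below is a simplified, hypothetical rendering (it omits the model's default system message and is no substitute for `tokenizer.apply_chat_template`) that shows roughly what the templated prompt string looks like:

```python
# Simplified sketch of the ChatML-style format used by Qwen2.5-Instruct.
# Hypothetical helper for illustration only; in practice, always use
# tokenizer.apply_chat_template so the exact template shipped with the
# checkpoint (including the default system message) is applied.

def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} messages into a ChatML-style string."""
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        text += "<|im_start|>assistant\n"
    return text

prompt = render_chatml([{"role": "user", "content": "Solve 4x+5 = 6x+7."}])
print(prompt)
```

Each turn is wrapped in `<|im_start|>{role} ... <|im_end|>` markers, and `add_generation_prompt=True` appends an open assistant turn, which is why the official snippet above passes that flag when preparing inputs for generation.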

# Cite Us

If you find the model helpful, please consider citing our paper:

```bibtex
@misc{liang2025pass1selfplayvariationalproblem,
      title={Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR},
      author={Xiao Liang and Zhongzhi Li and Yeyun Gong and Yelong Shen and Ying Nian Wu and Zhijiang Guo and Weizhu Chen},
      year={2025},
      eprint={2508.14029},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.14029},
}
```