RLVR-SvS
/

SvS-Qwen-Code-7B


MasterVito committed on Dec 11, 2025

Commit f87bf40 · verified · 1 Parent(s): f55c06e

Create README.md

Files changed (1)

README.md +79 -0

README.md ADDED

	@@ -0,0 +1,79 @@

+---
+license: mit
+datasets:
+- RLVR-SvS/Variational-DAPO
+language:
+- en
+metrics:
+- accuracy
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+pipeline_tag: reinforcement-learning
+---
+# Model Card for SvS-Code-7B (from Qwen2.5-7B-Instruct)
+<p align="left">
+  <a href="https://mastervito.github.io/SvS.github.io/"><b>[🌐 Website]</b></a> •
+  <a href="https://huggingface.co/datasets/RLVR-SvS/Variational-DAPO"><b>[🤗 Dataset]</b></a> •
+  <a href="https://huggingface.co/datasets/RLVR-SvS/SvS-Qwen-32B"><b>[🤖 Models]</b></a> •
+  <a href="https://arxiv.org/abs/2508.14029"><b>[📜 Paper]</b></a> •
+  <a href="https://github.com/MasterVito/SvS"><b>[🐱 GitHub]</b></a> •
+  <a href="https://huggingface.co/datasets/RLVR-SvS/Variational-DAPO"><b>[🐦 Twitter]</b></a> •
+  <a href="https://huggingface.co/datasets/RLVR-SvS/Variational-DAPO"><b>[📕 Rednote]</b></a>
+</p>
+The official model checkpoints for <a href="https://arxiv.org/abs/2508.14029"><b>SvS</b></a>. The SvS model is trained on a subset of coding tasks from the PRIME-RL dataset (included in this repository as <code>12k_code_rl.parquet</code>).
+# Inference
+We recommend using our official inference template, which is the built-in chat template of the Qwen2.5 Instruct models.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "RLVR-SvS/SvS-Qwen-Code-7B"
+# device_map="auto" places the weights on the available device(s)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."
+messages = [
+    {"role": "user", "content": prompt}
+]
+# Render the conversation as a prompt string using the tokenizer's chat template
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# Move the inputs to the device the model was loaded onto
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=8192
+)
+# Drop the prompt tokens so that only the newly generated tokens are decoded
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
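The list comprehension near the end of the snippet above is easy to misread, so here is a minimal sketch of what it does, using plain Python lists in place of tensors. For decoder-only models, `generate` returns each sequence as the prompt followed by the continuation, so slicing off `len(input_ids)` leading tokens leaves only the new tokens. The helper name `strip_prompt_tokens` and the token ids are hypothetical and chosen only for illustration; they are not part of the model card or of the `transformers` API.

```python
def strip_prompt_tokens(input_ids_batch, output_ids_batch):
    """Return, for each sequence, only the tokens that follow the prompt.

    Hypothetical helper: mirrors the list comprehension in the inference
    snippet, but works on plain lists so it can be run without a model.
    """
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Made-up token ids: one prompt of four tokens.
prompt_ids = [[101, 7, 8, 9]]
# What a decoder-only generate() call would return: prompt + continuation.
full_output = [[101, 7, 8, 9, 42, 43, 2]]

new_tokens = strip_prompt_tokens(prompt_ids, full_output)
print(new_tokens)  # [[42, 43, 2]]
```

Decoding only these trailing tokens is what keeps the rendered chat template out of `response`.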
+# Cite Us
+If you find the model helpful, please consider citing our paper:
+```
+@misc{liang2025pass1selfplayvariationalproblem,
+      title={Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR},
+      author={Xiao Liang and Zhongzhi Li and Yeyun Gong and Yelong Shen and Ying Nian Wu and Zhijiang Guo and Weizhu Chen},
+      year={2025},
+      eprint={2508.14029},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2508.14029},
+}
+```