21223wj commited on
Commit
249d4b1
·
verified ·
1 Parent(s): a9db627

Upload folder using huggingface_hub

Browse files
Files changed (3) hide show
  1. .gitattributes +1 -0
  2. README.md +67 -1
  3. results.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ results.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,69 @@
1
  ---
2
- license: mit
 
 
 
 
 
 
 
 
 
 
3
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ license: apache-2.0
3
+ base_model:
4
+ - IIGroup/X-Coder-SFT-Qwen3-8B
5
+ datasets:
6
+ - IIGroup/X-Coder-RL-40k
7
+ language:
8
+ - en
9
+ tags:
10
+ - code
11
+ - rl
12
+ - competitive-programming
13
  ---
14
+
15
+ # X-Coder-RL-Qwen3-8B
16
+
17
+ X-Coder-RL-Qwen3-8B is a code generation model trained with RLVR (Reinforcement Learning with Verifiable Rewards) on fully synthetic data, achieving state-of-the-art performance on competitive programming benchmarks.
18
+
19
+ ## Model Description
20
+
21
+ - **Base Model**: [IIGroup/X-Coder-SFT-Qwen3-8B](https://huggingface.co/IIGroup/X-Coder-SFT-Qwen3-8B)
22
+ - **Training Method**: RLVR (Reinforcement Learning with Verifiable Rewards)
23
+ - **Training Data**: [IIGroup/X-Coder-RL-40k](https://huggingface.co/datasets/IIGroup/X-Coder-RL-40k) (40k fully synthetic tasks)
24
+ - **Parameters**: 8B
25
+
26
+ ## Performance
27
+
28
+ **LiveCodeBench Average Performance: 64.0**
29
+
30
+ ![Results](results.png)
31
+
32
+ **Performance on LiveCodeBench v5.** X-Coder shows strong coding expertise with fewer, fully synthetic tasks, and achieves additional gains through subsequent RL stages.
33
+
34
+ ## Usage
35
+
36
+ ```python
37
+ from transformers import AutoModelForCausalLM, AutoTokenizer
38
+
39
+ model_name = "IIGroup/X-Coder-RL-Qwen3-8B"
40
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
41
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
42
+
43
+ prompt = "Write a Python function to solve the two sum problem."
44
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
45
+ outputs = model.generate(**inputs, max_new_tokens=512)
46
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
47
+ ```
48
+
49
+ ## Training
50
+
51
+ This model was trained using the X-Coder RLVR framework. For training details and code, please refer to the [X-Coder GitHub repository](https://github.com/JieWu02/X-Coder).
52
+
53
+ ## Citation
54
+
55
+ ```bibtex
56
+ @inproceedings{
57
+ anonymous2025xcoder,
58
+ title={X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests},
59
+ author={Anonymous},
60
+ booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
61
+ year={2025},
62
+ url={https://openreview.net/forum?id=jp4dzBilqH},
63
+ note={under review}
64
+ }
65
+ ```
66
+
67
+ ## License
68
+
69
+ This project is licensed under the Apache License 2.0.
results.png ADDED

Git LFS Details

  • SHA256: 1a65e6a94d2e445ded612e3b2f401e9cbd64766893462f545dcb4baefd18c60c
  • Pointer size: 131 Bytes
  • Size of remote file: 445 kB