---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- code
---

# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

[![Paper](https://img.shields.io/badge/Paper-arXiv:2510.04081-B31B1B)](https://arxiv.org/abs/2510.04081)
[![Conference](https://img.shields.io/badge/NeurIPS-2025-1E90FF)](https://neurips.cc/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

**Caco-CodeGen** is a code-driven reasoning generation model trained under the Caco framework. It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

---

## 🚀 Overview

Traditional Chain-of-Thought (CoT) data often lacks **verifiability** and **diversity**. **Caco** addresses this by grounding reasoning in *executable programs*, enabling automatic correctness checks and scalable reasoning synthesis.

| Property               | Description                                                        |
| ---------------------- | ------------------------------------------------------------------ |
| **Model Type**         | Code LLM (code-aware generator)                                    |
| **Base Model**         | Qwen2.5-Coder-7B-Instruct                                          |
| **Training Objective** | Next-token prediction on executable reasoning traces               |
| **Training Data**      | Code CoTs extracted and unified from math and algorithmic datasets |
| **Output Type**        | Python-like executable reasoning steps (`code_cot`)                |
| **Verification**       | Code execution + output consistency filter                         |

---

## 🧠 Methodology

<p align="center"> <img src="https://github.com/LHL3341/Caco/blob/main/caco.png?raw=true" alt="Caco Framework Overview" width="600"/> </p>

Caco constructs reasoning data through **three scalable stages**:

### 1. Unifying Code CoT

Collect diverse **seed reasoning traces** (mathematical and algorithmic) and normalize them into a unified executable format.

### 2. Scaling Code CoT

Train a **Code Generator** to expand reasoning traces via **pattern-level augmentation**: restructuring logic through decomposition, reformulation, and alternative solution paths.

### 3. Instruction Reversing

Back-translate executable reasoning into **natural language problems and solutions**, and apply **dual correctness verification**.

---

## ⚙️ Usage

### Example Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Build a chat-formatted prompt (fill in your own problem statement).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "..."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
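If completions wrap their executable steps in markdown-fenced blocks (an assumption about the output format, not something this card guarantees), a small helper can pull the code out of a completion for downstream execution and verification:

```python
import re


def extract_code_blocks(completion: str) -> list[str]:
    """Return the contents of all ```python fenced blocks in a completion."""
    pattern = re.compile(r"```python\n(.*?)```", re.DOTALL)
    return [match.strip() for match in pattern.findall(completion)]


# A hand-written stand-in for a model completion.
sample = (
    "Let's solve this step by step.\n"
    "```python\n"
    "total = sum(range(1, 11))\n"
    "print(total)\n"
    "```\n"
    "The answer is 55."
)
print(extract_code_blocks(sample)[0])
```

The extracted snippet can then be fed straight into whatever execution-based filter or reward you use downstream.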

### Example use cases

* Fine-tuning reasoning LLMs (math, logic, or code tasks)
* Verifiable reasoning data augmentation
* Program-based RL reward modeling (RLVR)
* Cross-domain reasoning transfer experiments

---

## 📈 Benchmarks (Caco Models)

| Model                | MATH     | Olympiad | Theorem-QA |
| -------------------- | -------- | -------- | ---------- |
| DeepSeekMath-7B-Caco | 68.2     | 29.5     | 33.8       |
| Qwen2.5-7B-Caco      | **82.4** | **46.5** | **46.0**   |
| Llama3-8B-Caco       | 70.6     | 34.1     | 31.0       |

Models trained on Caco data show **consistent improvements** across multiple reasoning benchmarks and domains.

---

## 🔬 Citation

If you use **Caco** in your research, please cite:

```bibtex
@article{caco,
  title={Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}
```

---

## 📜 License

Apache 2.0: free for academic and commercial use, with attribution.

---

## 🌱 Related Resources

* [🧠 Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
* [🧩 Caco-1.3M Dataset](https://huggingface.co/datasets/LHL3341/Caco-1.3M)

---

## 💡 Future Directions

* **Raising Difficulty:** integrate harder datasets (AM-Thinking-distill, DAPO)
* **Expanding Diversity:** add science, proofs, and procedural planning domains
* **RL with Verifiable Rewards (RLVR):** use code execution as a low-noise reward signal
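As a sketch of the RLVR direction above, a verifiable reward can be as simple as a binary check that a sampled program runs and reproduces the reference answer. The name `execution_reward` is hypothetical, and a production setup would sandbox and parallelize execution:

```python
import subprocess
import sys


def execution_reward(code: str, reference_answer: str, timeout: float = 10.0) -> float:
    """Binary RLVR-style reward: 1.0 if the program runs cleanly and
    prints the reference answer, 0.0 otherwise (errors and timeouts
    both score 0.0)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    if result.returncode != 0:
        return 0.0
    return 1.0 if result.stdout.strip() == reference_answer.strip() else 0.0


print(execution_reward("print(6 * 7)", "42"))  # 1.0
print(execution_reward("print(6 * 8)", "42"))  # 0.0
```

Because the signal comes from actually executing the program, it avoids much of the noise of learned reward models.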