---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- code
---

# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

[arXiv:2510.04081](https://arxiv.org/abs/2510.04081)
[NeurIPS](https://neurips.cc/)
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0)

**Caco-CodeGen** is a code-driven reasoning generation model trained under the Caco framework.
It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

---

## Overview

Traditional Chain-of-Thought (CoT) data often lacks **verifiability** and **diversity**.
**Caco** addresses this by grounding reasoning in *executable programs*, enabling automatic correctness checks and scalable reasoning synthesis.

| Property               | Description                                                        |
| ---------------------- | ------------------------------------------------------------------ |
| **Model Type**         | Code LLM (Code-Aware Generator)                                     |
| **Base Model**         | Qwen2.5-Coder-7B                                                    |
| **Training Objective** | Next-token prediction on executable reasoning traces                |
| **Training Data**      | Code CoTs extracted and unified from math and algorithmic datasets  |
| **Output Type**        | Python-like executable reasoning steps (`code_cot`)                 |
| **Verification**       | Code execution + output consistency filter                          |
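
The "code execution + output consistency filter" can be made concrete with a minimal sketch. This is illustrative only: the function name `passes_consistency_filter` and the exact filtering policy are assumptions, not part of the released code.

```python
import subprocess
import sys

def passes_consistency_filter(code_cot: str, reference_answer: str,
                              timeout: float = 10.0) -> bool:
    """Run a candidate Code CoT in a fresh interpreter and keep it only
    if its printed output matches the reference answer.
    (Illustrative sketch of the consistency filter, not the released code.)"""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code_cot],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # non-terminating traces are discarded
    if result.returncode != 0:
        return False  # traces that crash are discarded
    return result.stdout.strip() == reference_answer.strip()

# A trace whose execution reproduces the reference answer is kept.
cot = "total = 3 * 4 + 5\nprint(total)"
print(passes_consistency_filter(cot, "17"))  # True
```

Because correctness is decided by execution rather than by another model, the filter is cheap and essentially noise-free.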

---

## Methodology

<p align="center"> <img src="https://github.com/LHL3341/Caco/blob/main/caco.png?raw=true" alt="Caco Framework Overview" width="600"/> </p>

Caco constructs reasoning data through **three scalable stages**:

### 1. Unifying Code CoT

Collect diverse **seed reasoning traces** (mathematical and algorithmic) and normalize them into a unified executable format.

### 2. Scaling Code CoT

Train a **Code Generator** to expand reasoning traces via **Pattern-level Augmentation**: restructuring logic (e.g., decomposition, reformulation, alternative solution paths).

### 3. Instruction Reversing

Back-translate executable reasoning into **natural language problems and solutions**, and apply **dual correctness verification**.
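
To ground stage 1, here is one plausible shape for a unified, executable Code CoT. The problem and structure are invented for illustration; the actual dataset format may differ.

```python
# Illustrative Code CoT: each reasoning step is a commented, runnable
# statement, and the final print() exposes the answer so a verifier
# can check it by execution.

# Problem: A store sells pens at $3 each and notebooks at $5 each.
# How much do 4 pens and 2 notebooks cost?

# Step 1: cost of the pens
pen_cost = 4 * 3

# Step 2: cost of the notebooks
notebook_cost = 2 * 5

# Step 3: combine the two costs
total = pen_cost + notebook_cost

print(total)  # 22
```

Stage 3 then reverses such a trace back into a natural-language problem/solution pair, and execution provides the correctness check.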

---

## Usage

### Example Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Qwen2.5 chat format; append your problem statement after the user tag.
prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Example Use Cases

* Fine-tuning reasoning LLMs (math, logic, or code tasks)
* Verifiable reasoning data augmentation
* Program-based RL reward modeling (RLVR)
* Cross-domain reasoning transfer experiments
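
For the RLVR use case, a sketch of a verifiable reward: execute the code in a model completion and score 1.0 only if it reproduces the reference answer. The helper names (`extract_code`, `execution_reward`) and the binary reward shape are assumptions for illustration, not an official API.

```python
import re
import subprocess
import sys

def extract_code(completion: str) -> str:
    """Pull the first fenced Python block out of a model completion."""
    match = re.search(r"```python\n(.*?)```", completion, re.DOTALL)
    return match.group(1) if match else completion

def execution_reward(completion: str, reference_answer: str) -> float:
    """Binary, low-noise reward: 1.0 if the extracted code runs and
    prints the reference answer, else 0.0. (Illustrative sketch.)"""
    try:
        result = subprocess.run(
            [sys.executable, "-c", extract_code(completion)],
            capture_output=True, text=True, timeout=10,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    if result.returncode != 0:
        return 0.0
    return 1.0 if result.stdout.strip() == reference_answer.strip() else 0.0

completion = "Reasoning:\n```python\nprint(6 * 7)\n```"
print(execution_reward(completion, "42"))  # 1.0
```

Because the reward comes from running code rather than from a learned judge, it avoids reward-model noise, which is the point of the RLVR setting.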

---

## Benchmarks (Caco Models)

| Model                | MATH     | Olympiad | Theorem-QA |
| -------------------- | -------- | -------- | ---------- |
| DeepSeekMath-7B-Caco | 68.2     | 29.5     | 33.8       |
| Qwen2.5-7B-Caco      | **82.4** | **46.5** | **46.0**   |
| Llama3-8B-Caco       | 70.6     | 34.1     | 31.0       |

Models trained on Caco data show **consistent improvements** across multiple reasoning benchmarks and domains.

---

## Citation

If you use **Caco** in your research, please cite:

```bibtex
@article{caco,
  title={Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}
```

---

## License

Apache 2.0: free for academic and commercial use, with attribution.

---

## Related Resources

* [Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
* [Caco-1.3M Dataset](https://huggingface.co/datasets/LHL3341/Caco-1.3M)

---

## Future Directions

* **Raising Difficulty:** integrate harder datasets (AM-Thinking-distill, DAPO)
* **Expanding Diversity:** add science, proofs, and procedural planning
* **RL with Verifiable Rewards (RLVR):** use code execution as a low-noise reward signal