---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Coder-7B
tags:
- code
---

# Caco: Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

[arXiv:2510.04081](https://arxiv.org/abs/2510.04081)
[NeurIPS](https://neurips.cc/)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

**Caco-CodeGen** is a code-driven reasoning generation model trained under the Caco framework.
It serves as the core engine for expanding executable Code Chain-of-Thoughts (Code CoTs), enabling diverse, verifiable, and pattern-aware reasoning data synthesis at scale.

---

## Overview

Traditional Chain-of-Thought (CoT) data often lacks **verifiability** and **diversity**.
**Caco** addresses this by grounding reasoning in *executable programs*, enabling automatic correctness checks and scalable reasoning synthesis.

| Property               | Description                                                        |
| ---------------------- | ------------------------------------------------------------------ |
| **Model Type**         | Code LLM (Code-Aware Generator)                                    |
| **Base Model**         | Qwen2.5-Coder-7B                                                   |
| **Training Objective** | Next-token prediction on executable reasoning traces               |
| **Training Data**      | Code CoTs extracted and unified from math and algorithmic datasets |
| **Output Type**        | Python-like executable reasoning steps (`code_cot`)                |
| **Verification**       | Code execution + output consistency filter                         |
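
The verification row can be illustrated with a minimal sketch. It assumes each candidate is a standalone Python script that prints its final answer, and it reads "output consistency" as: the script exits cleanly, prints something, and prints the same thing on a second run. That reading and the helper names `run_code` and `passes_filter` are assumptions for illustration, not part of the Caco release.

```python
import subprocess
import sys
from typing import Optional


def run_code(code: str, timeout: float = 5.0) -> Optional[str]:
    """Run a Python snippet in a fresh interpreter; return stdout, or None on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


def passes_filter(code: str) -> bool:
    """Assumed filter: runs cleanly, prints a non-empty answer, and is deterministic."""
    first = run_code(code)
    if not first:  # None (crash or timeout) or empty output
        return False
    return run_code(code) == first


candidates = [
    "print(sum(range(1, 11)))",  # runs and prints an answer
    "print(1 / 0)",              # raises ZeroDivisionError
    "x = 3",                     # runs but prints nothing
]
kept = [c for c in candidates if passes_filter(c)]
print(kept)
```

Only the first candidate survives. A bare subprocess is enough for a demonstration; model-generated code at scale would normally run inside a sandbox with resource limits.
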

---

## Methodology

<p align="center"> <img src="https://github.com/LHL3341/Caco/blob/main/caco.png?raw=true" alt="Caco Framework Overview" width="600"/> </p>

Caco constructs reasoning data through **three scalable stages**:

### 1. Unifying Code CoT

Collect diverse **seed reasoning traces** (mathematical and algorithmic) and normalize them into a unified executable format.

### 2. Scaling Code CoT

Train a **Code Generator** to expand reasoning traces via **Pattern-level Augmentation**, which restructures the logic of a trace (e.g., decomposition, reformulation, alternative solution paths).

### 3. Instruction Reversing

Back-translate executable reasoning into **natural language problems and solutions**, and apply **dual correctness verification**.
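
One plausible reading of the answer-agreement part of this check can be sketched as follows. It assumes the back-translated solution states its final answer in `\boxed{...}` and that the Code CoT's executed output is available as a string. Both conventions, and the helper names `extract_boxed` and `answers_agree`, are hypothetical.

```python
import math
import re
from typing import Optional


def extract_boxed(solution: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in a solution (no nested braces)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else None


def answers_agree(code_output: str, solution: str, rel_tol: float = 1e-6) -> bool:
    """Keep a sample only if the executed answer matches the written answer."""
    boxed = extract_boxed(solution)
    if boxed is None:
        return False
    try:
        # Numeric answers are compared with a relative tolerance.
        return math.isclose(float(code_output), float(boxed), rel_tol=rel_tol)
    except ValueError:
        # Non-numeric answers fall back to exact string comparison.
        return code_output.strip() == boxed


solution = "Adding the integers from 1 to 10 gives \\boxed{55}."
print(answers_agree("55", solution))  # True
print(answers_agree("54", solution))  # False
```

Samples whose executed answer and written answer disagree would be dropped, so surviving pairs are backed by a program that actually ran.
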

---

## Usage

### Example Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub (requires a CUDA GPU).
model_name = "LHL3341/Caco-CodeGen"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Chat-format prefix with the user turn left open; the model continues from here.
prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate up to 1024 new tokens and decode the full sequence (prompt included).
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
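
Before a generated sample is executed, it can help to isolate the program text and confirm that it parses. The sketch below assumes the output may wrap code in a Markdown fence, which this model card does not confirm. The helpers `extract_code` and `is_valid_python` are hypothetical names.

```python
import ast
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this listing readable
FENCE_RE = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(generated: str) -> str:
    """Return the first fenced block if one is present, else the whole text."""
    match = FENCE_RE.search(generated)
    return (match.group(1) if match else generated).strip()


def is_valid_python(code: str) -> bool:
    """True if the text parses as Python; nothing is executed."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        return False
    return True


sample = f"{FENCE}python\ntotal = sum(range(5))\nprint(total)\n{FENCE}"
code = extract_code(sample)
print(is_valid_python(code))
```

A syntax check like this is cheap, so it can discard malformed samples before any execution-based verification is attempted.
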

### Example use cases

* Fine-tuning reasoning LLMs (math, logic, or code tasks)
* Verifiable reasoning data augmentation
* Program-based RL reward modeling (RLVR)
* Cross-domain reasoning transfer experiments

---

## Benchmarks (Caco Models)

| Model                | MATH     | Olympiad | Theorem-QA |
| -------------------- | -------- | -------- | ---------- |
| DeepSeekMath-7B-Caco | 68.2     | 29.5     | 33.8       |
| Qwen2.5-7B-Caco      | **82.4** | **46.5** | **46.0**   |
| Llama3-8B-Caco       | 70.6     | 34.1     | 31.0       |

Models trained on Caco show **consistent improvements** across multiple reasoning benchmarks and domains.

---

## Citation

If you use **Caco** in your research, please cite:

```bibtex
@article{caco,
  title={Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning},
  author={Honglin Lin and Qizhi Pei and Xin Gao and Zhuoshi Pan and Yu Li and Juntao Li and Conghui He and Lijun Wu},
  journal={arXiv preprint arXiv:2510.04081},
  year={2025}
}
```

---

## License

Apache 2.0: free for academic and commercial use, with attribution.

---

## Related Resources

* [Caco Paper (arXiv:2510.04081)](https://arxiv.org/abs/2510.04081)
* [Caco-1.3M Dataset](https://huggingface.co/datasets/LHL3341/Caco-1.3M)

---

## Future Directions

* **Raising Difficulty:** integrate harder datasets (AM-Thinking-distill, DAPO)
* **Expanding Diversity:** add science, proofs, and procedural planning
* **RL with Verifiable Rewards (RLVR):** use code execution as a low-noise reward signal