Instructions to use cocoa-org/Mocha-Coder-32B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use cocoa-org/Mocha-Coder-32B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cocoa-org/Mocha-Coder-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cocoa-org/Mocha-Coder-32B")
model = AutoModelForCausalLM.from_pretrained("cocoa-org/Mocha-Coder-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use cocoa-org/Mocha-Coder-32B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "cocoa-org/Mocha-Coder-32B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cocoa-org/Mocha-Coder-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/cocoa-org/Mocha-Coder-32B
```
- SGLang
How to use cocoa-org/Mocha-Coder-32B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "cocoa-org/Mocha-Coder-32B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cocoa-org/Mocha-Coder-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "cocoa-org/Mocha-Coder-32B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cocoa-org/Mocha-Coder-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner

How to use cocoa-org/Mocha-Coder-32B with Docker Model Runner:

```shell
docker model run hf.co/cocoa-org/Mocha-Coder-32B
```
---
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
language:
- en
license: mit
pipeline_tag: text-generation
tags:
- code
- coding-agent
- SWE-agent
- distillation
- agent
library_name: transformers
---
<h1 style="
  font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Helvetica,Arial,sans-serif;
  font-size:48px;
  font-weight:700;
  line-height:1.25;
  text-align:center;
  margin:0 0 24px;">
  Mocha-Coder-32B
</h1>

<p style="text-align:center; margin:0 0 8px; font-size:16px;">
  <a href="https://junliwang.tech/">Junli Wang</a><sup>*</sup>
  <a href="https://blankcheng.github.io/">Zhoujun Cheng</a><sup>*†</sup>
  <a href="https://yuxuan-zhang-dexter.github.io/">Yuxuan Zhang</a><sup>*</sup>
  <a href="https://ber666.github.io/">Shibo Hao</a>
  <a href="https://yaotang23.github.io/">Yao Tang</a>
  <br>
  <a href="https://zhiting.ucsd.edu/">Zhiting Hu</a>
  <a href="https://prithvirajva.com/">Prithviraj Ammanabrolu</a>
  <a href="https://haozhang.ai/">Hao Zhang</a><sup>†</sup>
</p>

<p style="text-align:center; margin:0 0 24px; font-size:14px; color:#555;">
  University of California, San Diego ·
  <sup>*</sup>Equal Contribution ·
  <sup>†</sup>Corresponding Author
</p>

<div style="
  display:flex;
  justify-content:center;
  gap:12px;
  flex-wrap:wrap;
  margin-bottom:28px;">
  <a href="https://github.com/cocoa-org/NanoRollout" style="
    display:inline-block;
    padding:8px 24px;
    background:#2b2b2b;
    color:#ffffff;
    border-radius:36px;
    text-decoration:none;
    font-weight:600;
    font-size:16px;">
    🧑‍💻 NanoRollout Code
  </a>
  <a href="https://huggingface.co/ZeonLap/Mocha-Coder-32B" style="
    display:inline-block;
    padding:8px 24px;
    background:#2b2b2b;
    color:#ffffff;
    border-radius:36px;
    text-decoration:none;
    font-weight:600;
    font-size:16px;">
    🤗 Mocha-Coder-32B Model
  </a>
  <a href="https://cocoa-org.notion.site/nanorollout" style="
    display:inline-block;
    padding:8px 24px;
    background:#2b2b2b;
    color:#ffffff;
    border-radius:36px;
    text-decoration:none;
    font-weight:600;
    font-size:16px;">
    📒 Blog
  </a>
</div>
<div style="max-width:900px;margin:0 auto;">

# Introduction

<div style="
  max-width: 880px;
  margin: 0 auto;
  text-align: justify;
  text-justify: inter-word;
  line-height: 1.6;">

**Mocha-Coder-32B** is a strong open-data coding agent built on top of **Qwen2.5-Coder-32B-Instruct**. It is trained entirely through distillation, with no reinforcement learning, on a 300K+ trajectory mixture sampled with our lightweight agent-rollout infrastructure, **NanoRollout**. The full training signal comes from frontier open-source teacher models (Qwen3-Coder-480B-A35B, Kimi-K2.5, Qwen3-Coder-Next, DeepSeek-V3.2) generating trajectories across multiple agent harnesses (OpenHands, mini-swe-agent, Terminus-2 JSON) on SWE-Rebench, SWE-Smith, and SETA.

The result is a simple but strong baseline coding agent: at the ≤32B scale, Mocha-Coder-32B is state of the art among open-data models and is competitive with much larger open-source models on agentic SWE benchmarks.

</div>
### Key Features

- **Strong agentic SWE performance**: 62.6 Pass@1 on SWE-Bench Verified, 35.3 on SWE-Bench Pro, and 23.6 on Terminal-Bench 2.0, competitive with Qwen3-Coder-480B-A35B-Instruct (67.0, 38.7, and 23.9 respectively).
- **Multi-harness training**: Trajectories cover OpenHands, mini-swe-agent, and Terminus-2 JSON, mitigating harness-specific overfitting.
- **Open data**: Distilled from a fully released 300K+ trajectory mixture (`ZeonLap/Mocha-trajectories`).
# Performance

### SWE-Bench Verified

<div align="center">

| **Model**                         | **Max Iteration** | **SWE-Bench Verified (Pass@1)** |
|-----------------------------------|:-----------------:|:-------------------------------:|
| Qwen3-Coder-480B-A35B-Instruct    | 100               | 67.0                            |
| **Mocha-Coder-32B**               | 100               | **62.6**                        |
| SWE-Master-32B-RL                 | 150               | 61.4                            |
| Kimi-Dev-72B                      | Agentless, TTS@40 | 60.4                            |
| CoderForge-Preview-32B            | 100               | 59.4                            |
| GLM-4.7-Flash                     | 100               | 59.2                            |
| daVinci-Dev-72B                   | 100               | 58.5                            |
| daVinci-Dev-32B                   | 100               | 56.1                            |
| SERA-32B                          | 100               | 54.2                            |
| Qwen3-Coder-30B-A3B-Instruct      | 100               | 51.6                            |
| Qwen2.5-Coder-32B-Instruct (Base) | 100               | 6.2                             |

</div>

### SWE-Bench Pro

<div align="center">

| **Model**                         | **Max Iteration** | **SWE-Bench Pro (Pass@1)** |
|-----------------------------------|:-----------------:|:--------------------------:|
| Qwen3-Coder-480B-A35B-Instruct    | 250               | 38.7                       |
| **Mocha-Coder-32B**               | 250               | **35.3**                   |
| Gemini-3-flash                    | 250               | 34.6                       |
| Kimi-K2-Instruct                  | 250               | 27.7                       |
| DeepSeek-V3.2                     | 250               | 15.6                       |
| Qwen2.5-Coder-32B-Instruct (Base) | 250               | 0.0                        |

</div>

### Terminal-Bench 2.0

<div align="center">

| **Model**                         | **Terminal-Bench 2.0** |
|-----------------------------------|:----------------------:|
| Qwen3-Coder-480B-A35B-Instruct    | 23.9                   |
| **Mocha-Coder-32B**               | **23.6**               |
| Qwen3-Coder-30B-A3B-Instruct      | 13.5                   |
| Qwen2.5-Coder-32B-Instruct (Base) | 3.4                    |

</div>
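
To make the comparison with the much larger Qwen3-Coder-480B-A35B-Instruct concrete, the sketch below expresses each Mocha-Coder-32B score as a percentage of the larger model's score. It uses only numbers copied from the three tables above; the `SCORES` layout and the `relative_score` helper are illustrative names, not part of any released tooling.

```python
# Scores copied from the tables above:
# (Mocha-Coder-32B, Qwen3-Coder-480B-A35B-Instruct).
SCORES = {
    "SWE-Bench Verified": (62.6, 67.0),
    "SWE-Bench Pro": (35.3, 38.7),
    "Terminal-Bench 2.0": (23.6, 23.9),
}


def relative_score(model: float, reference: float) -> float:
    """Return the model's score as a percentage of the reference score."""
    return 100.0 * model / reference


if __name__ == "__main__":
    for benchmark, (model, reference) in SCORES.items():
        gap = reference - model
        print(
            f"{benchmark}: {relative_score(model, reference):.1f}% "
            f"of the reference score (gap: {gap:.1f} points)"
        )
```

Note that the comparison only holds row by row: each benchmark uses its own iteration budget, as listed in the tables.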
# Training Data

Mocha-Coder-32B is trained on a **300K+ trajectory** distillation mixture, drawn from previously released distillation sets (120K) and trajectories newly generated with NanoRollout (~180K).

| **Dataset** | **Teacher Model**         | **Harness**     | **# Trajectories (K)** | **Source**  |
|-------------|---------------------------|-----------------|:----------------------:|-------------|
| SWE-Rebench | Qwen3-Coder-480B-A35B     | OpenHands       | 32.2                   | Nebius      |
| SWE-Smith   | Qwen3-Coder-480B-A35B     | OpenHands       | 89.5                   | CoderForge  |
| SWE-Rebench | Kimi-K2.5                 | mini-swe-agent  | 83.6                   | NanoRollout |
| SWE-Rebench | Qwen3-Coder-Next          | mini-swe-agent  | 11.5                   | NanoRollout |
| SWE-Smith   | Qwen3-Coder-480B-A35B     | mini-swe-agent  | 12.8                   | NanoRollout |
| SWE-Smith   | Qwen3-Coder-Next          | mini-swe-agent  | 9.1                    | NanoRollout |
| SETA        | Kimi-K2.5 / DeepSeek-V3.2 | Terminus-2 JSON | 14.0                   | NanoRollout |

The full mixture is released at [`ZeonLap/Mocha-trajectories`](https://huggingface.co/datasets/ZeonLap/Mocha-trajectories).
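
To see how the listed rows are balanced across harnesses and sources, the following sketch groups the table by a chosen column and sums the trajectory counts. The row tuples are copied from the table above; `MIXTURE` and `totals_by` are illustrative names. Shares are relative to the rows listed here, and the headline 300K+ and ~180K figures in the text may be counted differently, so treat the output as a view of the table only.

```python
from collections import defaultdict

# Rows copied from the table above:
# (dataset, teacher, harness, trajectories in thousands, source).
MIXTURE = [
    ("SWE-Rebench", "Qwen3-Coder-480B-A35B", "OpenHands", 32.2, "Nebius"),
    ("SWE-Smith", "Qwen3-Coder-480B-A35B", "OpenHands", 89.5, "CoderForge"),
    ("SWE-Rebench", "Kimi-K2.5", "mini-swe-agent", 83.6, "NanoRollout"),
    ("SWE-Rebench", "Qwen3-Coder-Next", "mini-swe-agent", 11.5, "NanoRollout"),
    ("SWE-Smith", "Qwen3-Coder-480B-A35B", "mini-swe-agent", 12.8, "NanoRollout"),
    ("SWE-Smith", "Qwen3-Coder-Next", "mini-swe-agent", 9.1, "NanoRollout"),
    ("SETA", "Kimi-K2.5 / DeepSeek-V3.2", "Terminus-2 JSON", 14.0, "NanoRollout"),
]

HARNESS, COUNT, SOURCE = 2, 3, 4  # column indices into each row


def totals_by(column: int) -> dict:
    """Sum trajectory counts (in thousands) grouped by the given column."""
    totals = defaultdict(float)
    for row in MIXTURE:
        totals[row[column]] += row[COUNT]
    # Round to one decimal to hide floating-point noise from the additions.
    return {key: round(value, 1) for key, value in totals.items()}


if __name__ == "__main__":
    listed_total = sum(row[COUNT] for row in MIXTURE)
    for harness, count in totals_by(HARNESS).items():
        share = 100 * count / listed_total
        print(f"{harness}: {count:.1f}K ({share:.1f}% of listed rows)")
```

Passing `SOURCE` instead of `HARNESS` groups the same rows by where the trajectories came from.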
# Running as an Agent

Mocha-Coder-32B is trained as an agent and is most useful when paired with a coding-agent harness. We have validated it with:

- **mini-swe-agent**: minimal SWE agent loop, recommended for SWE-Bench Verified / Pro evaluation.
- **OpenHands**: full-featured SWE harness; the model was trained on OpenHands trajectories.
- **Terminus-2 JSON**: for Terminal-Bench 2.0 style shell tasks.

Point each harness's model endpoint at the OpenAI-compatible vLLM server shown above (`http://localhost:8000/v1` in the curl example). We report SWE-Bench Verified numbers at a 100-iteration budget and SWE-Bench Pro numbers at a 250-iteration budget.
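
Before wiring up a harness, it can help to confirm that the endpoint answers a plain chat request. The sketch below is one way to do that with only the Python standard library. It assumes the server was started with `vllm serve` and is reachable at the address used in the curl example above; `BASE_URL`, `build_chat_request`, and `chat` are illustrative names, and the response parsing assumes the usual OpenAI-style chat completion shape.

```python
import json
import urllib.request

# Assumptions: address and model name match the vLLM curl example above.
BASE_URL = "http://localhost:8000/v1"
MODEL = "cocoa-org/Mocha-Coder-32B"


def build_chat_request(
    prompt: str, base_url: str = BASE_URL, model: str = MODEL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # Assumed OpenAI-style response: first choice holds the assistant message.
    return body["choices"][0]["message"]["content"]
```

Calling `chat("Write a one-line Python hello world.")` should return text once the server is up; a connection error at this step usually means the harness would fail to reach the model as well.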
# License

Mocha-Coder-32B (model weights, training trajectories, and code) is released under the **MIT License** (see `LICENSE`) for research, educational, and commercial use.

# Citation

If you use Mocha-Coder-32B or NanoRollout in your research, please cite NanoRollout:

```bibtex
@misc{nanorollout,
  title        = {NanoRollout: A Lightweight Infra for Digital Agent Rollout at Scale},
  author       = {Wang, Junli and Cheng, Zhoujun and Zhang, Yuxuan and Hao, Shibo
                  and Tang, Yao and Hu, Zhiting and Ammanabrolu, Prithviraj
                  and Zhang, Hao},
  year         = {2026},
  howpublished = {\url{https://github.com/cocoa-org/NanoRollout}},
}
```

</div>