Instructions for using cocoa-org/Mocha-Coder-32B with libraries and local apps.

Libraries

How to use cocoa-org/Mocha-Coder-32B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cocoa-org/Mocha-Coder-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
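
When the pipeline receives a list of chat messages, recent Transformers versions return the whole conversation under `generated_text`, with the model's reply appended as the final `assistant` message. The helper below is a minimal sketch for extracting that reply. `last_assistant_reply` is a hypothetical name, and the sample only mirrors that output shape; it is not a real generation.

```python
from typing import Any, Dict, List


def last_assistant_reply(pipe_output: List[Dict[str, Any]]) -> str:
    """Return the content of the final assistant message in a chat-style pipeline result.

    Assumes the output shape [{"generated_text": [{"role": ..., "content": ...}, ...]}].
    """
    conversation = pipe_output[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Illustrative output shape only; with a loaded model you would call
# last_assistant_reply(pipe(messages)).
sample_output = [{"generated_text": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am a coding assistant."},
]}]
print(last_assistant_reply(sample_output))  # I am a coding assistant.
```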

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

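tokenizer = AutoTokenizer.from_pretrained("cocoa-org/Mocha-Coder-32B")
# torch_dtype="auto" loads the weights in the dtype recorded in the checkpoint config;
# device_map="auto" spreads the 32B model across available GPUs (requires `accelerate`).
model = AutoModelForCausalLM.from_pretrained(
    "cocoa-org/Mocha-Coder-32B", torch_dtype="auto", device_map="auto"
)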
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use cocoa-org/Mocha-Coder-32B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cocoa-org/Mocha-Coder-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cocoa-org/Mocha-Coder-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
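
You can send the same request from Python without extra dependencies. The sketch below only builds a `urllib.request.Request` that mirrors the curl call above: a POST to `http://localhost:8000/v1/chat/completions` with a JSON body holding `model` and `messages`. `build_chat_request` is a hypothetical helper. The optional `max_tokens` field, a standard OpenAI chat-completions parameter, is included as an assumption about what you may want to cap.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_chat_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = "cocoa-org/Mocha-Coder-32B",
    max_tokens: Optional[int] = None,
) -> urllib.request.Request:
    """Build (without sending) a POST to an OpenAI-compatible /v1/chat/completions endpoint."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens  # optional cap on the reply length
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("What is the capital of France?")
# Sending it requires the server to be up:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

For the SGLang server, pass `base_url="http://localhost:30000"` instead.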

Use Docker

docker model run hf.co/cocoa-org/Mocha-Coder-32B

SGLang

How to use cocoa-org/Mocha-Coder-32B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cocoa-org/Mocha-Coder-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cocoa-org/Mocha-Coder-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
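
Both servers follow the OpenAI chat-completions protocol. If the request body also sets `"stream": true`, the reply arrives as server-sent events: each is a `data: {...}` line whose `choices[0].delta` may carry a piece of `content`, and the stream ends with `data: [DONE]`. Below is a minimal sketch that reassembles the text. It assumes a single choice (`n = 1`) and that the lines have already been read from the HTTP response and decoded to strings. `collect_stream` is a hypothetical name.

```python
import json
from typing import Iterable, List


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate the content deltas from OpenAI-style SSE lines."""
    pieces: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:  # role-only and final chunks carry no text
                pieces.append(content)
    return "".join(pieces)


sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Par"}}]}',
    'data: {"choices": [{"delta": {"content": "is"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample_lines))  # Paris
```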

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cocoa-org/Mocha-Coder-32B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cocoa-org/Mocha-Coder-32B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
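
Loading a 32B checkpoint takes time, and so does downloading it into the mounted `~/.cache/huggingface` on the first run. A request sent right after `docker run` may therefore be refused. The polling sketch below assumes that the server's OpenAI-compatible `GET /v1/models` route answers once the server is up; vLLM and SGLang both expose that route. `wait_until_ready` and its `probe` parameter are hypothetical. The probe is injectable so the loop can be exercised without a running server.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(
    base_url: str = "http://localhost:30000",
    max_wait_s: float = 900.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `{base_url}/v1/models` until it answers or `max_wait_s` seconds pass."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe(f"{base_url}/v1/models"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With its defaults, `wait_until_ready()` targets the SGLang container's port 30000. For vLLM, pass `base_url="http://localhost:8000"`.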



ZeonLap

Update citation

bad5cb1 verified 11 days ago


	---
	base_model:
	- Qwen/Qwen2.5-Coder-32B-Instruct
	language:
	- en
	license: mit
	pipeline_tag: text-generation
	tags:
	- code
	- coding-agent
	- SWE-agent
	- distillation
	- agent
	library_name: transformers
	---

	<h1 style="
	font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Helvetica,Arial,sans-serif;
	font-size:48px;
	font-weight:700;
	line-height:1.25;
	text-align:center;
	margin:0 0 24px;">
	Mocha-Coder-32B
	</h1>

	<p style="text-align:center; margin:0 0 8px; font-size:16px;">
	<a href="https://junliwang.tech/">Junli Wang</a><sup>*</sup>
	<a href="https://blankcheng.github.io/">Zhoujun Cheng</a><sup>*†</sup>
	<a href="https://yuxuan-zhang-dexter.github.io/">Yuxuan Zhang</a><sup>*</sup>
	<a href="https://ber666.github.io/">Shibo Hao</a>
	<a href="https://yaotang23.github.io/">Yao Tang</a>
	<br>
	<a href="https://zhiting.ucsd.edu/">Zhiting Hu</a>
	<a href="https://prithvirajva.com/">Prithviraj Ammanabrolu</a>
	<a href="https://haozhang.ai/">Hao Zhang</a><sup>†</sup>
	</p>

	<p style="text-align:center; margin:0 0 24px; font-size:14px; color:#555;">
	University of California, San Diego  ·
	<sup>*</sup>Equal Contribution  ·
	<sup>†</sup>Corresponding Author
	</p>

	<div style="
	display:flex;
	justify-content:center;
	gap:12px;
	flex-wrap:wrap;
	margin-bottom:28px;">

	<a href="https://github.com/cocoa-org/NanoRollout" style="
	display:inline-block;
	padding:8px 24px;
	background:#2b2b2b;
	color:#ffffff;
	border-radius:36px;
	text-decoration:none;
	font-weight:600;
	font-size:16px;">
	🧑‍💻 NanoRollout Code
	</a>

	<a href="https://huggingface.co/ZeonLap/Mocha-Coder-32B" style="
	display:inline-block;
	padding:8px 24px;
	background:#2b2b2b;
	color:#ffffff;
	border-radius:36px;
	text-decoration:none;
	font-weight:600;
	font-size:16px;">
	🤗 Mocha-Coder-32B Model
	</a>

	<a href="https://cocoa-org.notion.site/nanorollout" style="
	display:inline-block;
	padding:8px 24px;
	background:#2b2b2b;
	color:#ffffff;
	border-radius:36px;
	text-decoration:none;
	font-weight:600;
	font-size:16px;">
	📒 Blog
	</a>
	</div>

	<div style="max-width:900px;margin:0 auto;">

	# Introduction
	<div style="
	max-width: 880px;
	margin: 0 auto;
	text-align: justify;
	text-justify: inter-word;
	line-height: 1.6;">

	Mocha-Coder-32B is a strong open-data coding agent built on Qwen2.5-Coder-32B-Instruct. It is trained entirely by distillation, with no reinforcement learning. The training data is a 300K+ trajectory mixture that combines previously released distillation sets with new trajectories sampled using NanoRollout, our lightweight agent-rollout infrastructure. All of the training signal comes from frontier open-source teacher models (Qwen3-Coder-480B-A35B, Kimi-K2.5, Qwen3-Coder-Next, DeepSeek-V3.2). These teachers ran multiple agent harnesses (OpenHands, mini-swe-agent, Terminus-2 JSON) on tasks from SWE-Rebench, SWE-Smith, and SETA.

	The result is a simple but strong baseline coding agent. At the ≤32B scale, Mocha-Coder-32B is state of the art among open-data models, and it is competitive with much larger open-source models on agentic SWE benchmarks.
	</div>

	### Key Features

	- Strong agentic SWE performance: 62.6 Pass@1 on SWE-Bench Verified, 35.3 on SWE-Bench Pro, and 23.6 on Terminal-Bench 2.0. These scores are within a few points of Qwen3-Coder-480B-A35B-Instruct (67.0, 38.7, and 23.9).
	- Multi-harness training: Trajectories cover OpenHands, mini-swe-agent, and Terminus-2 JSON, mitigating harness-specific overfitting.
	- Open data: Distilled from a fully released 300K+ trajectory mixture (`ZeonLap/Mocha-trajectories`).

	# Performance

	### SWE-Bench Verified
	<div align="center">

	| Model | Max Iterations | SWE-Bench Verified (Pass@1) |
	|-----------------------------------|:-----------------:|:---------------------------:|
	| Qwen3-Coder-480B-A35B-Instruct | 100 | 67.0 |
	| Mocha-Coder-32B | 100 | 62.6 |
	| SWE-Master-32B-RL | 150 | 61.4 |
	| Kimi-Dev-72B | Agentless, TTS@40 | 60.4 |
	| CoderForge-Preview-32B | 100 | 59.4 |
	| GLM-4.7-Flash | 100 | 59.2 |
	| daVinci-Dev-72B | 100 | 58.5 |
	| daVinci-Dev-32B | 100 | 56.1 |
	| SERA-32B | 100 | 54.2 |
	| Qwen3-Coder-30B-A3B-Instruct | 100 | 51.6 |
	| Qwen2.5-Coder-32B-Instruct (Base) | 100 | 6.2 |
	</div>

	### SWE-Bench Pro
	<div align="center">

	| Model | Max Iterations | SWE-Bench Pro (Pass@1) |
	|-----------------------------------|:--------------:|:----------------------:|
	| Qwen3-Coder-480B-A35B-Instruct | 250 | 38.7 |
	| Mocha-Coder-32B | 250 | 35.3 |
	| Gemini-3-flash | 250 | 34.6 |
	| Kimi-K2-Instruct | 250 | 27.7 |
	| DeepSeek-V3.2 | 250 | 15.6 |
	| Qwen2.5-Coder-32B-Instruct (Base) | 250 | 0.0 |
	</div>

	### Terminal-Bench 2.0
	<div align="center">

	| Model | Terminal-Bench 2.0 |
	|-----------------------------------|:------------------:|
	| Qwen3-Coder-480B-A35B-Instruct | 23.9 |
	| Mocha-Coder-32B | 23.6 |
	| Qwen3-Coder-30B-A3B-Instruct | 13.5 |
	| Qwen2.5-Coder-32B-Instruct (Base) | 3.4 |
	</div>
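
Pass@1 in these tables is conventionally the percentage of benchmark tasks resolved when one attempt is scored per task. When several attempts are sampled per task, pass@k is usually computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021). For n attempts of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over tasks; for k = 1 this reduces to c/n. The sketch below implements that estimator for illustration. It makes no claim about the exact scoring scripts behind the numbers above.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task with n sampled attempts, c of which pass."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed, so pass@k becomes 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Sequence[Tuple[int, int]], k: int = 1) -> float:
    """Mean per-task pass@k over (n, c) pairs, reported as a percentage."""
    if not results:
        raise ValueError("no results")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Toy run: one attempt on each of four tasks, two resolved.
print(benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0)]))  # 50.0
```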

	# Training Data

	Mocha-Coder-32B is trained on a 300K+ trajectory distillation mixture, drawn from previously released distillation sets (120K) and trajectories newly generated with NanoRollout (~180K).

	| Dataset | Teacher Model | Harness | # Trajectories (K) | Source |
	|-------------|----------------------------|-----------------|:------------------:|-------------|
	| SWE-Rebench | Qwen3-Coder-480B-A35B | OpenHands | 32.2 | Nebius |
	| SWE-Smith | Qwen3-Coder-480B-A35B | OpenHands | 89.5 | CoderForge |
	| SWE-Rebench | Kimi-K2.5 | mini-swe-agent | 83.6 | NanoRollout |
	| SWE-Rebench | Qwen3-Coder-Next | mini-swe-agent | 11.5 | NanoRollout |
	| SWE-Smith | Qwen3-Coder-480B-A35B | mini-swe-agent | 12.8 | NanoRollout |
	| SWE-Smith | Qwen3-Coder-Next | mini-swe-agent | 9.1 | NanoRollout |
	| SETA | Kimi-K2.5 / DeepSeek-V3.2 | Terminus-2 JSON | 14.0 | NanoRollout |

	The full mixture is released at [`ZeonLap/Mocha-trajectories`](https://huggingface.co/datasets/ZeonLap/Mocha-trajectories).
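
Distillation here means supervised fine-tuning on teacher-generated trajectories. A common recipe for multi-turn agent data computes the loss only on the teacher's assistant turns and masks out the task prompt and environment observations. This is offered as an assumption, not as this model's confirmed training setup. The sketch assumes a hypothetical OpenAI-style `messages` schema for each trajectory; the actual layout of `ZeonLap/Mocha-trajectories` may differ. It also uses a toy whitespace tokenizer with made-up `<role>` markers in place of Qwen's real chat template. `build_sft_example` and `toy_tokenize` are illustrative names. Masked positions use -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import Callable, Dict, List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(
    messages: List[Dict[str, str]],
    tokenize: Callable[[str], List[int]],
) -> Tuple[List[int], List[int]]:
    """Flatten one trajectory into (input_ids, labels), supervising assistant turns only."""
    input_ids: List[int] = []
    labels: List[int] = []
    for message in messages:
        # Hypothetical turn format; a real pipeline would render the model's chat template.
        ids = tokenize(f"<{message['role']}> {message['content']}")
        input_ids.extend(ids)
        if message["role"] == "assistant":
            labels.extend(ids)  # train on the teacher's actions
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # context only: no loss
    return input_ids, labels


# Toy whitespace tokenizer that assigns ids on first sight (illustration only).
vocab: Dict[str, int] = {}


def toy_tokenize(text: str) -> List[int]:
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


trajectory = [
    {"role": "user", "content": "fix the bug"},
    {"role": "assistant", "content": "ls"},
    {"role": "tool", "content": "main.py"},
    {"role": "assistant", "content": "submit"},
]
input_ids, labels = build_sft_example(trajectory, toy_tokenize)
print(labels)  # [-100, -100, -100, -100, 4, 5, -100, -100, 4, 8]
```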

	# Running as an Agent

	Mocha-Coder-32B is trained as an agent and is most useful when paired with a coding-agent harness. We have validated it with:

	- mini-swe-agent — minimal SWE agent loop, recommended for SWE-Bench Verified / Pro evaluation.
	- OpenHands — full-featured SWE harness; the model was trained on OpenHands trajectories.
	- Terminus-2 JSON — for Terminal-Bench 2.0 style shell tasks.

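	Point each harness's OpenAI-compatible model endpoint at the vLLM server above (`http://localhost:8000/v1` by default) or at the SGLang server (`http://localhost:30000/v1`). We report SWE-Bench Verified numbers at a 100-iteration budget and SWE-Bench Pro numbers at a 250-iteration budget.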
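
To make the iteration budget concrete, here is a stripped-down, hypothetical agent loop in the spirit of mini-swe-agent. Each model reply is expected to contain one fenced bash block. The loop executes that block and feeds the output back as the next user message. It stops when the model emits a submit marker or when the step budget runs out; the reported runs used 100 steps for SWE-Bench Verified and 250 for SWE-Bench Pro. The reply format, the `SUBMIT` marker, and all function names are assumptions for illustration. They do not reproduce any harness's actual prompt or protocol. In practice, `query_model` would wrap a chat-completions call to the server.

```python
import re
import subprocess
from typing import Callable, Dict, List

FENCE = "`" * 3  # built at runtime so the snippet embeds cleanly in Markdown
ACTION_RE = re.compile(FENCE + r"bash\n(.*?)" + FENCE, re.DOTALL)
SUBMIT_MARKER = "SUBMIT"  # hypothetical "task finished" signal


def run_shell(command: str, timeout: int = 60) -> str:
    """Run a command in a shell and return stdout + stderr (sandbox this in practice)."""
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return f"command timed out after {timeout}s"
    return proc.stdout + proc.stderr


def run_agent(
    task: str,
    query_model: Callable[[List[Dict[str, str]]], str],
    execute: Callable[[str], str] = run_shell,
    max_steps: int = 100,
) -> List[Dict[str, str]]:
    """Alternate model actions and shell observations until submit or the step budget ends."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = query_model(messages)
        messages.append({"role": "assistant", "content": reply})
        match = ACTION_RE.search(reply)
        if match is None:
            # Format error: ask again; this still consumes one step of the budget.
            messages.append({"role": "user", "content": "Reply with exactly one bash block."})
            continue
        command = match.group(1).strip()
        if SUBMIT_MARKER in command:
            break
        messages.append({"role": "user", "content": execute(command)})
    return messages


# Scripted stand-in for a real model, for a dry run without a server.
script = iter([FENCE + "bash\nls\n" + FENCE, FENCE + "bash\necho SUBMIT\n" + FENCE])
history = run_agent("Fix the failing test.", lambda msgs: next(script),
                    execute=lambda cmd: f"(ran {cmd})")
print(len(history))  # 4
```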

	# License

	Mocha-Coder-32B (model weights, training trajectories, and code) is released under the MIT License (see `LICENSE`) for research, educational, and commercial use.

	# Citation

	If you use Mocha-Coder-32B or NanoRollout in your research, please cite NanoRollout:

	```bibtex
	@misc{nanorollout,
	title = {NanoRollout: A Lightweight Infra for Digital Agent Rollout at Scale},
	author = {Wang, Junli and Cheng, Zhoujun and Zhang, Yuxuan and Hao, Shibo
	and Tang, Yao and Hu, Zhiting and Ammanabrolu, Prithviraj
	and Zhang, Hao},
	year = {2026},
	howpublished = {\url{https://github.com/cocoa-org/NanoRollout}},
	}
	```

	</div>