# Nexus-Coder-Alpha

A practical training guide and recipe for building state-of-the-art **agentic coding assistants** with open-source 8B parameter models.

## What This Is

This repository consolidates research from **Nemotron-Terminal**, **Klear-AgentForge**, **GLM-5**, and **Qwen3-Coder-Next** into a single reproducible training pipeline:

1. **Supervised Fine-Tuning (SFT)** on high-quality multi-turn agent trajectories
2. **Reinforcement Learning (RL)** with execution-verified rewards
3. **Deployment** in Pi agent, Cline, OpenCode, or any OpenAI-compatible coding tool

## Target Model

**Base:** [`nvidia/Nemotron-Terminal-8B`](https://hf.co/nvidia/Nemotron-Terminal-8B)
- 8.2B parameters, Qwen3 architecture, native `tool_calls` support
- Already pre-trained for terminal/code-agent interaction
- Fits on a single A100 or A10G-large GPU when fine-tuned with LoRA
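
The single-GPU claim follows from back-of-envelope arithmetic: the frozen bf16 base dominates memory, and optimizer states are only paid on the small adapter. A rough sketch (the helper and its default ratios are illustrative assumptions, not measured numbers):

```python
def estimate_lora_vram_gb(n_params_b: float = 8.2,
                          weight_bytes: int = 2,     # frozen base in bf16
                          lora_ratio: float = 0.01,  # adapter size vs. base (assumed)
                          optim_bytes: int = 12) -> float:
    """Rough VRAM floor for LoRA fine-tuning, ignoring activations/KV cache."""
    base = n_params_b * 1e9 * weight_bytes                       # frozen weights
    adapters = n_params_b * 1e9 * lora_ratio * (weight_bytes + optim_bytes)
    return (base + adapters) / 1e9
```

With the defaults this lands around 17-18 GB before activations, leaving headroom on a 40 GB A100 for batch size and sequence length.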

## Key Results (from cited papers)

| Benchmark | 8B Target | SOTA Reference |
|---|---|---|
| SWE-bench Verified | 20-40% | Klear-AgentForge: **39.4%** |
| BFCL v3 | 65-75% | Klear-AgentForge: **71.5%** |
| Terminal-Bench 2.0 | 15-25% | Nemotron-T-14B: **20.2%** |
| Aider-Polyglot | 25-40% | Klear-AgentForge: **33.8%** |

## Documents

- **[TRAINING_GUIDE.md](TRAINING_GUIDE.md)** – Full SFT → RL → Deployment recipe with code snippets, dataset links, hyperparameters, and SOTA tricks
- **[train_sft.py](train_sft.py)** – Reference training script for Stage 1 (SFT)
- **[train_grpo.py](train_grpo.py)** – Reference training script for Stage 2 (GRPO RL)

## Quick Start

```bash
# Stage 1: SFT on curated agent trajectories
python train_sft.py \
  --model nvidia/Nemotron-Terminal-8B \
  --dataset mixed_agentic_dataset \
  --output_dir ./nexus-coder-sft

# Stage 2: GRPO with execution-verified rewards
python train_grpo.py \
  --model ./nexus-coder-sft \
  --dataset nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 \
  --output_dir ./nexus-coder-rl
```

## Core Datasets

| Dataset | Split | Purpose | Link |
|---|---|---|---|
| SWE-bench/SWE-smith-trajectories | `tool` (resolved=True) | SFT: Real repo bug fixing | [HF](https://hf.co/datasets/SWE-bench/SWE-smith-trajectories) |
| nvidia/Nemotron-Agentic-v1 | `interactive_agent` + `tool_calling` | SFT: Multi-turn tool use | [HF](https://hf.co/datasets/nvidia/Nemotron-Agentic-v1) |
| xingyaoww/code-act | `codeact` + `general` | SFT: Executable code actions | [HF](https://hf.co/datasets/xingyaoww/code-act) |
| nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 | `train` | RL: Step-level pass-rate rewards | [HF](https://hf.co/datasets/nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1) |
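
The `resolved=True` filter in the first row is a simple predicate; with Hugging Face `datasets` you would pass it to `.filter(...)`. A sketch on toy records (field names are assumptions based on the split description):

```python
def keep_resolved(example: dict) -> bool:
    """Keep only trajectories whose patch actually resolved the issue."""
    return example.get("resolved") is True

# Toy rows standing in for SWE-smith-trajectories records (schema assumed).
rows = [
    {"instance_id": "a", "resolved": True},
    {"instance_id": "b", "resolved": False},
    {"instance_id": "c"},                     # missing field: drop
]
kept = [r for r in rows if keep_resolved(r)]  # only instance "a" survives
```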

## Top SOTA Tricks

1. **Multi-format tool templates** – Train on 4-5 schemas (OpenAI JSON, XML, Python-style, TypeScript, Qwen3-native) so the model generalizes to any agent framework.
2. **Token-in-Token-Out (TITO)** – Use raw token IDs from vLLM rollouts; never re-tokenize for RL loss computation.
3. **Async RL** – Decouple the vLLM inference engine from the training loop for 2-3x throughput.
4. **Format-aware regularization** – Penalize malformed tool calls even when the action is logically correct.
5. **60/30/10 data mix** – SWE trajectories / general tool-use / code-as-action, by token volume.
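
Trick 1 amounts to rendering the same logical call under several surface syntaxes at data-build time. A minimal sketch for two of the five formats (the helper names are invented; the XML tag convention varies by framework):

```python
import json

def render_openai(name: str, args: dict) -> str:
    # OpenAI-style tool_calls function fragment (arguments as a JSON string).
    return json.dumps({"type": "function",
                       "function": {"name": name, "arguments": json.dumps(args)}})

def render_xml(name: str, args: dict) -> str:
    # XML-style call, one tag per argument.
    inner = "".join(f"<{k}>{v}</{k}>" for k, v in args.items())
    return f"<{name}>{inner}</{name}>"

call_name, call_args = "read_file", {"path": "src/main.py"}
openai_str = render_openai(call_name, call_args)
xml_str = render_xml(call_name, call_args)
```

Emitting each training trajectory under a randomly chosen format is what pushes the model to generalize across agent frameworks rather than memorize one schema.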

## Benchmarks

- **SWE-bench Verified** – Primary real-world software-engineering benchmark
- **Terminal-Bench 2.0** – Terminal/agent task completion
- **BFCL v3** – Multi-turn function calling
- **Aider-Polyglot** – Multi-language code editing
- **tau-bench** – Long-horizon, multi-turn tool use

## Citation

If you use this recipe, please cite the underlying research:

```bibtex
@article{nemotron-terminal-2026,
  title={Nemotron-Terminal: Scalable Training for Terminal-Capable Language Models},
  author={NVIDIA},
  journal={arXiv:2602.21193},
  year={2026}
}
@article{klear-agentforge-2025,
  title={Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling},
  author={Klear-AI},
  journal={arXiv:2511.05951},
  year={2025}
}
@article{glm5-2026,
  title={GLM-5: from Vibe Coding to Agentic Engineering},
  author={Zhipu AI},
  journal={arXiv:2602.15763},
  year={2026}
}
```

## License

The training guide and scripts are provided as-is for research and educational purposes. Dataset and base model licenses apply to their respective owners.