# Nexus-Coder-Alpha
|
|
A practical training guide and recipe for building state-of-the-art **agentic coding assistants** with open-source 8B-parameter models.
|
|
## What This Is
|
|
This repository consolidates research from **Nemotron-Terminal**, **Klear-AgentForge**, **GLM-5**, and **Qwen3-Coder-Next** into a single reproducible training pipeline:
|
|
1. **Supervised Fine-Tuning (SFT)** on high-quality multi-turn agent trajectories
2. **Reinforcement Learning (RL)** with execution-verified rewards
3. **Deployment** in Pi agent, Cline, OpenCode, or any OpenAI-compatible coding tool
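The trajectories consumed in Stage 1 can be pictured as OpenAI-style chat messages with `tool_calls`. The sketch below is purely illustrative — the `run_shell` tool, message contents, and field values are invented for the example and are not a real dataset schema:

```python
import json

# An illustrative multi-turn agent trajectory in OpenAI-style chat format.
# Real dataset schemas (see Core Datasets below) will differ in detail.
trajectory = [
    {"role": "system", "content": "You are a coding agent with shell access."},
    {"role": "user", "content": "Fix the failing test in utils/parse.py."},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "run_shell",  # hypothetical tool name
                "arguments": json.dumps({"cmd": "pytest tests/test_parse.py -x"}),
            },
        }],
    },
    {"role": "tool", "tool_call_id": "call_0", "content": "1 failed: test_parse_empty"},
    {"role": "assistant", "content": "The empty-input branch returns None; patching it now."},
]

# SFT trains on the assistant turns; user/tool turns provide context only.
assistant_turns = [m for m in trajectory if m["role"] == "assistant"]
print(len(assistant_turns))  # → 2
```

Interleaving tool results as `role: "tool"` messages keyed by `tool_call_id` is what lets the model learn to condition each next action on real execution feedback.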
|
|
## Target Model
|
|
**Base:** [`nvidia/Nemotron-Terminal-8B`](https://hf.co/nvidia/Nemotron-Terminal-8B)
- 8.2B parameters, Qwen3 architecture, native `tool_calls` support
- Already pre-trained for terminal/code-agent interaction
- Fits on a single A100 or A10G (large) GPU when fine-tuned with LoRA
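As a back-of-envelope check on why LoRA fits, the trainable-parameter count can be estimated. The dimensions below (hidden size 4096, 36 layers, rank 16, square attention projections) are illustrative assumptions, not official model specs — Qwen3-style models use grouped-query attention, so the real k/v projections are smaller:

```python
# Rough estimate of LoRA trainable parameters under assumed dimensions.
hidden = 4096      # assumed hidden size
n_layers = 36      # assumed layer count
rank = 16          # LoRA rank

# LoRA on the four attention projections (q, k, v, o); each adapter adds
# rank * (d_in + d_out) parameters. Square projections are a simplification.
per_proj = rank * (hidden + hidden)
trainable = n_layers * 4 * per_proj

print(f"{trainable / 1e6:.1f}M trainable params")  # 18.9M trainable params
print(f"{trainable / 8.2e9 * 100:.2f}% of 8.2B")   # 0.23% of 8.2B
```

Well under 1% of the weights receive gradients, which is what keeps optimizer state and gradient memory within a single-GPU budget.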
|
|
## Key Results (from cited papers)
|
|
| Benchmark | 8B Target | SOTA Reference |
|---|---|---|
| SWE-bench Verified | 20-40% | Klear-AgentForge: **39.4%** |
| BFCL v3 | 65-75% | Klear-AgentForge: **71.5%** |
| Terminal-Bench 2.0 | 15-25% | Nemotron-T-14B: **20.2%** |
| Aider-Polyglot | 25-40% | Klear-AgentForge: **33.8%** |
|
|
## Documents
|
|
- **[TRAINING_GUIDE.md](TRAINING_GUIDE.md)** – Full SFT → RL → Deployment recipe with code snippets, dataset links, hyperparameters, and SOTA tricks
- **[train_sft.py](train_sft.py)** – Reference training script for Stage 1 (SFT)
- **[train_grpo.py](train_grpo.py)** – Reference training script for Stage 2 (GRPO RL)
|
|
## Quick Start
|
|
```bash
# Stage 1: SFT on curated agent trajectories
python train_sft.py \
    --model nvidia/Nemotron-Terminal-8B \
    --dataset mixed_agentic_dataset \
    --output_dir ./nexus-coder-sft

# Stage 2: GRPO with execution-verified rewards
python train_grpo.py \
    --model ./nexus-coder-sft \
    --dataset nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 \
    --output_dir ./nexus-coder-rl
```
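Under the hood, multi-turn SFT typically computes loss only on assistant tokens, masking out user and tool-result tokens. A minimal sketch of that masking, using a toy whitespace "tokenizer" (real pipelines mask at the token level via the model's chat template; `-100` is the standard ignore index used by Hugging Face-style trainers):

```python
# Sketch of assistant-only loss masking for multi-turn SFT.
IGNORE = -100  # conventional ignore index for cross-entropy loss

def build_labels(turns):
    """Toy label builder: copy token ids for assistant turns, mask the rest."""
    input_ids, labels = [], []
    for role, text in turns:
        ids = [hash(tok) % 50000 for tok in text.split()]  # toy token ids
        input_ids += ids
        labels += ids if role == "assistant" else [IGNORE] * len(ids)
    return input_ids, labels

turns = [
    ("user", "fix the failing test"),
    ("assistant", "running pytest now"),
    ("tool", "1 failed"),
    ("assistant", "patching the empty branch"),
]
ids, labels = build_labels(turns)
trained = sum(label != IGNORE for label in labels)
print(trained, "of", len(labels), "tokens contribute to the loss")  # 7 of 13
```

Masking this way prevents the model from being trained to imitate tool outputs or user messages, which it will never generate at inference time.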
|
|
## Core Datasets
|
|
| Dataset | Split | Purpose | Link |
|---|---|---|---|
| SWE-bench/SWE-smith-trajectories | `tool` (resolved=True) | SFT: Real repo bug fixing | [HF](https://hf.co/datasets/SWE-bench/SWE-smith-trajectories) |
| nvidia/Nemotron-Agentic-v1 | `interactive_agent` + `tool_calling` | SFT: Multi-turn tool use | [HF](https://hf.co/datasets/nvidia/Nemotron-Agentic-v1) |
| xingyaoww/code-act | `codeact` + `general` | SFT: Executable code actions | [HF](https://hf.co/datasets/xingyaoww/code-act) |
| nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 | `train` | RL: Step-level pass-rate rewards | [HF](https://hf.co/datasets/nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1) |
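Combining these sources at a target ratio (the recipe's 60/30/10 mix) can be approximated by weighted source sampling. The sketch below uses placeholder source names and example-level weights; a true token-volume mix would weight draws by sequence length rather than example count:

```python
import random

# Toy sketch of a 60/30/10 data mix via weighted source sampling.
# Weights and source names are placeholders, not real dataset loaders.
random.seed(0)
weights = {
    "swe_trajectories": 0.6,   # SWE-style repo bug fixing
    "general_tool_use": 0.3,   # multi-turn tool calling
    "code_as_action":   0.1,   # executable code actions
}

def sample_sources(n):
    """Draw n source names according to the mix weights."""
    names = list(weights)
    return random.choices(names, weights=list(weights.values()), k=n)

draws = sample_sources(10_000)
ratios = {name: round(draws.count(name) / len(draws), 2) for name in weights}
print(ratios)  # close to the 0.6 / 0.3 / 0.1 targets, up to sampling noise
```

Sampling per batch (rather than concatenating and shuffling once) keeps the mix stable even when the underlying sources differ wildly in size.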
|
|
## Top SOTA Tricks
|
|
1. **Multi-format tool templates** – Train on 4-5 schemas (OpenAI JSON, XML, Python-style, TypeScript, Qwen3-native) so the model generalizes to any agent framework.
2. **Token-in-Token-Out (TITO)** – Use raw token IDs from vLLM rollouts; never re-tokenize for RL loss computation.
3. **Async RL** – Decouple the vLLM inference engine from the training loop for 2-3x throughput.
4. **Format-aware regularization** – Penalize malformed tool calls even if the action is logically correct.
5. **60/30/10 data mix** – SWE trajectories / general tool-use / code-as-action by token volume.
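Format-aware regularization (trick 4) can be sketched as a reward-shaping term that validates tool-call structure before crediting the action. The required keys and the penalty value below are illustrative assumptions, not values from the cited papers:

```python
import json

# Illustrative format-aware reward shaping: a structurally invalid tool
# call is penalized even if the intended action would have been correct.
REQUIRED_KEYS = {"name", "arguments"}  # assumed schema
FORMAT_PENALTY = -0.5                  # assumed penalty magnitude

def format_reward(tool_call_str, base_reward):
    """Return base_reward only if the call parses and matches the schema."""
    try:
        call = json.loads(tool_call_str)
    except json.JSONDecodeError:
        return FORMAT_PENALTY
    if not REQUIRED_KEYS <= call.keys():
        return FORMAT_PENALTY
    if not isinstance(call["arguments"], dict):
        return FORMAT_PENALTY
    return base_reward

good = '{"name": "run_shell", "arguments": {"cmd": "pytest"}}'
bad  = '{"name": "run_shell", "args": {"cmd": "pytest"}}'  # wrong key name
print(format_reward(good, 1.0))  # 1.0
print(format_reward(bad, 1.0))   # -0.5
```

Applying the penalty as a flat floor (rather than scaling the task reward) keeps the gradient signal for "emit parseable tool calls" independent of task difficulty.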
|
|
## Benchmarks
|
|
- **SWE-bench Verified** – Primary real-world software engineering benchmark
- **Terminal-Bench 2.0** – Terminal/agent task completion
- **BFCL v3** – Multi-turn function calling
- **Aider-Polyglot** – Multi-language code editing
- **tau-bench** – Long-horizon multi-turn tool use
|
|
## Citation
|
|
If you use this recipe, please cite the underlying research:
|
|
```bibtex
@article{nemotron-terminal-2026,
  title={Nemotron-Terminal: Scalable Training for Terminal-Capable Language Models},
  author={NVIDIA},
  journal={arXiv:2602.21193},
  year={2026}
}
@article{klear-agentforge-2025,
  title={Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling},
  author={Klear-AI},
  journal={arXiv:2511.05951},
  year={2025}
}
@article{glm5-2026,
  title={GLM-5: from Vibe Coding to Agentic Engineering},
  author={Zhipu AI},
  journal={arXiv:2602.15763},
  year={2026}
}
```
|
|
## License
|
|
The training guide and scripts are provided as-is for research and educational purposes. Datasets and base models remain subject to their respective owners' licenses.
|
|