# Nexus-Coder-Alpha
|
|
A practical training guide and recipe for building state-of-the-art **agentic coding assistants** with open-source 8B-parameter models.
|
|
## What This Is
|
|
This repository consolidates research from **Nemotron-Terminal**, **Klear-AgentForge**, **GLM-5**, and **Qwen3-Coder-Next** into a single reproducible training pipeline:
|
|
1. **Supervised Fine-Tuning (SFT)** on high-quality multi-turn agent trajectories
2. **Reinforcement Learning (RL)** with execution-verified rewards
3. **Deployment** in Pi agent, Cline, OpenCode, or any OpenAI-compatible coding tool
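The trajectories consumed in Stage 1 can be pictured as OpenAI-style chat messages with `tool_calls`. The sketch below is purely illustrative — the `run_shell` tool, message contents, and field values are invented for the example and are not a real dataset schema:

```python
import json

# An illustrative multi-turn agent trajectory in OpenAI-style chat format.
# Real dataset schemas (see Core Datasets below) will differ in detail.
trajectory = [
    {"role": "system", "content": "You are a coding agent with shell access."},
    {"role": "user", "content": "Fix the failing test in utils/parse.py."},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "run_shell",  # hypothetical tool name
                "arguments": json.dumps({"cmd": "pytest tests/test_parse.py -x"}),
            },
        }],
    },
    {"role": "tool", "tool_call_id": "call_0", "content": "1 failed: test_parse_empty"},
    {"role": "assistant", "content": "The empty-input branch returns None; patching it now."},
]

# SFT trains on the assistant turns; user/tool turns provide context only.
assistant_turns = [m for m in trajectory if m["role"] == "assistant"]
print(len(assistant_turns))  # → 2
```

Interleaving tool results as `role: "tool"` messages keyed by `tool_call_id` is what lets the model learn to condition each next action on real execution feedback.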
|
|
## Target Model
|
|
**Base:** [`nvidia/Nemotron-Terminal-8B`](https://hf.co/nvidia/Nemotron-Terminal-8B)
- 8.2B parameters, Qwen3 architecture, native `tool_calls` support
- Already pre-trained for terminal/code-agent interaction
- Fits on a single A100 or A10G (large) GPU when fine-tuned with LoRA
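As a back-of-envelope check on why LoRA fits, the trainable-parameter count can be estimated. The dimensions below (hidden size 4096, 36 layers, rank 16, square attention projections) are illustrative assumptions, not official model specs — Qwen3-style models use grouped-query attention, so the real k/v projections are smaller:

```python
# Rough estimate of LoRA trainable parameters under assumed dimensions.
hidden = 4096      # assumed hidden size
n_layers = 36      # assumed layer count
rank = 16          # LoRA rank

# LoRA on the four attention projections (q, k, v, o); each adapter adds
# rank * (d_in + d_out) parameters. Square projections are a simplification.
per_proj = rank * (hidden + hidden)
trainable = n_layers * 4 * per_proj

print(f"{trainable / 1e6:.1f}M trainable params")  # 18.9M trainable params
print(f"{trainable / 8.2e9 * 100:.2f}% of 8.2B")   # 0.23% of 8.2B
```

Well under 1% of the weights receive gradients, which is what keeps optimizer state and gradient memory within a single-GPU budget.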
|
|
## Key Results (from cited papers)
|
|
| Benchmark | 8B Target | SOTA Reference |
|---|---|---|
| SWE-bench Verified | 20-40% | Klear-AgentForge: **39.4%** |
| BFCL v3 | 65-75% | Klear-AgentForge: **71.5%** |
| Terminal-Bench 2.0 | 15-25% | Nemotron-T-14B: **20.2%** |
| Aider-Polyglot | 25-40% | Klear-AgentForge: **33.8%** |
|
|
## Documents
|
|
- **[TRAINING_GUIDE.md](TRAINING_GUIDE.md)** – Full SFT → RL → Deployment recipe with code snippets, dataset links, hyperparameters, and SOTA tricks
- **[train_sft.py](train_sft.py)** – Reference training script for Stage 1 (SFT)
- **[train_grpo.py](train_grpo.py)** – Reference training script for Stage 2 (GRPO RL)
|
|
## Quick Start
|
|
```bash
# Stage 1: SFT on curated agent trajectories
python train_sft.py \
    --model nvidia/Nemotron-Terminal-8B \
    --dataset mixed_agentic_dataset \
    --output_dir ./nexus-coder-sft

# Stage 2: GRPO with execution-verified rewards
python train_grpo.py \
    --model ./nexus-coder-sft \
    --dataset nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 \
    --output_dir ./nexus-coder-rl
```
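Under the hood, multi-turn SFT typically computes loss only on assistant tokens, masking out user and tool-result tokens. A minimal sketch of that masking, using a toy whitespace "tokenizer" (real pipelines mask at the token level via the model's chat template; `-100` is the standard ignore index used by Hugging Face-style trainers):

```python
# Sketch of assistant-only loss masking for multi-turn SFT.
IGNORE = -100  # conventional ignore index for cross-entropy loss

def build_labels(turns):
    """Toy label builder: copy token ids for assistant turns, mask the rest."""
    input_ids, labels = [], []
    for role, text in turns:
        ids = [hash(tok) % 50000 for tok in text.split()]  # toy token ids
        input_ids += ids
        labels += ids if role == "assistant" else [IGNORE] * len(ids)
    return input_ids, labels

turns = [
    ("user", "fix the failing test"),
    ("assistant", "running pytest now"),
    ("tool", "1 failed"),
    ("assistant", "patching the empty branch"),
]
ids, labels = build_labels(turns)
trained = sum(label != IGNORE for label in labels)
print(trained, "of", len(labels), "tokens contribute to the loss")  # 7 of 13
```

Masking this way prevents the model from being trained to imitate tool outputs or user messages, which it will never generate at inference time.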
|
|
## Core Datasets
|
|
| Dataset | Split | Purpose | Link |
|---|---|---|---|
| SWE-bench/SWE-smith-trajectories | `tool` (resolved=True) | SFT: Real repo bug fixing | [HF](https://hf.co/datasets/SWE-bench/SWE-smith-trajectories) |
| nvidia/Nemotron-Agentic-v1 | `interactive_agent` + `tool_calling` | SFT: Multi-turn tool use | [HF](https://hf.co/datasets/nvidia/Nemotron-Agentic-v1) |
| xingyaoww/code-act | `codeact` + `general` | SFT: Executable code actions | [HF](https://hf.co/datasets/xingyaoww/code-act) |
| nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1 | `train` | RL: Step-level pass-rate rewards | [HF](https://hf.co/datasets/nvidia/Nemotron-RL-Agentic-SWE-Pivot-v1) |
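Combining these sources at a target ratio (the recipe's 60/30/10 mix) can be approximated by weighted source sampling. The sketch below uses placeholder source names and example-level weights; a true token-volume mix would weight draws by sequence length rather than example count:

```python
import random

# Toy sketch of a 60/30/10 data mix via weighted source sampling.
# Weights and source names are placeholders, not real dataset loaders.
random.seed(0)
weights = {
    "swe_trajectories": 0.6,   # SWE-style repo bug fixing
    "general_tool_use": 0.3,   # multi-turn tool calling
    "code_as_action":   0.1,   # executable code actions
}

def sample_sources(n):
    """Draw n source names according to the mix weights."""
    names = list(weights)
    return random.choices(names, weights=list(weights.values()), k=n)

draws = sample_sources(10_000)
ratios = {name: round(draws.count(name) / len(draws), 2) for name in weights}
print(ratios)  # close to the 0.6 / 0.3 / 0.1 targets, up to sampling noise
```

Sampling per batch (rather than concatenating and shuffling once) keeps the mix stable even when the underlying sources differ wildly in size.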
|
|
## Top SOTA Tricks
|
|
1. **Multi-format tool templates** – Train on 4-5 schemas (OpenAI JSON, XML, Python-style, TypeScript, Qwen3-native) so the model generalizes to any agent framework.
2. **Token-in-Token-Out (TITO)** – Use raw token IDs from vLLM rollouts; never re-tokenize for RL loss computation.
3. **Async RL** – Decouple the vLLM inference engine from the training loop for 2-3x throughput.
4. **Format-aware regularization** – Penalize malformed tool calls even if the action is logically correct.
5. **60/30/10 data mix** – SWE trajectories / general tool-use / code-as-action by token volume.
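Format-aware regularization (trick 4) can be sketched as a reward-shaping term that validates tool-call structure before crediting the action. The required keys and the penalty value below are illustrative assumptions, not values from the cited papers:

```python
import json

# Illustrative format-aware reward shaping: a structurally invalid tool
# call is penalized even if the intended action would have been correct.
REQUIRED_KEYS = {"name", "arguments"}  # assumed schema
FORMAT_PENALTY = -0.5                  # assumed penalty magnitude

def format_reward(tool_call_str, base_reward):
    """Return base_reward only if the call parses and matches the schema."""
    try:
        call = json.loads(tool_call_str)
    except json.JSONDecodeError:
        return FORMAT_PENALTY
    if not REQUIRED_KEYS <= call.keys():
        return FORMAT_PENALTY
    if not isinstance(call["arguments"], dict):
        return FORMAT_PENALTY
    return base_reward

good = '{"name": "run_shell", "arguments": {"cmd": "pytest"}}'
bad  = '{"name": "run_shell", "args": {"cmd": "pytest"}}'  # wrong key name
print(format_reward(good, 1.0))  # 1.0
print(format_reward(bad, 1.0))   # -0.5
```

Applying the penalty as a flat floor (rather than scaling the task reward) keeps the gradient signal for "emit parseable tool calls" independent of task difficulty.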
|
|
## Benchmarks
|
|
- **SWE-bench Verified** – Primary real-world software engineering benchmark
- **Terminal-Bench 2.0** – Terminal/agent task completion
- **BFCL v3** – Multi-turn function calling
- **Aider-Polyglot** – Multi-language code editing
- **tau-bench** – Long-horizon multi-turn tool use
|
|
## Citation
|
|
If you use this recipe, please cite the underlying research:
|
|
```bibtex
@article{nemotron-terminal-2026,
  title={Nemotron-Terminal: Scalable Training for Terminal-Capable Language Models},
  author={NVIDIA},
  journal={arXiv:2602.21193},
  year={2026}
}
@article{klear-agentforge-2025,
  title={Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling},
  author={Klear-AI},
  journal={arXiv:2511.05951},
  year={2025}
}
@article{glm5-2026,
  title={GLM-5: from Vibe Coding to Agentic Engineering},
  author={Zhipu AI},
  journal={arXiv:2602.15763},
  year={2026}
}
```
|
|
## License
|
|
The training guide and scripts are provided as-is for research and educational purposes. Datasets and base models remain subject to their respective owners' licenses.
|
|