---
title: Duel of Nemotron
emoji: ⚔️
colorFrom: indigo
colorTo: purple
sdk: docker
python_version: "3.11"
app_port: 7860
tags:
  - thousand-token-wood
  - nemotron
  - fine-tuned
  - custom-ui
  - tiny-titan
  - self-play
  - rl
  - fighting-game
  - modal
pinned: false
---

# Duel of Nemotron ⚔️: Hybrid Self-Play AI Fighter

A cyberpunk fighting game where the AI opponent is powered by a **two-tier hybrid**:

- **Nemotron 3 Nano 4B** (fine-tuned, on Modal A10): the **strategist**
- **Tiny Fighter** (~142k params, CPU, in this Space): the **real-time executor**

Nemotron watches the fight and, roughly every 10 moves, outputs a *mode* (aggressive / defensive / grappling / etc.). The Tiny Fighter, conditioned on that mode plus the last few moves, picks the actual next move in < 1ms on CPU.

The pattern: a small CPU policy network runs the fast real-time loop, while a larger fine-tuned LLM sets the strategic direction.

## How It Works

```
Browser (React + Three.js) ──fight state──▶ Space backend (HF Space CPU)
        ▲                                     │
        │                                     ├──▶ Tiny Fighter (142k, <1ms)
        │                                     │      returns move + probs
        │                                     │
        │                                     └──▶ Modal Nemotron (A10, cold start)
        │                                            every ~10 moves:
        │                                            returns strategic weights
        │                                             ▲
        └──────────────────weights + reasoning────────┘
```

### Training Pipeline (on Modal A100-40GB)

1. **SFT Bootstrap**: 12k procedural examples teach Nemotron to output strategic weight JSON given the fight state.
2. **Self-Play Rollouts**: 100 fights with the SFT model playing both sides. Win/loss outcomes provide the reward signal.
3. **Reward-weighted fine-tuning**: positive-reward completions are reinforced and negative-reward completions are suppressed, over 3 epochs.

### The Tiny Fighter

- **~142k parameter MLP** with BatchNorm, trained on 20k procedurally generated (state, strategy) → move examples.
- Runs on CPU in < 1ms per inference, which is fast enough for real-time play.
- Conditioned on Nemotron's strategic weights, so it *adapts its style* (aggressive vs. defensive vs. grappling) on the fly.
- 15-move output vocabulary: jab, cross, hook, kick, uppercut, block, parry, dodge, advance, retreat, grapple, throw, sweep, feint, wait.

## Badges Targeted

- ✅ **Tiny Titan**: the 142k param model is genuinely tiny and does real work
- ✅ **Well-Tuned**: the Nemotron LoRA adapter is published at [sankalphs/duel-nemotron-strategist](https://huggingface.co/sankalphs/duel-nemotron-strategist)
- ✅ **Off-Brand**: custom React + Three.js 3D fighting game (not default Gradio)
- ✅ **Field Notes**: see blog post
- ✅ **Modal Award**: training and inference both run on Modal
- ✅ **Nemotron Quest**: fine-tuned Nemotron 3 Nano 4B for the fight

## Local Dev

```bash
# Frontend
cd 3d-game && npm install && npm run build

# Space backend (CPU)
pip install -r requirements.txt
python app.py
```

Set the `MODEL_SERVER` env var to your Modal inference endpoint to enable Nemotron strategy. Without it, the Space falls back to balanced defaults.

## Links

- **Fine-tuned adapter**: https://huggingface.co/sankalphs/duel-nemotron-strategist
- **Modal orchestration**: see `modal/app.py` in the repo
- **Demo video**: _see social post_
- **Social post**: _see social post_

---

Built for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon) by [@sankalphs](https://huggingface.co/sankalphs). 🍄
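
For readers who want the two-tier control flow from "How It Works" in code form, here is a minimal sketch rather than the Space's actual implementation. It assumes the strategist returns a dict of style weights, refreshes those weights every 10 moves, and falls back to balanced defaults when no strategist is configured or a call fails. `HybridController`, `move_probs`, `STYLE_MOVES`, `BALANCED`, and the `3.0` bonus are all hypothetical names and values, and a hand-written scoring function stands in for the real ~142k parameter MLP.

```python
import random
from typing import Callable, Dict, List, Optional

# The 15-move vocabulary listed in this README.
MOVES = ["jab", "cross", "hook", "kick", "uppercut", "block", "parry", "dodge",
         "advance", "retreat", "grapple", "throw", "sweep", "feint", "wait"]

# Hypothetical grouping of moves by style; the real model learns this from data.
STYLE_MOVES = {
    "aggressive": ["jab", "cross", "hook", "kick", "uppercut"],
    "defensive": ["block", "parry", "dodge"],
    "grappling": ["grapple", "throw", "sweep"],
}

# Assumed shape of the "balanced defaults" used when no strategist is available.
BALANCED = {style: 1.0 / len(STYLE_MOVES) for style in STYLE_MOVES}

Strategist = Callable[[List[str]], Dict[str, float]]


def move_probs(weights: Dict[str, float]) -> List[float]:
    """Stand-in for the Tiny Fighter: boost moves whose style has weight."""
    scores = []
    for move in MOVES:
        score = 1.0  # every move keeps some baseline probability
        for style, moves in STYLE_MOVES.items():
            if move in moves:
                score += 3.0 * weights.get(style, 0.0)  # arbitrary bonus
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


class HybridController:
    """Slow strategist sets weights; fast policy picks every move."""

    def __init__(self, strategist: Optional[Strategist] = None,
                 refresh_every: int = 10, seed: Optional[int] = None):
        self.strategist = strategist
        self.refresh_every = refresh_every
        self.weights = dict(BALANCED)
        self.moves_played = 0
        self.rng = random.Random(seed)

    def next_move(self, history: List[str]) -> str:
        # Consult the strategist only on every `refresh_every`-th move.
        if self.strategist is not None and self.moves_played % self.refresh_every == 0:
            try:
                self.weights = self.strategist(history)
            except Exception:
                self.weights = dict(BALANCED)  # fall back if the call fails
        self.moves_played += 1
        return self.rng.choices(MOVES, weights=move_probs(self.weights), k=1)[0]


if __name__ == "__main__":
    controller = HybridController(lambda h: {"aggressive": 1.0}, seed=0)
    history: List[str] = []
    for _ in range(5):
        history.append(controller.next_move(history))
    print(history)
```

In this sketch the slow call sits behind a plain callable, so a remote endpoint, a local stub, or nothing at all can be swapped in without touching the per-move path.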