metadata
title: Duel of Nemotron
emoji: ⚔️
colorFrom: indigo
colorTo: purple
sdk: docker
python_version: '3.11'
app_port: 7860
tags:
- thousand-token-wood
- nemotron
- fine-tuned
- custom-ui
- tiny-titan
- self-play
- rl
- fighting-game
- modal
pinned: false
Duel of Nemotron ⚔️ — Hybrid Self-Play AI Fighter
A cyberpunk fighting game where the AI opponent is powered by a two-tier hybrid:
- Nemotron 3 Nano 4B (fine-tuned, on Modal A10) — the strategist
- Tiny Fighter (~142k params, CPU, in this Space) — the real-time executor
Nemotron watches the fight and, roughly every 10 moves, outputs a strategic mode (aggressive, defensive, grappling, and so on). The Tiny Fighter, conditioned on that mode and the last few moves, picks the actual next move in under 1 ms on CPU.
The pattern splits the work by latency: a small CPU policy network runs the fast loop for real-time play, while a larger fine-tuned LLM sets the strategic direction.
How It Works
Browser (React + Three.js) ──fight state──▶ Space backend (HF Space CPU)
▲                                             │
│                                             ├──▶ Tiny Fighter (142k, <1ms)
│                                             │    returns move + probs
│                                             │
│                                             └──▶ Modal Nemotron (A10, cold start)
│                                                  every ~10 moves:
│                                                  returns strategic weights
│                                             ▲
└──────────────────weights + reasoning────────┘
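The flow in the diagram can be sketched as a two-rate loop: a fast policy runs on every move, and a slow strategist is consulted only every so often. This is a minimal sketch, not the Space's actual code. `query_strategist`, `pick_move`, `run_fight`, and the move pools are hypothetical stand-ins, and the interval of 10 is taken from the "every ~10 moves" label.

```python
import random

# The 15-move vocabulary listed in this README.
MOVES = ["jab", "cross", "hook", "kick", "uppercut", "block", "parry", "dodge",
         "advance", "retreat", "grapple", "throw", "sweep", "feint", "wait"]
ATTACKS = {"jab", "cross", "hook", "kick", "uppercut"}
STRATEGY_INTERVAL = 10  # "every ~10 moves" in the diagram; exact value is an assumption


def query_strategist(history):
    """Hypothetical stub for the slow Nemotron call: returns a mode name."""
    recent = history[-STRATEGY_INTERVAL:]
    attack_count = sum(move in ATTACKS for move in recent)
    return "defensive" if attack_count > len(recent) / 2 else "aggressive"


def pick_move(mode, rng):
    """Hypothetical stub for the Tiny Fighter: samples a move that fits the mode."""
    pools = {
        "aggressive": sorted(ATTACKS),
        "defensive": ["block", "parry", "dodge", "retreat"],
    }
    # Unknown modes (such as the initial "balanced") fall back to the full vocabulary.
    return rng.choice(pools.get(mode, MOVES))


def run_fight(num_moves, seed=0):
    """Run the fast loop every move; refresh the mode every STRATEGY_INTERVAL moves."""
    rng = random.Random(seed)
    history, mode, strategist_calls = [], "balanced", 0
    for step in range(num_moves):
        if step > 0 and step % STRATEGY_INTERVAL == 0:
            mode = query_strategist(history)
            strategist_calls += 1
        history.append(pick_move(mode, rng))
    return history, strategist_calls
```

With these stubs, `run_fight(35)` picks 35 moves but calls the strategist only three times (at moves 10, 20, and 30), which is why a slow or cold-starting LLM endpoint does not stall the per-move loop.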
Training Pipeline (on Modal A100-40GB)
- SFT Bootstrap — 12k procedural examples teach Nemotron to output strategic weight JSON given fight state.
- Self-Play Rollouts — 100 fights with the SFT model playing both sides. Win/loss outcomes provide reward signals.
- Reward-weighted fine-tuning — positive-reward completions are reinforced, negative-reward completions suppressed. 3 epochs, A100-40GB.
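One common way to implement the reward-weighted step is to scale each completion's negative log-likelihood by its reward. The README does not state the exact objective, so the sketch below is an assumption: rewards of +1 for a win and -1 for a loss, and a hypothetical `reward_weighted_loss` helper that works on per-token probabilities rather than a real model.

```python
import math


def sequence_log_prob(token_probs):
    """Log-probability of a completion given the probability of each of its tokens."""
    return sum(math.log(p) for p in token_probs)


def reward_weighted_loss(batch):
    """Mean of -reward * log p(completion) over (token_probs, reward) pairs.

    Minimizing this raises the likelihood of positive-reward completions
    and lowers the likelihood of negative-reward ones.
    """
    total = 0.0
    for token_probs, reward in batch:
        total += -reward * sequence_log_prob(token_probs)
    return total / len(batch)


# Assumed reward scheme: +1 for the winning side's completions, -1 for the loser's.
winner = ([0.5, 0.5], 1.0)
loser = ([0.5, 0.5], -1.0)
loss_win_only = reward_weighted_loss([winner])
loss_mixed = reward_weighted_loss([winner, loser])
```

In practice an unbounded negative term can destabilize training, so implementations often clip or rescale negative rewards; whether this project does so is not stated here.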
The Tiny Fighter
- ~142k parameter MLP with BatchNorm, trained on 20k procedurally generated (state, strategy) → move examples.
- Runs on CPU in < 1ms per inference. Real-time safe.
- Conditioned on Nemotron's strategic weights, so it adapts its style (aggressive vs. defensive vs. grappling) on the fly.
- 15-move output vocabulary: jab, cross, hook, kick, uppercut, block, parry, dodge, advance, retreat, grapple, throw, sweep, feint, wait.
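To make the conditioning concrete, here is a rough sketch of how the policy's input and output could be shaped. Only the 15-move vocabulary comes from this README. The history length of 4, the three-entry style list, the one-hot encoding, and the `encode_input` and `decode_output` helpers are illustrative assumptions, and the MLP itself is omitted.

```python
import math

# The 15-move output vocabulary from the list above.
MOVES = ["jab", "cross", "hook", "kick", "uppercut", "block", "parry", "dodge",
         "advance", "retreat", "grapple", "throw", "sweep", "feint", "wait"]
HISTORY_LEN = 4  # assumption: how many recent moves the policy sees
STYLES = ["aggressive", "defensive", "grappling"]  # assumption: only these are named here


def encode_input(last_moves, strategy_weights):
    """Concatenate one-hot recent moves (left-padded with zeros) and style weights."""
    recent = list(last_moves[-HISTORY_LEN:])
    padded = [None] * (HISTORY_LEN - len(recent)) + recent
    features = []
    for move in padded:
        one_hot = [0.0] * len(MOVES)
        if move is not None:
            one_hot[MOVES.index(move)] = 1.0
        features.extend(one_hot)
    features.extend(float(strategy_weights.get(style, 0.0)) for style in STYLES)
    return features


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_output(logits):
    """Turn 15 raw scores into (chosen move, probabilities), as in 'move + probs'."""
    probs = softmax(logits)
    best = max(range(len(MOVES)), key=probs.__getitem__)
    return MOVES[best], probs
```

Under these assumptions the input vector has 4 × 15 + 3 = 63 entries, so changing the strategy weights changes the network's input directly rather than post-processing its output.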
Badges Targeted
- ✅ Tiny Titan — the 142k param model is genuinely tiny and does real work
- ✅ Well-Tuned — the Nemotron LoRA adapter is published at sankalphs/duel-nemotron-strategist
- ✅ Off-Brand — custom React + Three.js 3D fighting game (not default Gradio)
- ✅ Field Notes — see blog post
- ✅ Modal Award — training and inference both run on Modal
- ✅ Nemotron Quest — fine-tuned Nemotron 3 Nano 4B for the fight
Local Dev
# Frontend
cd 3d-game && npm install && npm run build
# Space backend (CPU)
pip install -r requirements.txt
python app.py
Set the MODEL_SERVER environment variable to your Modal inference endpoint to enable Nemotron strategy. Without it, the Space falls back to balanced defaults.
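The fallback behavior could look roughly like the sketch below. The `MODEL_SERVER` name comes from this README. The `get_strategy` helper, the JSON request and response shape, the contents of `BALANCED_DEFAULTS`, and the choice to also fall back on network errors are assumptions for illustration, not the Space's actual code.

```python
import json
import os
import urllib.request

# Hypothetical "balanced" weights; the real default values are not documented here.
BALANCED_DEFAULTS = {"aggressive": 1 / 3, "defensive": 1 / 3, "grappling": 1 / 3}


def get_strategy(fight_state, timeout=5.0):
    """Ask the Modal endpoint for strategic weights, or return balanced defaults."""
    endpoint = os.environ.get("MODEL_SERVER")
    if not endpoint:
        # No endpoint configured: play with balanced defaults.
        return dict(BALANCED_DEFAULTS)
    try:
        request = urllib.request.Request(
            endpoint,
            data=json.dumps(fight_state).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # Assumption: treat an unreachable endpoint or bad JSON like a missing one.
        return dict(BALANCED_DEFAULTS)
```

Returning a copy of the defaults keeps callers from mutating the shared dictionary between fights.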
Links
- Fine-tuned adapter: https://huggingface.co/sankalphs/duel-nemotron-strategist
- Modal orchestration: see modal/app.py in the repo
- Demo video: see social post
- Social post: see social post
Built for the Build Small Hackathon by @sankalphs. 🍄