---
title: Duel of Nemotron
emoji: ⚔️
colorFrom: indigo
colorTo: purple
sdk: docker
python_version: "3.11"
app_port: 7860
tags:
  - thousand-token-wood
  - nemotron
  - fine-tuned
  - custom-ui
  - tiny-titan
  - self-play
  - rl
  - fighting-game
  - modal
pinned: false
---

# Duel of Nemotron ⚔️ — Hybrid Self-Play AI Fighter

A cyberpunk fighting game where the AI opponent is powered by a
**two-tier hybrid**:
- **Nemotron 3 Nano 4B** (fine-tuned, on Modal A10) — the **strategist**
- **Tiny Fighter** (~142k params, CPU, in this Space) — the **real-time executor**

Nemotron watches the fight and, every ~10 moves, outputs strategic weights that
set a *mode* (aggressive / defensive / grappling / etc.). The Tiny Fighter,
conditioned on those weights plus the last few moves, picks the actual next
move in < 1 ms on CPU.

This is a "tiny model runs the fast loop, fine-tuned LLM sets the direction"
pattern: a small CPU policy network handles real-time play, while a larger
fine-tuned model supplies strategic depth.
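The split between the two tiers can be sketched as a small control loop. This is a minimal sketch, not the Space's actual code: `HybridController`, the mode keys, the history length of 4, and the stub tiers are all hypothetical. Only the cadence ("strategy every ~10 moves, executor every move") comes from this README.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Strategy = Dict[str, float]  # hypothetical shape: mode name -> weight

# Hypothetical mode keys; the real weight schema may differ.
BALANCED: Strategy = {"aggressive": 1 / 3, "defensive": 1 / 3, "grappling": 1 / 3}


@dataclass
class HybridController:
    strategist: Callable[[List[str]], Strategy]     # slow tier, e.g. a remote LLM call
    executor: Callable[[List[str], Strategy], str]  # fast tier, e.g. a small CPU MLP
    strategy_interval: int = 10  # refresh the strategy every ~10 decisions
    history_len: int = 4         # "last few moves" fed to the executor (assumed 4)
    strategy: Strategy = field(default_factory=lambda: dict(BALANCED))
    decisions: int = 0

    def next_move(self, fight_history: List[str]) -> str:
        # Slow path: consult the strategist only on every Nth decision.
        if self.decisions % self.strategy_interval == 0:
            self.strategy = self.strategist(list(fight_history))
        self.decisions += 1
        # Fast path: the executor sees only recent moves plus the current strategy.
        return self.executor(fight_history[-self.history_len:], self.strategy)


# Usage with stub tiers (both hypothetical):
strategist_calls: List[int] = []


def stub_strategist(history: List[str]) -> Strategy:
    strategist_calls.append(len(history))
    return {"aggressive": 0.6, "defensive": 0.2, "grappling": 0.2}


def stub_executor(recent: List[str], strategy: Strategy) -> str:
    top = max(strategy, key=strategy.get)  # toy rule: signature move of the heaviest mode
    return {"aggressive": "jab", "defensive": "block", "grappling": "grapple"}[top]


controller = HybridController(stub_strategist, stub_executor)
fight: List[str] = []
for _ in range(25):
    fight.append(controller.next_move(fight))
```

Over 25 decisions the strategist is consulted only three times (at moves 0, 10, and 20), while the executor runs on every move. That asymmetry is what lets the slow tier live on a remote GPU.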

## How It Works

```
Browser (React + Three.js)  ──fight state──▶  Space backend (HF Space CPU)
    ▲                                              │
    │                                              ├──▶ Tiny Fighter (142k, <1ms)
    │                                              │     returns move + probs
    │                                              │
    │                                              └──▶ Modal Nemotron (A10, cold start)
    │                                                    every ~10 moves:
    │                                                    returns strategic weights
    │                                              ▲
    └──────────────────weights + reasoning────────┘
```
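The diagram has Nemotron returning strategic weights that the backend then hands to the Tiny Fighter. Because LLM output can be malformed, the backend has to validate it before use. The sketch below shows one way to do that, assuming the weights arrive as a flat JSON object keyed by mode. The key names, `parse_strategy`, and the normalize-to-1 choice are illustrative assumptions, not the Space's real schema.

```python
import json
import math
from typing import Dict

# Hypothetical weight keys; substitute the schema the strategist was actually trained on.
MODES = ("aggressive", "defensive", "grappling")
BALANCED = {m: 1.0 / len(MODES) for m in MODES}


def parse_strategy(raw: object) -> Dict[str, float]:
    """Turn a strategist reply into weights that sum to 1, or fall back to BALANCED."""
    if not isinstance(raw, str):
        return dict(BALANCED)
    # LLMs sometimes wrap JSON in prose, so keep only the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return dict(BALANCED)
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return dict(BALANCED)
    if not isinstance(data, dict):
        return dict(BALANCED)

    weights: Dict[str, float] = {}
    for mode in MODES:
        try:
            value = float(data.get(mode, 0.0))
        except (TypeError, ValueError):
            value = 0.0
        # Drop negative, NaN, or infinite values rather than trusting them.
        weights[mode] = value if math.isfinite(value) and value > 0 else 0.0

    total = sum(weights.values())
    if total <= 0:
        return dict(BALANCED)  # nothing usable: behave as if no strategist were configured
    return {mode: w / total for mode, w in weights.items()}
```

Falling back to balanced weights on any parse failure matches the README's stated behavior when no strategist is configured. A bad completion then degrades the AI to neutral play instead of crashing the fight.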

### Training Pipeline (on Modal A100-40GB)
1. **SFT Bootstrap** — 12k procedurally generated examples teach Nemotron to
   output strategic weights as JSON, given the fight state.
2. **Self-Play Rollouts** — 100 fights with the SFT model playing both sides.
   Win/loss outcomes provide reward signals.
3. **Reward-weighted fine-tuning** — positive-reward completions are reinforced
   and negative-reward completions are suppressed, for 3 epochs on the A100-40GB.
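Steps 2 and 3 can be illustrated with a toy version of the reward bookkeeping. This sketch assumes a +1/-1 reward per completion based on which side won, and a REINFORCE-style `-reward * logprob` objective. The README does not specify the exact loss, so `Completion`, `assign_rewards`, and `reward_weighted_loss` are hypothetical names for one common way to write it. Real training would compute `logprob` with the model and backpropagate through it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Completion:
    side: str       # which fighter ("A" or "B") the model was strategizing for
    logprob: float  # summed token log-probability of the completion under the model


def assign_rewards(completions: List[Completion], winner: Optional[str]) -> List[float]:
    """+1 for the winning side's completions, -1 for the loser's, 0 for a draw."""
    if winner is None:
        return [0.0] * len(completions)
    return [1.0 if c.side == winner else -1.0 for c in completions]


def reward_weighted_loss(completions: List[Completion], rewards: List[float]) -> float:
    """Mean of -reward * logprob: minimizing it raises the likelihood of
    positive-reward completions and lowers it for negative-reward ones."""
    if not completions:
        return 0.0
    return sum(-r * c.logprob for c, r in zip(completions, rewards)) / len(completions)


# One self-play fight: the model played both sides, and side "A" won.
rollout = [Completion("A", -2.0), Completion("B", -1.0), Completion("A", -4.0)]
rewards = assign_rewards(rollout, winner="A")
loss = reward_weighted_loss(rollout, rewards)
```

The negative-reward term is unbounded: pushing a log-probability toward minus infinity always lowers the loss. Objectives like this are therefore typically paired with clipping, a smaller weight on negatives, or a KL penalty toward the SFT model.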

### The Tiny Fighter
- **~142k-parameter MLP** with BatchNorm, trained on 20k procedurally
  generated (state, strategy) → move examples.
- Runs on CPU in < 1 ms per inference, fast enough for real-time play.
- Conditioned on Nemotron's strategic weights, so it *adapts its style*
  (aggressive vs. defensive vs. grappling) on the fly.
- 15-move output vocabulary: jab, cross, hook, kick, uppercut, block, parry,
  dodge, advance, retreat, grapple, throw, sweep, feint, wait.
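The Tiny Fighter's interface can be sketched as an encoder and a decoder. The encoder takes recent moves plus strategy weights. The decoder maps 15 logits to a move plus probabilities, matching the "returns move + probs" arrow in the diagram. Only the move vocabulary comes from this README. The history length, the one-hot layout, the strategy keys, and the function names are assumptions, and the ~142k-parameter MLP that would sit between `encode` and `decode` is omitted.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# The 15-move vocabulary listed above.
MOVES = ["jab", "cross", "hook", "kick", "uppercut", "block", "parry",
         "dodge", "advance", "retreat", "grapple", "throw", "sweep", "feint", "wait"]
MOVE_INDEX = {m: i for i, m in enumerate(MOVES)}
MODES = ("aggressive", "defensive", "grappling")  # hypothetical strategy keys
HISTORY = 4  # hypothetical number of past moves fed to the network


def encode(last_moves: Sequence[str], strategy: Dict[str, float]) -> List[float]:
    """One-hot encode the last HISTORY moves (zero-padded on the left),
    then append the strategy weights in a fixed key order."""
    recent: List[Optional[str]] = list(last_moves)[-HISTORY:]
    padded = [None] * (HISTORY - len(recent)) + recent
    feats: List[float] = []
    for move in padded:
        one_hot = [0.0] * len(MOVES)
        if move is not None:
            one_hot[MOVE_INDEX[move]] = 1.0
        feats.extend(one_hot)
    feats.extend(strategy.get(m, 0.0) for m in MODES)
    return feats


def decode(logits: Sequence[float]) -> Tuple[str, List[float]]:
    """Softmax over the 15 logits; return the most likely move and all probabilities."""
    if len(logits) != len(MOVES):
        raise ValueError(f"expected {len(MOVES)} logits, got {len(logits)}")
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(MOVES)), key=probs.__getitem__)
    return MOVES[best], probs
```

With `HISTORY = 4` and three strategy keys, the network input is 4 × 15 + 3 = 63 floats. Because the strategy weights are plain inputs, changing them at runtime shifts the output distribution without retraining, which is how the executor can change style on the fly.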

## Badges Targeted
- ✅ **Tiny Titan** — the 142k-parameter model is genuinely tiny and does real work
- ✅ **Well-Tuned** — the Nemotron LoRA adapter is published at
  [sankalphs/duel-nemotron-strategist](https://huggingface.co/sankalphs/duel-nemotron-strategist)
- ✅ **Off-Brand** — custom React + Three.js 3D fighting game (not default Gradio)
- ✅ **Field Notes** — see blog post
- ✅ **Modal Award** — training and inference both run on Modal
- ✅ **Nemotron Quest** — fine-tuned Nemotron 3 Nano 4B for the fight

## Local Dev

```bash
# Frontend
cd 3d-game && npm install && npm run build && cd ..

# Space backend (CPU), run from the repo root
pip install -r requirements.txt
python app.py
```

Set the `MODEL_SERVER` environment variable to your Modal inference endpoint to
enable Nemotron strategy calls. Without it, the Space falls back to balanced
default weights.
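The `MODEL_SERVER` fallback can be sketched using only the Python standard library. The request payload (`{"state": ...}`), the `weights` key in the reply, the mode names, and `fetch_strategy` itself are hypothetical. The actual Modal endpoint's contract is presumably defined in `modal/app.py`.

```python
import http.client
import json
import os
import urllib.request
from typing import Dict, Optional

MODES = ("aggressive", "defensive", "grappling")  # hypothetical strategy keys
BALANCED = {m: 1.0 / len(MODES) for m in MODES}


def fetch_strategy(fight_state: dict, timeout: float = 5.0,
                   endpoint: Optional[str] = None) -> Dict[str, float]:
    """Ask the Modal strategist for weights; fall back to BALANCED on any failure."""
    url = endpoint or os.environ.get("MODEL_SERVER")
    if not url:
        return dict(BALANCED)  # MODEL_SERVER unset: balanced defaults, as in the README
    body = json.dumps({"state": fight_state}).encode("utf-8")  # hypothetical payload
    try:
        request = urllib.request.Request(
            url, data=body, method="POST", headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            reply = json.loads(response.read().decode("utf-8"))
        weights = reply["weights"]  # hypothetical response key
        return {m: float(weights.get(m, 0.0)) for m in MODES}
    except (OSError, http.client.HTTPException, ValueError,
            KeyError, TypeError, AttributeError):
        # URLError, HTTPError, and timeouts are OSError subclasses; a malformed URL or
        # bad JSON is a ValueError; a reply of the wrong shape raises Key/Type/AttributeError.
        return dict(BALANCED)
```

The timeout is a design choice. Modal cold starts can take a while, and the fight should keep going on balanced weights rather than stall. A real server would likely also run this call in the background so the Tiny Fighter's per-move path never waits on it.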

## Links

- **Fine-tuned adapter**: https://huggingface.co/sankalphs/duel-nemotron-strategist
- **Modal orchestration**: see `modal/app.py` in the repo
- **Demo video**: _see social post_
- **Social post**: _see social post_

---

Built for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon)
by [@sankalphs](https://huggingface.co/sankalphs). 🍄