metadata
title: Duel of Nemotron
emoji: ⚔️
colorFrom: indigo
colorTo: purple
sdk: docker
python_version: '3.11'
app_port: 7860
tags:
  - thousand-token-wood
  - nemotron
  - fine-tuned
  - custom-ui
  - tiny-titan
  - self-play
  - rl
  - fighting-game
  - modal
pinned: false

Duel of Nemotron ⚔️ — Hybrid Self-Play AI Fighter

A cyberpunk fighting game where the AI opponent is powered by a two-tier hybrid:

  • Nemotron 3 Nano 4B (fine-tuned, on Modal A10) — the strategist
  • Tiny Fighter (~142k params, CPU, in this Space) — the real-time executor

Nemotron watches the fight and, roughly every 10 moves, outputs a strategic mode (aggressive, defensive, grappling, and so on) expressed as a set of strategic weights. The Tiny Fighter, conditioned on those weights plus the last few moves, picks the actual next move in under 1 ms on CPU.

The pattern is a tiny model running the fast loop while a fine-tuned LLM sets the direction: a small CPU policy network handles real-time play, and a larger fine-tuned model supplies strategic depth.
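The split between the slow strategic loop and the fast move loop can be sketched as follows. This is a minimal illustration, not the Space's actual code: `run_exchange`, `toy_strategist`, `toy_executor`, the 10-move interval, the 4-move history window, and the `"balanced"` starting mode are all assumptions made for the example, and the two toy callables stand in for the real neural networks.

```python
from typing import Callable, List, Tuple

STRATEGY_INTERVAL = 10  # assumption: refresh the mode roughly every 10 moves
HISTORY_WINDOW = 4      # assumption: "the last few moves" taken as 4


def run_exchange(
    num_moves: int,
    strategist: Callable[[List[str]], str],
    executor: Callable[[str, List[str]], str],
    initial_mode: str = "balanced",
) -> Tuple[List[str], int]:
    """Play num_moves AI moves; return the moves and the strategist call count."""
    history: List[str] = []
    mode = initial_mode
    strategist_calls = 0
    for step in range(num_moves):
        # Slow loop: consult the large model only every STRATEGY_INTERVAL moves.
        if step > 0 and step % STRATEGY_INTERVAL == 0:
            mode = strategist(history)
            strategist_calls += 1
        # Fast loop: the small policy picks every move from mode + recent history.
        move = executor(mode, history[-HISTORY_WINDOW:])
        history.append(move)
    return history, strategist_calls


# Hypothetical stand-ins for the two models.
def toy_strategist(history: List[str]) -> str:
    # Switch to offence if more than half of the moves so far were blocks.
    return "aggressive" if history.count("block") > len(history) // 2 else "defensive"


def toy_executor(mode: str, recent: List[str]) -> str:
    return "jab" if mode == "aggressive" else "block"


moves, calls = run_exchange(25, toy_strategist, toy_executor)
print(calls, moves[0], moves[10], moves[20])
```

In this toy run the executor is called 25 times while the strategist is called only twice, which is the point of the design: the expensive model stays off the per-move critical path.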

How It Works

Browser (React + Three.js)  ──fight state──▶  Space backend (HF Space CPU)
    ▲                                              │
    │                                              ├──▶ Tiny Fighter (142k, <1ms)
    │                                              │     returns move + probs
    │                                              │
    │                                              └──▶ Modal Nemotron (A10, cold start)
    │                                                    every ~10 moves:
    │                                                    returns strategic weights
    │                                              ▲
    └──────────────────weights + reasoning────────┘

Training Pipeline (on Modal A100-40GB)

  1. SFT Bootstrap — 12k procedural examples teach Nemotron to output strategic weight JSON given fight state.
  2. Self-Play Rollouts — 100 fights with the SFT model playing both sides. Win/loss outcomes provide reward signals.
  3. Reward-weighted fine-tuning — positive-reward completions are reinforced, negative-reward completions suppressed. 3 epochs, A100-40GB.
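One common way to turn win/loss outcomes into a training signal is to weight each completion's log-likelihood by its reward. The sketch below shows that idea in plain Python. It assumes a reward of +1 for a win and -1 for a loss and a simple mean over the batch; the `Completion` class, the `"aggression"` JSON key, and the exact loss form are hypothetical and may differ from what the actual pipeline does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Completion:
    text: str        # strategy JSON emitted by the model (keys are hypothetical)
    log_prob: float  # summed token log-probability under the current model
    won: bool        # whether the side that produced it won the fight


def outcome_reward(won: bool) -> float:
    # Assumption: +1 for a win, -1 for a loss.
    return 1.0 if won else -1.0


def reward_weighted_loss(batch: List[Completion]) -> float:
    """Mean of -reward * log_prob over the batch."""
    if not batch:
        raise ValueError("batch must not be empty")
    total = sum(-outcome_reward(c.won) * c.log_prob for c in batch)
    return total / len(batch)


batch = [
    Completion('{"aggression": 0.8}', log_prob=-2.0, won=True),
    Completion('{"aggression": 0.1}', log_prob=-1.0, won=False),
]
loss = reward_weighted_loss(batch)
print(loss)
```

Minimizing this quantity raises the log-probability of winning completions and lowers that of losing ones. The negative-reward term is unbounded below, so practical setups usually clip or otherwise regularize it.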

The Tiny Fighter

  • ~142k parameter MLP with BatchNorm, trained on 20k procedurally generated (state, strategy) → move examples.
  • Runs on CPU in < 1ms per inference. Real-time safe.
  • Conditioned on Nemotron's strategic weights, so it adapts its style (aggressive vs. defensive vs. grappling) on the fly.
  • 15-move output vocabulary: jab, cross, hook, kick, uppercut, block, parry, dodge, advance, retreat, grapple, throw, sweep, feint, wait.
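To make the output side concrete, here is a small sketch of turning 15 logits into a move plus per-move probabilities. The move list comes from the vocabulary above; `pick_move`, the softmax-then-sample step, and the example logits are assumptions for illustration, since the real model may pick the argmax or apply its own post-processing.

```python
import math
import random
from typing import Dict, List, Tuple

MOVES = [
    "jab", "cross", "hook", "kick", "uppercut",
    "block", "parry", "dodge", "advance", "retreat",
    "grapple", "throw", "sweep", "feint", "wait",
]


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_move(logits: List[float], rng: random.Random) -> Tuple[str, Dict[str, float]]:
    """Sample one move from the softmax distribution; also return all probabilities."""
    if len(logits) != len(MOVES):
        raise ValueError(f"expected {len(MOVES)} logits, got {len(logits)}")
    probs = softmax(logits)
    move = rng.choices(MOVES, weights=probs, k=1)[0]
    return move, dict(zip(MOVES, probs))


# Hypothetical logits: a network in a grappling mode might favour "grapple".
rng = random.Random(0)
logits = [0.0] * len(MOVES)
logits[MOVES.index("grapple")] = 5.0
move, probs = pick_move(logits, rng)
print(move, round(probs["grapple"], 3))
```

Sampling rather than always taking the most likely move keeps the fighter less predictable, at the cost of occasionally choosing a weaker option.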

Badges Targeted

  • Tiny Titan — the 142k param model is genuinely tiny and does real work
  • Well-Tuned — the Nemotron LoRA adapter is published at sankalphs/duel-nemotron-strategist
  • Off-Brand — custom React + Three.js 3D fighting game (not default Gradio)
  • Field Notes — see blog post
  • Modal Award — Nemotron training and inference both run on Modal
  • Nemotron Quest — fine-tuned Nemotron 3 Nano 4B for the fight

Local Dev

# Frontend
cd 3d-game && npm install && npm run build

# Space backend (CPU)
pip install -r requirements.txt
python app.py

Set the MODEL_SERVER environment variable to your Modal inference endpoint to enable Nemotron strategy. Without it, the Space falls back to balanced defaults.
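The fallback behaviour could look roughly like the sketch below. Only the `MODEL_SERVER` variable name comes from this README: `resolve_strategy`, `BALANCED_DEFAULTS`, the weight keys, and the injected `fetch_remote` callable are hypothetical, and falling back on a failed request is an extra assumption beyond what is stated here.

```python
import os
from typing import Callable, Dict, Mapping

# Hypothetical "balanced" weights; the real default values are not documented here.
BALANCED_DEFAULTS: Dict[str, float] = {
    "aggressive": 1 / 3,
    "defensive": 1 / 3,
    "grappling": 1 / 3,
}


def resolve_strategy(
    env: Mapping[str, str],
    fetch_remote: Callable[[str], Dict[str, float]],
) -> Dict[str, float]:
    """Return remote strategy weights if MODEL_SERVER is set, else balanced defaults."""
    endpoint = env.get("MODEL_SERVER", "").strip()
    if not endpoint:
        return dict(BALANCED_DEFAULTS)
    try:
        return fetch_remote(endpoint)
    except Exception:
        # Assumption: an unreachable endpoint should not stop the fight.
        return dict(BALANCED_DEFAULTS)


def fake_fetch(url: str) -> Dict[str, float]:
    # Stand-in for an HTTP call to the inference endpoint.
    return {"aggressive": 1.0}


print(resolve_strategy({}, fake_fetch))
print(resolve_strategy({"MODEL_SERVER": "https://example.invalid"}, fake_fetch))
```

In the app itself this would be called as `resolve_strategy(os.environ, real_fetch)`, where `real_fetch` is whatever function performs the request to the Modal endpoint.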

Links


Built for the Build Small Hackathon by @sankalphs. 🍄