---
title: Tabras
emoji: 🃏
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.17.3
app_file: app_hf.py
suggested_hardware: cpu-basic
pinned: false
license: mit
tags:
- track:wood
- sponsor:openbmb
- sponsor:openai
- sponsor:nvidia
- sponsor:modal
- achievement:offgrid
- achievement:offbrand
- achievement:llama
- achievement:fieldnotes
models:
- openbmb/MiniCPM-V-4
- nvidia/Nemotron-Mini-4B-Instruct
- stabilityai/sdxl-turbo
short_description: 'Tabras is a rouge-lite deckbuilder where you fight AI '
---

# Tabras

Tabras is a small-model roguelite card duel built for the Build Small Hackathon's **An Adventure in Thousand Token Wood** track.

You name a challenger, choose a world, choose a school of magic, draft a 15-card deck, and fight an AI boss across a short tactical duel. The fun part is that the draft is not a static card list: each run asks a small model to author new card names, flavor, effects, and art direction around your chosen theme and the deck you are already building.

The engine keeps the game fair. The model invents the card identity; deterministic Python code prices and resolves the card mechanics.

## Build Small Submission

Track: **Thousand Token Wood**.

Targeted prizes and badges:

- **Best MiniCPM Build:** MiniCPM authors the generated draft cards.
- **Nemotron Hardware Prize:** Nemotron drives the autonomous boss player.
- **Best Use of Modal:** the submitted Space calls Modal GPU endpoints for model inference.
- **Best Use of Codex:** Codex was used during development, with the project connected through the repo/Space workflow.
- **Off Brand:** Tabras uses a custom Gradio interface rather than the default Gradio look.
- **Tiny Titan:** the submitted runtime uses models in the 4B-and-under class.
- **Best Agent:** the boss is an agentic Nemotron player that observes public duel state, selects playable card indexes through a constrained JSON action schema, and executes actions inside the deterministic game engine.
- **Best Demo:** demo video and social post links are listed below.
- **Bonus Quest Champion:** Tabras combines multiple sponsor criteria and bonus badges in one submission.

Demo video: https://youtu.be/qHuk9XjaFWU

Social post: https://x.com/yewzoid/status/2066647997740691678?s=20

## Field Notes — What I Learned

Full write-up in [FIELD_NOTES.md](FIELD_NOTES.md). The short version:

- **ZeroGPU is a GPU-*sharing* mechanism, not a hosted GPU provider.** I found this out
  late. ZeroGPU time-slices a shared GPU inside `@spaces.GPU` calls and pickles
  args/returns between processes — which breaks on `trust_remote_code` models and
  diffusers pipelines (unpicklable), and forbids CUDA init in the main process. I had to
  **re-architect at the last minute**: make the Space a thin HTTP client and move every
  model onto **Modal** GPU endpoints. It still runs **fully local / off-grid** through a
  `MODE` switch (in-process Transformers/Diffusers, or a local `llama.cpp` server for
  MiniCPM) — small-and-local was always the point.
- **Small models are surprisingly capable, with sharp edges.** SDXL-Turbo makes genuinely
  striking art in ~4 steps; Nemotron was impressive at agentic, tool-calling boss play
  from a constrained JSON action schema. MiniCPM owns *meaning* (names, flavor) but not
  *structure* — the biggest fix was reordering the requested JSON so `effects`/`name`
  come first and survive a token cutoff.
- **Perceived latency beats raw latency.** A minimum loading window per draft pick (a
  uniform "forging" beat) hides the slow packs behind the same animation as the fast
  ones; prefetching every branch during idle screens makes picks feel instant; and
  pre-baking the fixed backbone-card art means it never shimmers.
- **Generative card-game design is hard and a ton of fun.** The principle that made it
  tractable: *the LLM owns meaning, the engine owns math* — the model invents the card,
  deterministic code prices every number, so cards are balanced by construction.
- **Ambitious for the deadline, and I'm happy with it.** Three small models, a custom UI,
  a compute re-architecture, and a real game loop. Treating the demo video as the
  deliverable — and optimizing the local recording surface — is what made it land.

## What Makes It AI-Native

- **MiniCPM authors draft cards.** It proposes card concepts, names, flavor text, and effect shapes for the current deck.
- **The draft is deck-aware.** Every pack is generated against the deck you are already building and your anchor picks. MiniCPM reads your emerging strategy and shapes each pack toward a coherent build — and will sometimes dangle a tempting off-archetype card to test whether you stay disciplined or chase the splash. The draft feels authored *for you*, not pulled from a static random table.
- **Nemotron plays the boss.** The boss reads the public board state and chooses from its hidden hand.
- **SDXL-Turbo illustrates cards.** Card art is generated lazily so the draft can remain playable while images arrive.
- **The rules engine owns the numbers.** Damage, block, burn, ward, and tempo values are assigned by deterministic budget code rather than raw model output.

That split is the core design: AI provides surprise and taste, while the engine preserves balance.

## How To Play

1. Click **Play Now**.
2. Enter your name.
3. Choose a background world: Dark Fantasy, Cyberpunk, or Anime.
4. Choose a school of magic: Fire, Ice, or Earth.
5. Read the short rules page while the first draft pack starts loading.
6. Draft 9 generated cards onto a 6-card starter backbone.
7. Duel the boss.

### Schools

- **Fire** is pressure: direct damage, burn, bombs, and fast finishers.
- **Ice** is tempo: initiative, vulnerable windows, multi-hit pressure, and burst timing.
- **Earth** is control: block, ward, shield charge, and delayed counterpunches.

## Model Stack

All listed models are under the hackathon's 32B parameter limit. The submitted configuration uses models in the 4B-and-under class for Tiny Titan consideration.

| Role | Default model | Size class | Use |
| --- | --- | --- | --- |
| Card author | `openbmb/MiniCPM-V-4` | 4B-and-under class | Draft pack text and card concepts |
| Boss agent | `nvidia/Nemotron-Mini-4B-Instruct` | 4B | Enemy play decisions |
| Art | `stabilityai/sdxl-turbo` | 4B-and-under class | Fast card illustration |

The Hugging Face Space entry point is [app_hf.py](app_hf.py). The Space is a thin client: MiniCPM (cards), Nemotron (boss), and SDXL-Turbo (art) run on dedicated Modal GPU endpoints (see [modal_app.py](modal_app.py)), which the Space calls over HTTP. This keeps heavy compute off the Space (free CPU hardware) and gives each model its own autoscaled GPU.

Tabras also runs locally. By default, [app.py](app.py) can launch without model servers and use deterministic fallback generation. For local AI, [launch_ai.py](launch_ai.py) starts a local MiniCPM llama.cpp server, and the runtime can use local Transformers, MLX, and Diffusers backends through environment variables.

## Running Locally

Install dependencies:

```bash
python3 -m pip install -r requirements.txt
```

Run the Gradio app without model servers:

```bash
python3 app.py
```

That path is fully local and playable: it uses the same deterministic engine and fallback card generation, with no model server required.

For local AI card generation, install `llama-server` from llama.cpp, download the MiniCPM GGUF, and place it here:

```text
models/minicpm-v-4.6-gguf/MiniCPM-V-4.6-Q4_K_M.gguf
```

Then start Tabras through:

```bash
python3 launch_ai.py
```

`launch_ai.py` starts `llama-server` on `127.0.0.1:8090`, points Tabras at that OpenAI-compatible endpoint, uses local MLX for the Nemotron boss by default, and uses local Diffusers for SDXL-Turbo art.

You can also point any model role at a GPU machine you control. Set the endpoint/model environment variables before launching the app:

```bash
TABRAS_CARD_BACKEND=llamacpp
TABRAS_CARD_ENDPOINT=http://YOUR_GPU_HOST:8090/v1/chat/completions
TABRAS_CARD_MODEL=minicpm-v-4.6-q4
TABRAS_AI_BOSS=1
TABRAS_BOSS_BACKEND=openai
TABRAS_BOSS_ENDPOINT=http://YOUR_GPU_HOST:8081/v1/chat/completions
TABRAS_ART_BACKEND=modal
TABRAS_ART_ENDPOINT=http://YOUR_GPU_HOST:8082/generate
python3 app.py
```

For in-process local Transformers/Diffusers instead of OpenAI-compatible endpoints:

```bash
TABRAS_CARD_BACKEND=transformers
TABRAS_CARD_MODEL=openbmb/MiniCPM-V-4
TABRAS_AI_BOSS=1
TABRAS_BOSS_BACKEND=transformers
TABRAS_BOSS_MODEL=nvidia/Nemotron-Mini-4B-Instruct
TABRAS_ART_BACKEND=diffusers
TABRAS_ART_MODEL=stabilityai/sdxl-turbo
python3 app_hf.py
```

The submitted Space uses Modal GPU endpoints, but the same app can run with local CPU fallback, local model processes, in-process local models, or GPU endpoints that you configure.

## Project Structure

| File | Purpose |
| --- | --- |
| `app.py` | Gradio UI and interaction flow |
| `app_hf.py` | Hugging Face Space entry point |
| `primitives.py` | Fixed vocabulary of card effects |
| `budget.py` | Deterministic card costing |
| `generator.py` | Card authoring and model payload handling |
| `game.py` | Deterministic combat engine |
| `boss.py` | Boss decision layer |
| `ui.py` | Draft and battle rendering helpers |
| `art.py` | Art generation client |
| `forge.py` | Background generation queue |