title: Tabras
emoji: π
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.17.3
app_file: app_hf.py
suggested_hardware: cpu-basic
pinned: false
license: mit
tags:
- track:wood
- sponsor:openbmb
- sponsor:openai
- sponsor:nvidia
- sponsor:modal
- achievement:offgrid
- achievement:offbrand
- achievement:llama
- achievement:fieldnotes
models:
- openbmb/MiniCPM-V-4
- nvidia/Nemotron-Mini-4B-Instruct
- stabilityai/sdxl-turbo
short_description: 'Tabras is a roguelite deckbuilder where you fight AI '
Tabras
Tabras is a small-model roguelite card duel built for the Build Small Hackathon's An Adventure in Thousand Token Wood track.
You name a challenger, choose a world, choose a school of magic, draft a 15-card deck, and fight an AI boss across a short tactical duel. The fun part is that the draft is not a static card list: each run asks a small model to author new card names, flavor, effects, and art direction around your chosen theme and the deck you are already building.
The engine keeps the game fair. The model invents the card identity; deterministic Python code prices and resolves the card mechanics.
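As a rough illustration of that split, here is a minimal sketch of budget-based pricing. It is not the project's budget.py: the names `UNIT_COST`, `BUDGET_PER_COST`, `PricedCard`, and `price_card`, the idea of a per-card play cost, and every number below are assumptions made for the example. The model contributes only a name and a list of effect kinds; the magnitudes come from the budget.

```python
from dataclasses import dataclass

# Hypothetical point price per unit of each effect (illustrative values only).
UNIT_COST = {"damage": 1.0, "block": 0.8, "burn": 1.5, "ward": 2.0}
BUDGET_PER_COST = 4.0  # assumed: points a card may spend per unit of play cost


@dataclass(frozen=True)
class PricedCard:
    name: str
    cost: int
    effects: dict  # effect kind -> integer magnitude


def price_card(name: str, cost: int, effect_kinds: list) -> PricedCard:
    """Ignore any numbers the model proposed; derive magnitudes from the budget."""
    # Keep known effect kinds only, de-duplicated, in the order proposed.
    kinds = list(dict.fromkeys(k for k in effect_kinds if k in UNIT_COST))
    if not kinds:
        kinds = ["damage"]  # deterministic fallback shape
    share = BUDGET_PER_COST * cost / len(kinds)  # split the budget evenly
    effects = {k: max(1, int(share / UNIT_COST[k])) for k in kinds}
    return PricedCard(name=name, cost=cost, effects=effects)


card = price_card("Ember Lash", 2, ["damage", "burn", "teleport"])
print(card.effects)  # {'damage': 4, 'burn': 2}
```

Because the magnitudes are a pure function of the cost and the effect kinds, two cards with the same shape always get the same numbers, whatever the model wrote.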
Build Small Submission
Track: Thousand Token Wood.
Targeted prizes and badges:
- Best MiniCPM Build: MiniCPM authors the generated draft cards.
- Nemotron Hardware Prize: Nemotron drives the autonomous boss player.
- Best Use of Modal: the submitted Space calls Modal GPU endpoints for model inference.
- Best Use of Codex: Codex was used during development, with the project connected through the repo/Space workflow.
- Off Brand: Tabras uses a custom Gradio interface rather than the default Gradio look.
- Tiny Titan: the submitted runtime uses models in the 4B-and-under class.
- Best Agent: the boss is an agentic Nemotron player that observes public duel state, selects playable card indexes through a constrained JSON action schema, and executes actions inside the deterministic game engine.
- Best Demo: demo video and social post links are listed below.
- Bonus Quest Champion: Tabras combines multiple sponsor criteria and bonus badges in one submission.
Demo video: https://youtu.be/qHuk9XjaFWU
Social post: https://x.com/yewzoid/status/2066647997740691678?s=20
Field Notes: What I Learned
Full write-up in FIELD_NOTES.md. The short version:
- ZeroGPU is a GPU-sharing mechanism, not a hosted GPU provider. I found this out late. ZeroGPU time-slices a shared GPU inside @spaces.GPU calls and pickles args/returns between processes, which breaks on trust_remote_code models and diffusers pipelines (unpicklable), and forbids CUDA init in the main process. I had to re-architect at the last minute: make the Space a thin HTTP client and move every model onto Modal GPU endpoints. It still runs fully local / off-grid through a MODE switch (in-process Transformers/Diffusers, or a local llama.cpp server for MiniCPM); small-and-local was always the point.
- Small models are surprisingly capable, with sharp edges. SDXL-Turbo makes genuinely striking art in ~4 steps; Nemotron was impressive at agentic, tool-calling boss play from a constrained JSON action schema. MiniCPM owns meaning (names, flavor) but not structure: the biggest fix was reordering the requested JSON so effects/name come first and survive a token cutoff.
- Perceived latency beats raw latency. A minimum loading window per draft pick (a uniform "forging" beat) hides the slow packs behind the same animation as the fast ones; prefetching every branch during idle screens makes picks feel instant; and pre-baking the fixed backbone-card art means it never shimmers.
- Generative card-game design is hard and a ton of fun. The principle that made it tractable: the LLM owns meaning, the engine owns math. The model invents the card, deterministic code prices every number, so cards are balanced by construction.
- Ambitious for the deadline, and I'm happy with it. Three small models, a custom UI, a compute re-architecture, and a real game loop. Treating the demo video as the deliverable, and optimizing the local recording surface, is what made it land.
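The key-ordering fix in the notes above pays off only if the parser can recover the fields that did arrive before the cutoff. The following is a hedged sketch of one way to do that with the standard library; `salvage_fields` is a hypothetical helper, not the project's generator.py, and the sample payload is invented.

```python
import json

_DECODER = json.JSONDecoder()


def salvage_fields(raw: str, keys=("effects", "name")) -> dict:
    """Recover complete values for the given keys from possibly truncated JSON."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return {k: parsed[k] for k in keys if k in parsed}
    except json.JSONDecodeError:
        pass  # truncated or malformed: fall through to per-key recovery

    found = {}
    for key in keys:
        marker = f'"{key}"'
        # Naive search: takes the first occurrence of the quoted key, which
        # is good enough when these keys are emitted first in the object.
        start = raw.find(marker)
        if start == -1:
            continue
        colon = raw.find(":", start + len(marker))
        if colon == -1:
            continue
        idx = colon + 1
        while idx < len(raw) and raw[idx].isspace():
            idx += 1  # raw_decode needs the value to start exactly at idx
        try:
            value, _end = _DECODER.raw_decode(raw, idx)
        except json.JSONDecodeError:
            continue  # the value itself was cut off mid-way
        found[key] = value
    return found


cut = '{"effects": [{"kind": "burn"}], "name": "Ember Lash", "flavor": "It smol'
print(salvage_fields(cut))  # {'effects': [{'kind': 'burn'}], 'name': 'Ember Lash'}
```

With the important keys first, a cutoff in a later field such as flavor text costs only that field; the engine can still price and play the card.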
What Makes It AI-Native
- MiniCPM authors draft cards. It proposes card concepts, names, flavor text, and effect shapes for the current deck.
- The draft is deck-aware. Every pack is generated against the deck you are already building and your anchor picks. MiniCPM reads your emerging strategy and shapes each pack toward a coherent build, and will sometimes dangle a tempting off-archetype card to test whether you stay disciplined or chase the splash. The draft feels authored for you, not pulled from a static random table.
- Nemotron plays the boss. The boss reads the public board state and chooses from its hidden hand.
- SDXL-Turbo illustrates cards. Card art is generated lazily so the draft can remain playable while images arrive.
- The rules engine owns the numbers. Damage, block, burn, ward, and tempo values are assigned by deterministic budget code rather than raw model output.
That split is the core design: AI provides surprise and taste, while the engine preserves balance.
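The same split can apply to the boss: the model proposes card indexes, and the engine decides which of them are legal. Below is a minimal sketch of that validation step. The reply shape `{"play": [...]}`, the `max_plays` cap, the lowest-index fallback, and the function name `choose_boss_plays` are all assumptions for illustration, not the project's boss.py.

```python
import json


def choose_boss_plays(reply_text: str, playable: set, max_plays: int = 3) -> list:
    """Return validated hand indexes from a model reply, in proposed order.

    Assumed (hypothetical) reply shape: {"play": [0, 2]}.
    """
    try:
        data = json.loads(reply_text)
        proposed = data.get("play", []) if isinstance(data, dict) else []
    except json.JSONDecodeError:
        proposed = []
    if not isinstance(proposed, list):
        proposed = []

    chosen = []
    for idx in proposed:
        if len(chosen) == max_plays:
            break
        # bool is a subclass of int, so reject it explicitly.
        is_index = isinstance(idx, int) and not isinstance(idx, bool)
        if is_index and idx in playable and idx not in chosen:
            chosen.append(idx)
    if not chosen and playable:
        chosen = [min(playable)]  # assumed deterministic fallback move
    return chosen


print(choose_boss_plays('{"play": [2, 7, 2, 0]}', {0, 2, 3}))  # [2, 0]
print(choose_boss_plays("not json", {1, 4}))  # [1]
```

Invalid or duplicate indexes are dropped rather than trusted, so a malformed reply can never make the boss play a card it does not hold.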
How To Play
- Click Play Now.
- Enter your name.
- Choose a background world: Dark Fantasy, Cyberpunk, or Anime.
- Choose a school of magic: Fire, Ice, or Earth.
- Read the short rules page while the first draft pack starts loading.
- Draft 9 generated cards onto a 6-card starter backbone.
- Duel the boss.
Schools
- Fire is pressure: direct damage, burn, bombs, and fast finishers.
- Ice is tempo: initiative, vulnerable windows, multi-hit pressure, and burst timing.
- Earth is control: block, ward, shield charge, and delayed counterpunches.
Model Stack
All listed models are under the hackathon's 32B parameter limit. The submitted configuration uses models in the 4B-and-under class for Tiny Titan consideration.
| Role | Default model | Size class | Use |
|---|---|---|---|
| Card author | openbmb/MiniCPM-V-4 | 4B-and-under class | Draft pack text and card concepts |
| Boss agent | nvidia/Nemotron-Mini-4B-Instruct | 4B | Enemy play decisions |
| Art | stabilityai/sdxl-turbo | 4B-and-under class | Fast card illustration |
The Hugging Face Space entry point is app_hf.py. The Space is a thin client: MiniCPM (cards), Nemotron (boss), and SDXL-Turbo (art) run on dedicated Modal GPU endpoints (see modal_app.py), which the Space calls over HTTP. This keeps heavy compute off the Space (free CPU hardware) and gives each model its own autoscaled GPU.
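A thin client of this kind can be sketched with the standard library alone. This is an illustration, not the code in app_hf.py or modal_app.py: `post_json`, its `send` parameter, the payload, and the "return None so the caller can fall back" behavior are assumptions.

```python
import json
import urllib.request


def post_json(url: str, payload: dict, timeout: float = 30.0,
              send=urllib.request.urlopen):
    """POST a JSON payload; return the decoded JSON reply, or None on failure.

    `send` is injectable so the call can be exercised without a network.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with send(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and socket timeouts;
        # ValueError covers invalid JSON or undecodable bytes.
        return None  # caller can fall back to deterministic generation
```

Returning None instead of raising keeps the UI path simple: a cold or unreachable endpoint degrades to the fallback generator rather than breaking the draft.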
Tabras also runs locally. By default, app.py can launch without model servers and use deterministic fallback generation. For local AI, launch_ai.py starts a local MiniCPM llama.cpp server, and the runtime can use local Transformers, MLX, and Diffusers backends through environment variables.
Running Locally
Install dependencies:
python3 -m pip install -r requirements.txt
Run the Gradio app without model servers:
python3 app.py
That path is fully local and playable: it uses the same deterministic engine and fallback card generation, with no model server required.
For local AI card generation, install llama-server from llama.cpp, download the MiniCPM GGUF, and place it here:
models/minicpm-v-4.6-gguf/MiniCPM-V-4.6-Q4_K_M.gguf
Then start Tabras through:
python3 launch_ai.py
launch_ai.py starts llama-server on 127.0.0.1:8090, points Tabras at that OpenAI-compatible endpoint, uses local MLX for the Nemotron boss by default, and uses local Diffusers for SDXL-Turbo art.
You can also point any model role at a GPU machine you control. Set the endpoint/model environment variables before launching the app:
export TABRAS_CARD_BACKEND=llamacpp
export TABRAS_CARD_ENDPOINT=http://YOUR_GPU_HOST:8090/v1/chat/completions
export TABRAS_CARD_MODEL=minicpm-v-4.6-q4
export TABRAS_AI_BOSS=1
export TABRAS_BOSS_BACKEND=openai
export TABRAS_BOSS_ENDPOINT=http://YOUR_GPU_HOST:8081/v1/chat/completions
export TABRAS_ART_BACKEND=modal
export TABRAS_ART_ENDPOINT=http://YOUR_GPU_HOST:8082/generate
python3 app.py
For in-process local Transformers/Diffusers instead of OpenAI-compatible endpoints:
export TABRAS_CARD_BACKEND=transformers
export TABRAS_CARD_MODEL=openbmb/MiniCPM-V-4
export TABRAS_AI_BOSS=1
export TABRAS_BOSS_BACKEND=transformers
export TABRAS_BOSS_MODEL=nvidia/Nemotron-Mini-4B-Instruct
export TABRAS_ART_BACKEND=diffusers
export TABRAS_ART_MODEL=stabilityai/sdxl-turbo
python3 app_hf.py
The submitted Space uses Modal GPU endpoints, but the same app can run with local CPU fallback, local model processes, in-process local models, or GPU endpoints that you configure.
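To make the configuration surface concrete, here is a small sketch of how the TABRAS_* variables shown above could be gathered into one per-role config. The variable names come from this README; the function `resolve_backends`, the "fallback" label for the no-model path, and the reading of TABRAS_AI_BOSS as an on/off gate are assumptions, and the real runtime may resolve them differently.

```python
import os

ROLES = ("CARD", "BOSS", "ART")


def resolve_backends(env=None) -> dict:
    """Collect TABRAS_<ROLE>_BACKEND / _ENDPOINT / _MODEL into one dict per role."""
    env = os.environ if env is None else env
    config = {}
    for role in ROLES:
        config[role.lower()] = {
            # "fallback" is a hypothetical label for the deterministic path.
            "backend": env.get(f"TABRAS_{role}_BACKEND", "fallback"),
            "endpoint": env.get(f"TABRAS_{role}_ENDPOINT"),
            "model": env.get(f"TABRAS_{role}_MODEL"),
        }
    # Assumed semantics: the boss model is used only when TABRAS_AI_BOSS=1.
    if env.get("TABRAS_AI_BOSS") != "1":
        config["boss"]["backend"] = "fallback"
    return config


example = resolve_backends({
    "TABRAS_CARD_BACKEND": "llamacpp",
    "TABRAS_CARD_ENDPOINT": "http://YOUR_GPU_HOST:8090/v1/chat/completions",
    "TABRAS_BOSS_BACKEND": "openai",
})
print(example["card"]["backend"], example["boss"]["backend"])  # llamacpp fallback
```

Passing a plain dict instead of os.environ makes the resolution easy to check in isolation before launching the app.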
Project Structure
| File | Purpose |
|---|---|
| app.py | Gradio UI and interaction flow |
| app_hf.py | Hugging Face Space entry point |
| primitives.py | Fixed vocabulary of card effects |
| budget.py | Deterministic card costing |
| generator.py | Card authoring and model payload handling |
| game.py | Deterministic combat engine |
| boss.py | Boss decision layer |
| ui.py | Draft and battle rendering helpers |
| art.py | Art generation client |
| forge.py | Background generation queue |