---
title: Tabras
emoji: 🃏
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.17.3
app_file: app_hf.py
suggested_hardware: cpu-basic
pinned: false
license: mit
tags:
  - track:wood
  - sponsor:openbmb
  - sponsor:openai
  - sponsor:nvidia
  - sponsor:modal
  - achievement:offgrid
  - achievement:offbrand
  - achievement:llama
  - achievement:fieldnotes
models:
  - openbmb/MiniCPM-V-4
  - nvidia/Nemotron-Mini-4B-Instruct
  - stabilityai/sdxl-turbo
short_description: 'Tabras is a rogue-lite deckbuilder where you fight AI '
---

# Tabras

Tabras is a small-model roguelite card duel built for the Build Small Hackathon's **An Adventure in Thousand Token Wood** track. You name a challenger, choose a world, choose a school of magic, draft a 15-card deck, and fight an AI boss across a short tactical duel.

The fun part is that the draft is not a static card list: each run asks a small model to author new card names, flavor, effects, and art direction around your chosen theme and the deck you are already building.

The engine keeps the game fair. The model invents the card identity; deterministic Python code prices and resolves the card mechanics.

## Build Small Submission

Track: **Thousand Token Wood**.

Targeted prizes and badges:

- **Best MiniCPM Build:** MiniCPM authors the generated draft cards.
- **Nemotron Hardware Prize:** Nemotron drives the autonomous boss player.
- **Best Use of Modal:** the submitted Space calls Modal GPU endpoints for model inference.
- **Best Use of Codex:** Codex was used during development, with the project connected through the repo/Space workflow.
- **Off Brand:** Tabras uses a custom Gradio interface rather than the default Gradio look.
- **Tiny Titan:** the submitted runtime uses models in the 4B-and-under class.
- **Best Agent:** the boss is an agentic Nemotron player that observes public duel state, selects playable card indexes through a constrained JSON action schema, and executes actions inside the deterministic game engine.
- **Best Demo:** demo video and social post links are listed below.
- **Bonus Quest Champion:** Tabras combines multiple sponsor criteria and bonus badges in one submission.

Demo video: https://youtu.be/qHuk9XjaFWU

Social post: https://x.com/yewzoid/status/2066647997740691678?s=20

## Field Notes: What I Learned

Full write-up in [FIELD_NOTES.md](FIELD_NOTES.md). The short version:

- **ZeroGPU is a GPU-*sharing* mechanism, not a hosted GPU provider.** I found this out late. ZeroGPU time-slices a shared GPU inside `@spaces.GPU` calls and pickles args/returns between processes. That breaks on `trust_remote_code` models and diffusers pipelines (both unpicklable), and ZeroGPU also forbids CUDA init in the main process. I had to **re-architect at the last minute**: make the Space a thin HTTP client and move every model onto **Modal** GPU endpoints. It still runs **fully local / off-grid** through a `MODE` switch (in-process Transformers/Diffusers, or a local `llama.cpp` server for MiniCPM); small-and-local was always the point.
- **Small models are surprisingly capable, with sharp edges.** SDXL-Turbo makes genuinely striking art in ~4 steps; Nemotron was impressive at agentic, tool-calling boss play from a constrained JSON action schema. MiniCPM owns *meaning* (names, flavor) but not *structure*: the biggest fix was reordering the requested JSON so `effects`/`name` come first and survive a token cutoff.
- **Perceived latency beats raw latency.** A minimum loading window per draft pick (a uniform "forging" beat) hides the slow packs behind the same animation as the fast ones; prefetching every branch during idle screens makes picks feel instant; and pre-baking the fixed backbone-card art means it never shimmers.
- **Generative card-game design is hard and a ton of fun.** The principle that made it tractable: *the LLM owns meaning, the engine owns math*. The model invents the card and deterministic code prices every number, so cards are balanced by construction.
- **Ambitious for the deadline, and I'm happy with it.** Three small models, a custom UI, a compute re-architecture, and a real game loop. Treating the demo video as the deliverable, and optimizing the local recording surface, is what made it land.

## What Makes It AI-Native

- **MiniCPM authors draft cards.** It proposes card concepts, names, flavor text, and effect shapes for the current deck.
- **The draft is deck-aware.** Every pack is generated against the deck you are already building and your anchor picks. MiniCPM reads your emerging strategy and shapes each pack toward a coherent build, and will sometimes dangle a tempting off-archetype card to test whether you stay disciplined or chase the splash. The draft feels authored *for you*, not pulled from a static random table.
- **Nemotron plays the boss.** The boss reads the public board state and chooses from its hidden hand.
- **SDXL-Turbo illustrates cards.** Card art is generated lazily so the draft can remain playable while images arrive.
- **The rules engine owns the numbers.** Damage, block, burn, ward, and tempo values are assigned by deterministic budget code rather than raw model output.

That split is the core design: AI provides surprise and taste, while the engine preserves balance.

## How To Play

1. Click **Play Now**.
2. Enter your name.
3. Choose a background world: Dark Fantasy, Cyberpunk, or Anime.
4. Choose a school of magic: Fire, Ice, or Earth.
5. Read the short rules page while the first draft pack starts loading.
6. Draft 9 generated cards onto a 6-card starter backbone.
7. Duel the boss.

### Schools

- **Fire** is pressure: direct damage, burn, bombs, and fast finishers.
- **Ice** is tempo: initiative, vulnerable windows, multi-hit pressure, and burst timing.
- **Earth** is control: block, ward, shield charge, and delayed counterpunches.

## Model Stack

All listed models are under the hackathon's 32B parameter limit.
The submitted configuration uses models in the 4B-and-under class for Tiny Titan consideration.

| Role | Default model | Size class | Use |
| --- | --- | --- | --- |
| Card author | `openbmb/MiniCPM-V-4` | 4B-and-under class | Draft pack text and card concepts |
| Boss agent | `nvidia/Nemotron-Mini-4B-Instruct` | 4B | Enemy play decisions |
| Art | `stabilityai/sdxl-turbo` | 4B-and-under class | Fast card illustration |

The Hugging Face Space entry point is [app_hf.py](app_hf.py). The Space is a thin client: MiniCPM (cards), Nemotron (boss), and SDXL-Turbo (art) run on dedicated Modal GPU endpoints (see [modal_app.py](modal_app.py)), which the Space calls over HTTP. This keeps heavy compute off the Space (free CPU hardware) and gives each model its own autoscaled GPU.

Tabras also runs locally. By default, [app.py](app.py) can launch without model servers and use deterministic fallback generation. For local AI, [launch_ai.py](launch_ai.py) starts a local MiniCPM llama.cpp server, and the runtime can use local Transformers, MLX, and Diffusers backends through environment variables.

## Running Locally

Install dependencies:

```bash
python3 -m pip install -r requirements.txt
```

Run the Gradio app without model servers:

```bash
python3 app.py
```

That path is fully local and playable: it uses the same deterministic engine and fallback card generation, with no model server required.

For local AI card generation, install `llama-server` from llama.cpp, download the MiniCPM GGUF, and place it here:

```text
models/minicpm-v-4.6-gguf/MiniCPM-V-4.6-Q4_K_M.gguf
```

Then start Tabras through:

```bash
python3 launch_ai.py
```

`launch_ai.py` starts `llama-server` on `127.0.0.1:8090`, points Tabras at that OpenAI-compatible endpoint, uses local MLX for the Nemotron boss by default, and uses local Diffusers for SDXL-Turbo art.

You can also point any model role at a GPU machine you control.
Set the endpoint/model environment variables before launching the app:

```bash
TABRAS_CARD_BACKEND=llamacpp \
TABRAS_CARD_ENDPOINT=http://YOUR_GPU_HOST:8090/v1/chat/completions \
TABRAS_CARD_MODEL=minicpm-v-4.6-q4 \
TABRAS_AI_BOSS=1 \
TABRAS_BOSS_BACKEND=openai \
TABRAS_BOSS_ENDPOINT=http://YOUR_GPU_HOST:8081/v1/chat/completions \
TABRAS_ART_BACKEND=modal \
TABRAS_ART_ENDPOINT=http://YOUR_GPU_HOST:8082/generate \
python3 app.py
```

For in-process local Transformers/Diffusers instead of OpenAI-compatible endpoints:

```bash
TABRAS_CARD_BACKEND=transformers \
TABRAS_CARD_MODEL=openbmb/MiniCPM-V-4 \
TABRAS_AI_BOSS=1 \
TABRAS_BOSS_BACKEND=transformers \
TABRAS_BOSS_MODEL=nvidia/Nemotron-Mini-4B-Instruct \
TABRAS_ART_BACKEND=diffusers \
TABRAS_ART_MODEL=stabilityai/sdxl-turbo \
python3 app_hf.py
```

The submitted Space uses Modal GPU endpoints, but the same app can run with local CPU fallback, local model processes, in-process local models, or GPU endpoints that you configure.

## Project Structure

| File | Purpose |
| --- | --- |
| `app.py` | Gradio UI and interaction flow |
| `app_hf.py` | Hugging Face Space entry point |
| `primitives.py` | Fixed vocabulary of card effects |
| `budget.py` | Deterministic card costing |
| `generator.py` | Card authoring and model payload handling |
| `game.py` | Deterministic combat engine |
| `boss.py` | Boss decision layer |
| `ui.py` | Draft and battle rendering helpers |
| `art.py` | Art generation client |
| `forge.py` | Background generation queue |
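
### Sketch: model-authored identity, engine-priced numbers

To make the division of labor between the fixed effect vocabulary and the deterministic costing concrete, here is a minimal sketch of the idea. It is not the code in `primitives.py` or `budget.py`: the names `EFFECT_WEIGHTS`, `BUDGET_PER_COST`, `PricedCard`, and `price_card`, the proposal shape, and every weight are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical fixed vocabulary: effect kind -> budget points per unit of magnitude.
# These weights are illustrative assumptions, not the values Tabras uses.
EFFECT_WEIGHTS = {"damage": 1.0, "block": 0.8, "burn": 1.5, "ward": 2.0}

# Hypothetical rule: each point of card cost buys this many budget points.
BUDGET_PER_COST = 3.0


@dataclass(frozen=True)
class PricedCard:
    name: str
    flavor: str
    cost: int
    effects: tuple  # tuple of (kind, magnitude) pairs


def price_card(proposal: dict, cost: int) -> PricedCard:
    """Assign magnitudes from a fixed budget, ignoring any numbers the model sent."""
    # Keep only effect kinds from the fixed vocabulary, deduplicated in order.
    kinds = []
    for effect in proposal.get("effects", []):
        kind = effect.get("kind") if isinstance(effect, dict) else None
        if kind in EFFECT_WEIGHTS and kind not in kinds:
            kinds.append(kind)
    if not kinds:
        kinds = ["damage"]  # deterministic fallback when the model output is unusable

    # Split the budget evenly across effects, then convert points to magnitudes.
    share = cost * BUDGET_PER_COST / len(kinds)
    effects = tuple((k, max(1, int(share // EFFECT_WEIGHTS[k]))) for k in kinds)

    return PricedCard(
        name=str(proposal.get("name") or "Unnamed Card"),
        flavor=str(proposal.get("flavor") or ""),
        cost=cost,
        effects=effects,
    )
```

With these illustrative weights, a cost-2 proposal that asks for damage and burn resolves to 3 damage and 2 burn regardless of the amounts the model requested, and an unknown effect kind is dropped. Because the function reads only the effect kinds and the cost, the same proposal always prices the same way, which is the property the "balanced by construction" claim relies on.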