---
title: Tabras
emoji: π
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.17.3
app_file: app_hf.py
suggested_hardware: cpu-basic
pinned: false
license: mit
tags:
- track:wood
- sponsor:openbmb
- sponsor:openai
- sponsor:nvidia
- sponsor:modal
- achievement:offgrid
- achievement:offbrand
- achievement:llama
- achievement:fieldnotes
models:
- openbmb/MiniCPM-V-4
- nvidia/Nemotron-Mini-4B-Instruct
- stabilityai/sdxl-turbo
short_description: 'Tabras is a roguelite deckbuilder where you fight AI '
---
# Tabras

Tabras is a small-model roguelite card duel built for the Build Small Hackathon's **An Adventure in Thousand Token Wood** track.

You name a challenger, choose a world, choose a school of magic, draft a 15-card deck, and fight an AI boss across a short tactical duel. The fun part is that the draft is not a static card list: each run asks a small model to author new card names, flavor, effects, and art direction around your chosen theme and the deck you are already building.

The engine keeps the game fair. The model invents the card identity; deterministic Python code prices and resolves the card mechanics.
## Build Small Submission

Track: **Thousand Token Wood**.

Targeted prizes and badges:

- **Best MiniCPM Build:** MiniCPM authors the generated draft cards.
- **Nemotron Hardware Prize:** Nemotron drives the autonomous boss player.
- **Best Use of Modal:** the submitted Space calls Modal GPU endpoints for model inference.
- **Best Use of Codex:** Codex was used during development, with the project connected through the repo/Space workflow.
- **Off Brand:** Tabras uses a custom Gradio interface rather than the default Gradio look.
- **Tiny Titan:** the submitted runtime uses models in the 4B-and-under class.
- **Best Agent:** the boss is an agentic Nemotron player that observes public duel state, selects playable card indexes through a constrained JSON action schema, and executes actions inside the deterministic game engine.
- **Best Demo:** demo video and social post links are listed below.
- **Bonus Quest Champion:** Tabras combines multiple sponsor criteria and bonus badges in one submission.

Demo video: https://youtu.be/qHuk9XjaFWU

Social post: https://x.com/yewzoid/status/2066647997740691678?s=20
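
The Best Agent entry above mentions a constrained JSON action schema. A minimal sketch of how a boss reply could be checked before it reaches the engine is shown here. The `{"play": [...]}` shape and the `parse_boss_action` helper are assumptions made for illustration; the real schema in `boss.py` may differ.

```python
import json


def parse_boss_action(raw_reply: str, playable: set[int]) -> list[int]:
    """Return the legal card indexes from a boss reply, in the order given.

    Hypothetical helper: the {"play": [...]} shape is an assumption,
    not the schema Tabras actually uses.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []  # unparseable reply: the boss passes instead of crashing the duel
    if not isinstance(data, dict):
        return []
    picks = data.get("play", [])
    if not isinstance(picks, list):
        return []
    legal: list[int] = []
    for index in picks:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(index, bool) or not isinstance(index, int):
            continue
        # Keep only indexes the engine marked playable, without repeats.
        if index in playable and index not in legal:
            legal.append(index)
    return legal
```

Filtering against the engine's own list of playable indexes means a malformed or over-eager reply can only ever shrink the action, never create an illegal one.
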
## Field Notes: What I Learned

Full write-up in [FIELD_NOTES.md](FIELD_NOTES.md). The short version:

- **ZeroGPU is a GPU-*sharing* mechanism, not a hosted GPU provider.** I found this out
  late. ZeroGPU time-slices a shared GPU inside `@spaces.GPU` calls and pickles
  args/returns between processes, which breaks on `trust_remote_code` models and
  diffusers pipelines (unpicklable), and forbids CUDA init in the main process. I had to
  **re-architect at the last minute**: make the Space a thin HTTP client and move every
  model onto **Modal** GPU endpoints. It still runs **fully local / off-grid** through a
  `MODE` switch (in-process Transformers/Diffusers, or a local `llama.cpp` server for
  MiniCPM); small-and-local was always the point.
- **Small models are surprisingly capable, with sharp edges.** SDXL-Turbo makes genuinely
  striking art in ~4 steps; Nemotron was impressive at agentic, tool-calling boss play
  from a constrained JSON action schema. MiniCPM owns *meaning* (names, flavor) but not
  *structure*: the biggest fix was reordering the requested JSON so `effects`/`name`
  come first and survive a token cutoff.
- **Perceived latency beats raw latency.** A minimum loading window per draft pick (a
  uniform "forging" beat) hides the slow packs behind the same animation as the fast
  ones; prefetching every branch during idle screens makes picks feel instant; and
  pre-baking the fixed backbone-card art means it never shimmers.
- **Generative card-game design is hard and a ton of fun.** The principle that made it
  tractable: *the LLM owns meaning, the engine owns math*. The model invents the card,
  deterministic code prices every number, so cards are balanced by construction.
- **Ambitious for the deadline, and I'm happy with it.** Three small models, a custom UI,
  a compute re-architecture, and a real game loop. Treating the demo video as the
  deliverable, and optimizing the local recording surface, is what made it land.
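
The key-ordering point above can be sketched in code. Assuming the model is asked to emit `effects` and `name` before the longer free-text fields, a reply that hits the token limit can still be salvaged by cutting back to the last complete top-level value. The `salvage_truncated_object` helper and the `kind`/`tier` fields are hypothetical, not taken from `generator.py`.

```python
import json


def salvage_truncated_object(text: str) -> dict:
    """Parse a JSON object, keeping only the complete leading keys if it was cut off."""
    try:
        parsed = json.loads(text)
        return parsed if isinstance(parsed, dict) else {}
    except json.JSONDecodeError:
        pass

    depth = 0
    in_string = False
    escaped = False
    last_safe_cut = None  # position of the last comma at the top level of the object
    for pos, char in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char in "{[":
            depth += 1
        elif char in "}]":
            depth -= 1
        elif char == "," and depth == 1:
            last_safe_cut = pos

    if last_safe_cut is None:
        return {}
    try:
        # Drop everything after the last complete key/value pair and close the object.
        parsed = json.loads(text[:last_safe_cut] + "}")
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


# Hypothetical reply that ran out of tokens in the middle of the flavor text.
cut_off = '{"effects": [{"kind": "burn", "tier": 2}], "name": "Cinder Oath", "flavor": "The ash remem'
card = salvage_truncated_object(cut_off)
# card keeps "effects" and "name"; the half-written "flavor" is dropped.
```

A value that is not followed by a comma is treated as incomplete, which is conservative: a half-written number such as `12` out of `123` is discarded rather than trusted. With `flavor` requested first, the same cutoff would have returned an empty card.
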
## What Makes It AI-Native

- **MiniCPM authors draft cards.** It proposes card concepts, names, flavor text, and effect shapes for the current deck.
- **The draft is deck-aware.** Every pack is generated against the deck you are already building and your anchor picks. MiniCPM reads your emerging strategy and shapes each pack toward a coherent build, and will sometimes dangle a tempting off-archetype card to test whether you stay disciplined or chase the splash. The draft feels authored *for you*, not pulled from a static random table.
- **Nemotron plays the boss.** The boss reads the public board state and chooses from its hidden hand.
- **SDXL-Turbo illustrates cards.** Card art is generated lazily so the draft can remain playable while images arrive.
- **The rules engine owns the numbers.** Damage, block, burn, ward, and tempo values are assigned by deterministic budget code rather than raw model output.

That split is the core design: AI provides surprise and taste, while the engine preserves balance.
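
As a rough illustration of that split, the sketch below prices a card from its cost and effect shape alone. The point prices, the `POINTS_PER_ENERGY` constant, and the `price_card` function are invented for this example; the actual costing rules live in `budget.py` and are not reproduced here.

```python
# Hypothetical integer point prices per unit of each effect.
UNIT_PRICE = {"damage": 2, "block": 2, "burn": 3, "ward": 4}
POINTS_PER_ENERGY = 8  # hypothetical budget granted per point of card cost


def price_card(cost: int, effect_kinds: list[str]) -> dict[str, int]:
    """Turn a model-proposed effect shape into engine-assigned numbers.

    The model only chooses the cost and which effects appear. Any magnitudes
    it suggested are ignored, so cards of equal cost draw on the same budget.
    """
    # Deduplicate while keeping order, and drop effects the engine does not know.
    kinds = [kind for kind in dict.fromkeys(effect_kinds) if kind in UNIT_PRICE]
    if not kinds or cost < 0:
        return {}
    share = cost * POINTS_PER_ENERGY // len(kinds)  # even split across effects
    return {kind: share // UNIT_PRICE[kind] for kind in kinds}


fire_card = price_card(2, ["damage", "burn"])
```

Integer prices are used on purpose: floor division on floats such as `4.0 // 0.8` can round down unexpectedly, which would make the budget non-deterministic across inputs that should price the same.
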
## How To Play

1. Click **Play Now**.
2. Enter your name.
3. Choose a background world: Dark Fantasy, Cyberpunk, or Anime.
4. Choose a school of magic: Fire, Ice, or Earth.
5. Read the short rules page while the first draft pack starts loading.
6. Draft 9 generated cards onto a 6-card starter backbone.
7. Duel the boss.

### Schools

- **Fire** is pressure: direct damage, burn, bombs, and fast finishers.
- **Ice** is tempo: initiative, vulnerable windows, multi-hit pressure, and burst timing.
- **Earth** is control: block, ward, shield charge, and delayed counterpunches.
## Model Stack

All listed models are under the hackathon's 32B parameter limit. The submitted configuration uses models in the 4B-and-under class for Tiny Titan consideration.

| Role | Default model | Size class | Use |
| --- | --- | --- | --- |
| Card author | `openbmb/MiniCPM-V-4` | 4B-and-under class | Draft pack text and card concepts |
| Boss agent | `nvidia/Nemotron-Mini-4B-Instruct` | 4B | Enemy play decisions |
| Art | `stabilityai/sdxl-turbo` | 4B-and-under class | Fast card illustration |

The Hugging Face Space entry point is [app_hf.py](app_hf.py). The Space is a thin client: MiniCPM (cards), Nemotron (boss), and SDXL-Turbo (art) run on dedicated Modal GPU endpoints (see [modal_app.py](modal_app.py)), which the Space calls over HTTP. This keeps heavy compute off the Space (free CPU hardware) and gives each model its own autoscaled GPU.

Tabras also runs locally. By default, [app.py](app.py) can launch without model servers and use deterministic fallback generation. For local AI, [launch_ai.py](launch_ai.py) starts a local MiniCPM llama.cpp server, and the runtime can use local Transformers, MLX, and Diffusers backends through environment variables.
## Running Locally

Install dependencies:

```bash
python3 -m pip install -r requirements.txt
```

Run the Gradio app without model servers:

```bash
python3 app.py
```

That path is fully local and playable: it uses the same deterministic engine and fallback card generation, with no model server required.

For local AI card generation, install `llama-server` from llama.cpp, download the MiniCPM GGUF, and place it here:

```text
models/minicpm-v-4.6-gguf/MiniCPM-V-4.6-Q4_K_M.gguf
```

Then start Tabras through:

```bash
python3 launch_ai.py
```

`launch_ai.py` starts `llama-server` on `127.0.0.1:8090`, points Tabras at that OpenAI-compatible endpoint, uses local MLX for the Nemotron boss by default, and uses local Diffusers for SDXL-Turbo art.
You can also point any model role at a GPU machine you control. Export the endpoint/model environment variables before launching the app, so the Python process inherits them:

```bash
# Card author: MiniCPM behind an OpenAI-compatible llama.cpp server
export TABRAS_CARD_BACKEND=llamacpp
export TABRAS_CARD_ENDPOINT=http://YOUR_GPU_HOST:8090/v1/chat/completions
export TABRAS_CARD_MODEL=minicpm-v-4.6-q4
# Boss agent
export TABRAS_AI_BOSS=1
export TABRAS_BOSS_BACKEND=openai
export TABRAS_BOSS_ENDPOINT=http://YOUR_GPU_HOST:8081/v1/chat/completions
# Card art
export TABRAS_ART_BACKEND=modal
export TABRAS_ART_ENDPOINT=http://YOUR_GPU_HOST:8082/generate
python3 app.py
```
For in-process local Transformers/Diffusers instead of OpenAI-compatible endpoints:

```bash
# Exported so that the Python process inherits every setting
export TABRAS_CARD_BACKEND=transformers
export TABRAS_CARD_MODEL=openbmb/MiniCPM-V-4
export TABRAS_AI_BOSS=1
export TABRAS_BOSS_BACKEND=transformers
export TABRAS_BOSS_MODEL=nvidia/Nemotron-Mini-4B-Instruct
export TABRAS_ART_BACKEND=diffusers
export TABRAS_ART_MODEL=stabilityai/sdxl-turbo
python3 app_hf.py
```

The submitted Space uses Modal GPU endpoints, but the same app can run with local CPU fallback, local model processes, in-process local models, or GPU endpoints that you configure.
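
One way such a switch could be read at startup is sketched below. The `TABRAS_<ROLE>_BACKEND`, `_ENDPOINT`, and `_MODEL` names follow the variables shown above; the `RoleConfig` class, the `resolve_role` function, and the `"fallback"` backend name are assumptions for illustration and may not match how the app resolves its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RoleConfig:
    backend: str
    endpoint: Optional[str]
    model: Optional[str]


def resolve_role(role: str, env: Optional[Mapping[str, str]] = None) -> RoleConfig:
    """Read TABRAS_<ROLE>_BACKEND / _ENDPOINT / _MODEL for one model role."""
    env = os.environ if env is None else env
    prefix = f"TABRAS_{role.upper()}_"
    # "fallback" is a placeholder name for the deterministic no-model path.
    backend = env.get(prefix + "BACKEND", "").strip().lower() or "fallback"
    endpoint = env.get(prefix + "ENDPOINT") or None
    model = env.get(prefix + "MODEL") or None
    return RoleConfig(backend, endpoint, model)


# Example with an explicit mapping instead of the real process environment.
card_config = resolve_role(
    "card",
    {
        "TABRAS_CARD_BACKEND": "llamacpp",
        "TABRAS_CARD_ENDPOINT": "http://127.0.0.1:8090/v1/chat/completions",
    },
)
```

Passing the environment in as a mapping keeps the resolver easy to test, and an unset role quietly resolves to the deterministic path instead of failing at launch.
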
## Project Structure

| File | Purpose |
| --- | --- |
| `app.py` | Gradio UI and interaction flow |
| `app_hf.py` | Hugging Face Space entry point |
| `primitives.py` | Fixed vocabulary of card effects |
| `budget.py` | Deterministic card costing |
| `generator.py` | Card authoring and model payload handling |
| `game.py` | Deterministic combat engine |
| `boss.py` | Boss decision layer |
| `ui.py` | Draft and battle rendering helpers |
| `art.py` | Art generation client |
| `forge.py` | Background generation queue |