Spaces:

build-small-hackathon
/

tabras

Sleeping

App Files Files Community

tabras / README.md

vvennelakanti

Update README.md

b97e2f2 verified 17 days ago

preview code

Raw

History Blame Contribute Delete

9.35 kB

	---
	title: Tabras
	emoji: 🃏
	colorFrom: purple
	colorTo: indigo
	sdk: gradio
	sdk_version: 6.17.3
	app_file: app_hf.py
	suggested_hardware: cpu-basic
	pinned: false
	license: mit
	tags:
	- track:wood
	- sponsor:openbmb
	- sponsor:openai
	- sponsor:nvidia
	- sponsor:modal
	- achievement:offgrid
	- achievement:offbrand
	- achievement:llama
	- achievement:fieldnotes
	models:
	- openbmb/MiniCPM-V-4
	- nvidia/Nemotron-Mini-4B-Instruct
	- stabilityai/sdxl-turbo
	short_description: 'Tabras is a rouge-lite deckbuilder where you fight AI '
	---

	# Tabras

	Tabras is a small-model roguelite card duel built for the Build Small Hackathon's An Adventure in Thousand Token Wood track.

	You name a challenger, choose a world, choose a school of magic, draft a 15-card deck, and fight an AI boss across a short tactical duel. The fun part is that the draft is not a static card list: each run asks a small model to author new card names, flavor, effects, and art direction around your chosen theme and the deck you are already building.

	The engine keeps the game fair. The model invents the card identity; deterministic Python code prices and resolves the card mechanics.

	## Build Small Submission

	Track: Thousand Token Wood.

	Targeted prizes and badges:

	- Best MiniCPM Build: MiniCPM authors the generated draft cards.
	- Nemotron Hardware Prize: Nemotron drives the autonomous boss player.
	- Best Use of Modal: the submitted Space calls Modal GPU endpoints for model inference.
	- Best Use of Codex: Codex was used during development, with the project connected through the repo/Space workflow.
	- Off Brand: Tabras uses a custom Gradio interface rather than the default Gradio look.
	- Tiny Titan: the submitted runtime uses models in the 4B-and-under class.
	- Best Agent: the boss is an agentic Nemotron player that observes public duel state, selects playable card indexes through a constrained JSON action schema, and executes actions inside the deterministic game engine.
	- Best Demo: demo video and social post links are listed below.
	- Bonus Quest Champion: Tabras combines multiple sponsor criteria and bonus badges in one submission.

	Demo video: https://youtu.be/qHuk9XjaFWU

	Social post: https://x.com/yewzoid/status/2066647997740691678?s=20

	## Field Notes — What I Learned

	Full write-up in [FIELD_NOTES.md](FIELD_NOTES.md). The short version:

	- *ZeroGPU is a GPU-sharing* mechanism, not a hosted GPU provider.** I found this out
	late. ZeroGPU time-slices a shared GPU inside `@spaces.GPU` calls and pickles
	args/returns between processes — which breaks on `trust_remote_code` models and
	diffusers pipelines (unpicklable), and forbids CUDA init in the main process. I had to
	re-architect at the last minute: make the Space a thin HTTP client and move every
	model onto Modal GPU endpoints. It still runs fully local / off-grid through a
	`MODE` switch (in-process Transformers/Diffusers, or a local `llama.cpp` server for
	MiniCPM) — small-and-local was always the point.
	- Small models are surprisingly capable, with sharp edges. SDXL-Turbo makes genuinely
	striking art in ~4 steps; Nemotron was impressive at agentic, tool-calling boss play
	from a constrained JSON action schema. MiniCPM owns meaning (names, flavor) but not
	structure — the biggest fix was reordering the requested JSON so `effects`/`name`
	come first and survive a token cutoff.
	- Perceived latency beats raw latency. A minimum loading window per draft pick (a
	uniform "forging" beat) hides the slow packs behind the same animation as the fast
	ones; prefetching every branch during idle screens makes picks feel instant; and
	pre-baking the fixed backbone-card art means it never shimmers.
	- Generative card-game design is hard and a ton of fun. The principle that made it
	tractable: the LLM owns meaning, the engine owns math — the model invents the card,
	deterministic code prices every number, so cards are balanced by construction.
	- Ambitious for the deadline, and I'm happy with it. Three small models, a custom UI,
	a compute re-architecture, and a real game loop. Treating the demo video as the
	deliverable — and optimizing the local recording surface — is what made it land.

	## What Makes It AI-Native

	- MiniCPM authors draft cards. It proposes card concepts, names, flavor text, and effect shapes for the current deck.
	- The draft is deck-aware. Every pack is generated against the deck you are already building and your anchor picks. MiniCPM reads your emerging strategy and shapes each pack toward a coherent build — and will sometimes dangle a tempting off-archetype card to test whether you stay disciplined or chase the splash. The draft feels authored for you, not pulled from a static random table.
	- Nemotron plays the boss. The boss reads the public board state and chooses from its hidden hand.
	- SDXL-Turbo illustrates cards. Card art is generated lazily so the draft can remain playable while images arrive.
	- The rules engine owns the numbers. Damage, block, burn, ward, and tempo values are assigned by deterministic budget code rather than raw model output.

	That split is the core design: AI provides surprise and taste, while the engine preserves balance.

	## How To Play

	1. Click Play Now.
	2. Enter your name.
	3. Choose a background world: Dark Fantasy, Cyberpunk, or Anime.
	4. Choose a school of magic: Fire, Ice, or Earth.
	5. Read the short rules page while the first draft pack starts loading.
	6. Draft 9 generated cards onto a 6-card starter backbone.
	7. Duel the boss.

	### Schools

	- Fire is pressure: direct damage, burn, bombs, and fast finishers.
	- Ice is tempo: initiative, vulnerable windows, multi-hit pressure, and burst timing.
	- Earth is control: block, ward, shield charge, and delayed counterpunches.

	## Model Stack

	All listed models are under the hackathon's 32B parameter limit. The submitted configuration uses models in the 4B-and-under class for Tiny Titan consideration.

	\| Role \| Default model \| Size class \| Use \|
	\| --- \| --- \| --- \| --- \|
	\| Card author \| `openbmb/MiniCPM-V-4` \| 4B-and-under class \| Draft pack text and card concepts \|
	\| Boss agent \| `nvidia/Nemotron-Mini-4B-Instruct` \| 4B \| Enemy play decisions \|
	\| Art \| `stabilityai/sdxl-turbo` \| 4B-and-under class \| Fast card illustration \|

	The Hugging Face Space entry point is [app_hf.py](app_hf.py). The Space is a thin client: MiniCPM (cards), Nemotron (boss), and SDXL-Turbo (art) run on dedicated Modal GPU endpoints (see [modal_app.py](modal_app.py)), which the Space calls over HTTP. This keeps heavy compute off the Space (free CPU hardware) and gives each model its own autoscaled GPU.

	Tabras also runs locally. By default, [app.py](app.py) can launch without model servers and use deterministic fallback generation. For local AI, [launch_ai.py](launch_ai.py) starts a local MiniCPM llama.cpp server, and the runtime can use local Transformers, MLX, and Diffusers backends through environment variables.

	## Running Locally

	Install dependencies:

	```bash
	python3 -m pip install -r requirements.txt
	```

	Run the Gradio app without model servers:

	```bash
	python3 app.py
	```

	That path is fully local and playable: it uses the same deterministic engine and fallback card generation, with no model server required.

	For local AI card generation, install `llama-server` from llama.cpp, download the MiniCPM GGUF, and place it here:

	```text
	models/minicpm-v-4.6-gguf/MiniCPM-V-4.6-Q4_K_M.gguf
	```

	Then start Tabras through:

	```bash
	python3 launch_ai.py
	```

	`launch_ai.py` starts `llama-server` on `127.0.0.1:8090`, points Tabras at that OpenAI-compatible endpoint, uses local MLX for the Nemotron boss by default, and uses local Diffusers for SDXL-Turbo art.

	You can also point any model role at a GPU machine you control. Set the endpoint/model environment variables before launching the app:

	```bash
	TABRAS_CARD_BACKEND=llamacpp
	TABRAS_CARD_ENDPOINT=http://YOUR_GPU_HOST:8090/v1/chat/completions
	TABRAS_CARD_MODEL=minicpm-v-4.6-q4
	TABRAS_AI_BOSS=1
	TABRAS_BOSS_BACKEND=openai
	TABRAS_BOSS_ENDPOINT=http://YOUR_GPU_HOST:8081/v1/chat/completions
	TABRAS_ART_BACKEND=modal
	TABRAS_ART_ENDPOINT=http://YOUR_GPU_HOST:8082/generate
	python3 app.py
	```

	For in-process local Transformers/Diffusers instead of OpenAI-compatible endpoints:

	```bash
	TABRAS_CARD_BACKEND=transformers
	TABRAS_CARD_MODEL=openbmb/MiniCPM-V-4
	TABRAS_AI_BOSS=1
	TABRAS_BOSS_BACKEND=transformers
	TABRAS_BOSS_MODEL=nvidia/Nemotron-Mini-4B-Instruct
	TABRAS_ART_BACKEND=diffusers
	TABRAS_ART_MODEL=stabilityai/sdxl-turbo
	python3 app_hf.py
	```

	The submitted Space uses Modal GPU endpoints, but the same app can run with local CPU fallback, local model processes, in-process local models, or GPU endpoints that you configure.

	## Project Structure

	\| File \| Purpose \|
	\| --- \| --- \|
	\| `app.py` \| Gradio UI and interaction flow \|
	\| `app_hf.py` \| Hugging Face Space entry point \|
	\| `primitives.py` \| Fixed vocabulary of card effects \|
	\| `budget.py` \| Deterministic card costing \|
	\| `generator.py` \| Card authoring and model payload handling \|
	\| `game.py` \| Deterministic combat engine \|
	\| `boss.py` \| Boss decision layer \|
	\| `ui.py` \| Draft and battle rendering helpers \|
	\| `art.py` \| Art generation client \|
	\| `forge.py` \| Background generation queue \|