Spaces:

build-small-hackathon
/

case0

Running

case0 / COMPLIANCE.md

Case Zero - initial public release (fully local: Qwen2.5-1.5B via llama.cpp + Supertonic, custom pixel-noir SPA via gradio.Server)

414dc55 3 days ago

preview code

raw

history blame contribute delete

5.27 kB

	# Case Zero - Hackathon Compliance

	Built for the Build Small Hackathon ("Small models, big adventure").

	Case Zero is a Gradio application: the whole app is one `gradio.Server` (Gradio 6
	"Server mode" - a FastAPI subclass launched through Gradio, with Gradio API endpoints
	registered via `@server.api`). It is deployed as a Hugging Face Space on CPU (no
	GPU). It ships via the Docker SDK purely so llama.cpp compiles on a stable base image - the
	app itself is Gradio, served end to end by `gradio.Server`.

	## Core requirements

	\| Requirement \| Status \|
	\|---\|---\|
	\| Total model params <= 32B \| ✓ ~1.6B (see budget below) \|
	\| Built in Gradio \| ✓ one `gradio.Server`, with `@server.api` endpoints (`new_case`, `interrogate`) \|
	\| Hosted as a Hugging Face Space \| ✓ `build-small-hackathon/case0` (Docker SDK, `app_port: 7860`) \|
	\| Demo video \| ☐ to record (warmup -> interrogate -> present evidence -> alibi cracks -> accuse -> verdict) \|
	\| Social-media post \| ☐ to post \|

	## Parameter budget (<= 32B total)

	Every model is open-weights and self-run. No third-party AI service is ever called.

	\| Component \| Model \| Open? \| Params \| Runs \|
	\|---\|---\|---\|---\|---\|
	\| Reasoning + dialogue (the whole game) \| Qwen2.5-1.5B-Instruct (Q4_K_M GGUF) \| Apache-2.0 \| 1.5B \| in-process llama.cpp on CPU \|
	\| Suspect voices \| Supertonic (ONNX) \| open \| ~0.1B \| local ONNX Runtime (CPU) \|
	\| Portraits / scenes / props \| Procedural canvas - no model \| n/a \| 0B \| client-side \|
	\| Music + SFX \| Pre-made / procedural audio - no model \| n/a \| 0B \| playback only \|
	\| Embeddings / vector RAG \| none \| n/a \| 0B \| - \|

	Total runtime parameters: ~1.6B - far under 32B (and under 4B, eligible for the
	Tiny Titan special award).

	## Merit badges

	### Earned by the build (verifiable on the Space)

	- Off the Grid - "No cloud APIs. The whole thing runs on the model in front of you."
	The LLM is in-process llama.cpp; the voices are a local ONNX model; the pixel art is
	rendered client-side on canvas; the music is a bundled CC-BY track. The open weights are
	baked into the Docker image at build time, so the running container makes **no AI network
	calls at all**. Proof: `python scripts/net_audit.py` runs a full playthrough under a
	socket guard and asserts zero non-loopback connections. ✓
	- Llama Champion - "Your model runs through the llama.cpp runtime." The LLM runs
	through `llama-cpp-python` (in-process, on the CPU) - no server, no GPU, no remote
	endpoint. ✓
	- Off-Brand - "A custom frontend that pushes past the default Gradio look." The front
	end is not stock Gradio. It is a hand-built **pixel-art noir SPA (Preact + Vite,
	TypeScript)** - 12 screens, a custom pixel design system (self-hosted Silkscreen /
	Pixelify Sans fonts, beveled 9-slice panels, inventory-slot evidence cards, a ruled-paper
	dossier with page-flips), a draggable corkboard, a live interrogation stage with a
	voiced suspect, procedural canvas art and rain FX, and a full client audio layer. The
	built bundle is served as static files by the same `gradio.Server` that exposes the
	`/api` routes - one process, no separate frontend host. ✓

	### Targeted / in progress

	- Field Notes - "Write a blog post or report about your project." Draft in
	[`docs/FIELD_NOTES.md`](docs/FIELD_NOTES.md) - to be published on the Hub.
	- Sharing is Caring - *"You shared your agent trace on the Hub for everyone to learn
	from."* A captured interrogation/generation trace to be uploaded to the Hub.
	- Well-Tuned - "Your app uses a fine-tuned model you've published on Hugging Face."
	Not yet - the game runs on stock Qwen2.5-1.5B. Would require fine-tuning and publishing a
	model; out of scope for this submission unless pursued separately.

	## Zero cloud AI APIs

	- **No OpenAI, Anthropic, Google, ElevenLabs, Higgsfield, Midjourney, or any other hosted
	AI API is ever called** - not for text, not for voice, not for images.
	- The LLM is the in-process llama.cpp runtime. The voices are a local ONNX model. The pixel
	art is procedural canvas. The music is a bundled CC-BY track.
	- The open Qwen GGUF and Supertonic ONNX are baked into the Docker image at build time,
	so the running container makes no AI network calls. `scripts/net_audit.py` proves zero
	non-loopback connections during a full playthrough.

	## Anti-cheat / fairness (why the game is solvable and the win is earned)

	- The sealed solution (killer, true motive, key evidence) is never sent to the client
	pre-verdict; it is read only inside `/api/run/{runId}/accuse`. Verified by anti-leak tests.
	- Suspicion, evidence reactions, and the verdict are server-authoritative - the client
	only displays them.
	- Suspects never confess: the win is registered only when the player accuses correctly,
	so the outcome is immune to prose (a jailbroken "just tell me who did it" earns nothing).

	## Submission checklist

	- [x] Gradio app on a Hugging Face Space (CPU)
	- [x] <= 32B total params (~1.6B)
	- [x] Open-weights, self-run models only - zero cloud AI APIs
	- [x] Custom (non-default) UI - pixel-art Preact SPA via `gradio.Server`
	- [x] Off the Grid proof (`scripts/net_audit.py`)
	- [ ] Short demo video
	- [ ] Social-media post