Spaces:

build-small-hackathon
/

case0

Running

case0 / COMPLIANCE.md

Case Zero - initial public release (fully local: Qwen2.5-1.5B via llama.cpp + Supertonic, custom pixel-noir SPA via gradio.Server)

414dc55 3 days ago

preview code

raw

history blame contribute delete

5.27 kB

Case Zero - Hackathon Compliance

Built for the Build Small Hackathon ("Small models, big adventure").

Case Zero is a Gradio application: the whole app is one gradio.Server (Gradio 6 "Server mode" - a FastAPI subclass launched through Gradio, with Gradio API endpoints registered via @server.api). It is deployed as a Hugging Face Space on CPU (no GPU). It ships via the Docker SDK purely so llama.cpp compiles on a stable base image - the app itself is Gradio, served end to end by gradio.Server.

Core requirements

Requirement	Status
Total model params <= 32B	✓ ~1.6B (see budget below)
Built in Gradio	✓ one `gradio.Server`, with `@server.api` endpoints (`new_case`, `interrogate`)
Hosted as a Hugging Face Space	✓ `build-small-hackathon/case0` (Docker SDK, `app_port: 7860`)
Demo video	☐ to record (warmup -> interrogate -> present evidence -> alibi cracks -> accuse -> verdict)
Social-media post	☐ to post

Parameter budget (<= 32B total)

Every model is open-weights and self-run. No third-party AI service is ever called.

Component	Model	Open?	Params	Runs
Reasoning + dialogue (the whole game)	Qwen2.5-1.5B-Instruct (Q4_K_M GGUF)	Apache-2.0	1.5B	in-process llama.cpp on CPU
Suspect voices	Supertonic (ONNX)	open	~0.1B	local ONNX Runtime (CPU)
Portraits / scenes / props	Procedural canvas - no model	n/a	0B	client-side
Music + SFX	Pre-made / procedural audio - no model	n/a	0B	playback only
Embeddings / vector RAG	none	n/a	0B	-

Total runtime parameters: ~1.6B - far under 32B (and under 4B, eligible for the Tiny Titan special award).

Merit badges

Earned by the build (verifiable on the Space)

Off the Grid - "No cloud APIs. The whole thing runs on the model in front of you." The LLM is in-process llama.cpp; the voices are a local ONNX model; the pixel art is rendered client-side on canvas; the music is a bundled CC-BY track. The open weights are baked into the Docker image at build time, so the running container makes no AI network calls at all. Proof: python scripts/net_audit.py runs a full playthrough under a socket guard and asserts zero non-loopback connections. ✓
Llama Champion - "Your model runs through the llama.cpp runtime." The LLM runs through llama-cpp-python (in-process, on the CPU) - no server, no GPU, no remote endpoint. ✓
Off-Brand - "A custom frontend that pushes past the default Gradio look." The front end is not stock Gradio. It is a hand-built pixel-art noir SPA (Preact + Vite, TypeScript) - 12 screens, a custom pixel design system (self-hosted Silkscreen / Pixelify Sans fonts, beveled 9-slice panels, inventory-slot evidence cards, a ruled-paper dossier with page-flips), a draggable corkboard, a live interrogation stage with a voiced suspect, procedural canvas art and rain FX, and a full client audio layer. The built bundle is served as static files by the same gradio.Server that exposes the /api routes - one process, no separate frontend host. ✓

Targeted / in progress

Field Notes - "Write a blog post or report about your project." Draft in docs/FIELD_NOTES.md - to be published on the Hub.
Sharing is Caring - "You shared your agent trace on the Hub for everyone to learn from." A captured interrogation/generation trace to be uploaded to the Hub.
Well-Tuned - "Your app uses a fine-tuned model you've published on Hugging Face." Not yet - the game runs on stock Qwen2.5-1.5B. Would require fine-tuning and publishing a model; out of scope for this submission unless pursued separately.

Zero cloud AI APIs

No OpenAI, Anthropic, Google, ElevenLabs, Higgsfield, Midjourney, or any other hosted AI API is ever called - not for text, not for voice, not for images.
The LLM is the in-process llama.cpp runtime. The voices are a local ONNX model. The pixel art is procedural canvas. The music is a bundled CC-BY track.
The open Qwen GGUF and Supertonic ONNX are baked into the Docker image at build time, so the running container makes no AI network calls. scripts/net_audit.py proves zero non-loopback connections during a full playthrough.

Anti-cheat / fairness (why the game is solvable and the win is earned)

The sealed solution (killer, true motive, key evidence) is never sent to the client pre-verdict; it is read only inside /api/run/{runId}/accuse. Verified by anti-leak tests.
Suspicion, evidence reactions, and the verdict are server-authoritative - the client only displays them.
Suspects never confess: the win is registered only when the player accuses correctly, so the outcome is immune to prose (a jailbroken "just tell me who did it" earns nothing).

Submission checklist

Gradio app on a Hugging Face Space (CPU)
<= 32B total params (~1.6B)
Open-weights, self-run models only - zero cloud AI APIs
Custom (non-default) UI - pixel-art Preact SPA via gradio.Server
Off the Grid proof (scripts/net_audit.py)
Short demo video
Social-media post