Built for the Build Small Hackathon ("Small models, big adventure").
Case Zero is a Gradio application: the whole app is one gradio.Server (Gradio 6
"Server mode" - a FastAPI subclass launched through Gradio, with Gradio API endpoints
registered via @server.api). It is deployed as a Hugging Face Space on CPU (no
GPU). It ships via the Docker SDK purely so llama.cpp compiles on a stable base image - the
app itself is Gradio, served end to end by gradio.Server.
Core requirements
| Requirement | Status |
|---|---|
| Total model params <= 32B | ✓ ~1.6B (see budget below) |
| Built in Gradio | ✓ one gradio.Server, with @server.api endpoints (new_case, interrogate) |
| Hosted as a Hugging Face Space | ✓ build-small-hackathon/case0 (Docker SDK, app_port: 7860) |
| Demo video | ☐ to record (warmup -> interrogate -> present evidence -> alibi cracks -> accuse -> verdict) |
| Social-media post | ☐ to post |
Parameter budget (<= 32B total)
Every model is open-weights and self-run. No third-party AI service is ever called.
| Component | Model | Open? (license) | Params | Runs |
|---|---|---|---|---|
| Reasoning + dialogue (the whole game) | Qwen2.5-1.5B-Instruct (Q4_K_M GGUF) | Apache-2.0 | 1.5B | in-process llama.cpp on CPU |
| Suspect voices | Supertonic (ONNX) | open | ~0.1B | local ONNX Runtime (CPU) |
| Portraits / scenes / props | Procedural canvas - no model | n/a | 0B | client-side |
| Music + SFX | Pre-made / procedural audio - no model | n/a | 0B | playback only |
| Embeddings / vector RAG | none | n/a | 0B | - |
Total runtime parameters: ~1.6B - far under 32B (and under 4B, eligible for the Tiny Titan special award).
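The budget arithmetic is simple enough to check mechanically. Below is a minimal sketch, assuming the per-component sizes from the table above (the ~0.1B Supertonic figure is the approximate value stated there). `BUDGET_B`, `total_params_b`, and the cap constants are illustrative names, not part of the repository.

```python
# Hypothetical budget table mirroring the README rows (params in billions).
BUDGET_B = {
    "Qwen2.5-1.5B-Instruct (Q4_K_M GGUF)": 1.5,
    "Supertonic (ONNX)": 0.1,          # approximate, as stated in the table
    "Procedural canvas art": 0.0,
    "Pre-made / procedural audio": 0.0,
    "Embeddings / vector RAG": 0.0,
}

HACKATHON_CAP_B = 32.0   # core requirement: <= 32B total
TINY_TITAN_CAP_B = 4.0   # special award: under 4B


def total_params_b(budget: dict) -> float:
    """Sum per-component parameter counts (billions), rounded to hide float noise."""
    return round(sum(budget.values()), 3)


total = total_params_b(BUDGET_B)
print(f"total ~{total}B, within cap: {total <= HACKATHON_CAP_B}, "
      f"Tiny Titan: {total < TINY_TITAN_CAP_B}")
```

Model-free components count as 0B, so the total is driven entirely by the two open-weights models.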
Merit badges
Earned by the build (verifiable on the Space)
- Off the Grid - "No cloud APIs. The whole thing runs on the model in front of you."
  The LLM is in-process llama.cpp; the voices are a local ONNX model; the pixel art is
  rendered client-side on canvas; the music is a bundled CC-BY track. The open weights are
  baked into the Docker image at build time, so the running container makes no AI network
  calls at all. Proof: `python scripts/net_audit.py` runs a full playthrough under a socket
  guard and asserts zero non-loopback connections. ✓
- Llama Champion - "Your model runs through the llama.cpp runtime." The LLM runs through
  `llama-cpp-python` (in-process, on the CPU) - no server, no GPU, no remote endpoint. ✓
- Off-Brand - "A custom frontend that pushes past the default Gradio look." The front end
  is not stock Gradio. It is a hand-built pixel-art noir SPA (Preact + Vite, TypeScript) -
  12 screens, a custom pixel design system (self-hosted Silkscreen / Pixelify Sans fonts,
  beveled 9-slice panels, inventory-slot evidence cards, a ruled-paper dossier with
  page-flips), a draggable corkboard, a live interrogation stage with a voiced suspect,
  procedural canvas art and rain FX, and a full client audio layer. The built bundle is
  served as static files by the same `gradio.Server` that exposes the `/api` routes - one
  process, no separate frontend host. ✓
Targeted / in progress
- Field Notes - "Write a blog post or report about your project." Draft in
  `docs/FIELD_NOTES.md`, to be published on the Hub.
- Sharing is Caring - "You shared your agent trace on the Hub for everyone to learn from." A captured interrogation/generation trace will be uploaded to the Hub.
- Well-Tuned - "Your app uses a fine-tuned model you've published on Hugging Face." Not yet: the game runs on stock Qwen2.5-1.5B. Earning this badge would require fine-tuning and publishing a model, which is out of scope for this submission unless pursued separately.
Zero cloud AI APIs
- No OpenAI, Anthropic, Google, ElevenLabs, Higgsfield, Midjourney, or any other hosted AI API is ever called - not for text, not for voice, not for images.
- The LLM is the in-process llama.cpp runtime. The voices are a local ONNX model. The pixel art is procedural canvas. The music is a bundled CC-BY track.
- The open Qwen GGUF and Supertonic ONNX weights are baked into the Docker image at build time,
  so the running container makes no AI network calls. `scripts/net_audit.py` proves zero
  non-loopback connections during a full playthrough.
Anti-cheat / fairness (why the game is solvable and the win is earned)
- The sealed solution (killer, true motive, key evidence) is never sent to the client
  pre-verdict; it is read only inside `/api/run/{runId}/accuse`. This is verified by anti-leak tests.
- Suspicion, evidence reactions, and the verdict are server-authoritative - the client only displays them.
- Suspects never confess: the win is registered only when the player accuses correctly, so nothing the model writes can change the outcome (a jailbroken "just tell me who did it" earns nothing).
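The sealed-solution rule can be illustrated with a minimal sketch, assuming an in-memory run store keyed by a random run id and assuming one accusation per run. `SealedSolution`, `Run`, `open_run`, and `accuse_handler` are hypothetical names, not the project's real handlers.

```python
import json
import secrets
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SealedSolution:
    killer: str
    motive: str
    key_evidence: str


@dataclass
class Run:
    suspects: List[str]
    solution: SealedSolution        # server-side only, never serialized pre-verdict
    verdict: Optional[str] = None   # "solved" / "wrong", set only by accuse_handler

    def public_state(self) -> dict:
        """The only view of a run that is ever serialized to the client."""
        state = {"suspects": list(self.suspects), "verdict": self.verdict}
        if self.verdict is not None:
            # Case closed: revealing the solution can no longer help the player.
            state["solution"] = asdict(self.solution)
        return state


RUNS: Dict[str, Run] = {}


def open_run(suspects: List[str], solution: SealedSolution) -> dict:
    """Hypothetical case-creation handler: stores the solution, returns a public view."""
    run_id = secrets.token_hex(8)
    RUNS[run_id] = Run(suspects=list(suspects), solution=solution)
    return {"runId": run_id, **RUNS[run_id].public_state()}


def accuse_handler(run_id: str, suspect: str) -> dict:
    """Hypothetical stand-in for /api/run/{runId}/accuse: the only reader of the solution."""
    run = RUNS[run_id]
    if run.verdict is None:  # assumption: the first accusation is final
        run.verdict = "solved" if suspect == run.solution.killer else "wrong"
    return run.public_state()
```

An anti-leak test in this style serializes the pre-verdict payload (for example with `json.dumps`) and asserts that no solution text appears in it. The verdict depends only on the `suspect` argument, never on interrogation prose.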
Submission checklist
- Gradio app on a Hugging Face Space (CPU)
- <= 32B total params (~1.6B)
- Open-weights, self-run models only - zero cloud AI APIs
- Custom (non-default) UI - pixel-art Preact SPA via `gradio.Server`
- Off the Grid proof (`scripts/net_audit.py`)
- Short demo video
- Social-media post