# Case Zero - Hackathon Compliance
Built for the **Build Small Hackathon** ("Small models, big adventure").
Case Zero is a **Gradio application**: the whole app is one `gradio.Server` (Gradio 6
"Server mode" - a FastAPI subclass launched through Gradio, with Gradio API endpoints
registered via `@server.api`). It is deployed as a **Hugging Face Space** on **CPU** (no
GPU). It ships via the Docker SDK purely so llama.cpp compiles on a stable base image - the
app itself is Gradio, served end to end by `gradio.Server`.
## Core requirements
| Requirement | Status |
|---|---|
| Total model params <= 32B | ✓ ~1.6B (see budget below) |
| Built in Gradio | ✓ one `gradio.Server`, with `@server.api` endpoints (`new_case`, `interrogate`) |
| Hosted as a Hugging Face Space | ✓ `build-small-hackathon/case0` (Docker SDK, `app_port: 7860`) |
| Demo video | ☐ to record (warmup -> interrogate -> present evidence -> alibi cracks -> accuse -> verdict) |
| Social-media post | ☐ to post |
## Parameter budget (<= 32B total)
Every model is open-weights and self-run. **No third-party AI service is ever called.**
| Component | Model | Open weights? (license) | Params | Runs on |
|---|---|---|---|---|
| Reasoning + dialogue (the whole game) | Qwen2.5-1.5B-Instruct (Q4_K_M GGUF) | Apache-2.0 | **1.5B** | in-process llama.cpp on CPU |
| Suspect voices | Supertonic (ONNX) | open | ~0.1B | local ONNX Runtime (CPU) |
| Portraits / scenes / props | Procedural canvas - no model | n/a | 0B | client-side |
| Music + SFX | Pre-made / procedural audio - no model | n/a | 0B | playback only |
| Embeddings / vector RAG | none | n/a | 0B | - |
**Total runtime parameters: ~1.6B** - far under 32B (and under 4B, eligible for the
**Tiny Titan** special award).
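
The total can be checked with a few lines of arithmetic. The sketch below copies the figures from the table; the names `BUDGET_B`, `HACKATHON_CAP_B`, `TINY_TITAN_CAP_B`, and `total_params_b` are illustrative and not part of the Case Zero codebase, and the Supertonic count is the approximate ~0.1B listed above.

```python
# Parameter counts in billions, copied from the budget table.
# Supertonic is listed as "~0.1B", so the total is approximate.
BUDGET_B = {
    "Qwen2.5-1.5B-Instruct (Q4_K_M GGUF)": 1.5,
    "Supertonic (ONNX)": 0.1,
    "Portraits / scenes / props (procedural canvas)": 0.0,
    "Music + SFX (playback only)": 0.0,
    "Embeddings / vector RAG (none)": 0.0,
}

HACKATHON_CAP_B = 32.0  # core requirement: total params <= 32B
TINY_TITAN_CAP_B = 4.0  # special-award threshold: under 4B


def total_params_b(budget):
    """Sum the per-component parameter counts (in billions)."""
    return round(sum(budget.values()), 2)


total = total_params_b(BUDGET_B)
print(f"total runtime parameters: ~{total}B")
print("within hackathon cap:", total <= HACKATHON_CAP_B)
print("Tiny Titan eligible:", total < TINY_TITAN_CAP_B)
```

Only the two models contribute to the total; every other component is procedural or pre-made and adds 0B.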
## Merit badges
### Earned by the build (verifiable on the Space)
- **Off the Grid** - *"No cloud APIs. The whole thing runs on the model in front of you."*
The LLM is in-process llama.cpp; the voices are a local ONNX model; the pixel art is
rendered client-side on canvas; the music is a bundled CC-BY track. The open weights are
baked into the Docker image at build time, so the running container makes **no AI network
calls at all**. Proof: `python scripts/net_audit.py` runs a full playthrough under a
socket guard and asserts **zero non-loopback connections**. ✓
- **Llama Champion** - *"Your model runs through the llama.cpp runtime."* The LLM runs
through `llama-cpp-python` (in-process, on the CPU) - no server, no GPU, no remote
endpoint. ✓
- **Off-Brand** - *"A custom frontend that pushes past the default Gradio look."* The front
end is **not** stock Gradio. It is a hand-built **pixel-art noir SPA (Preact + Vite,
TypeScript)** - 12 screens, a custom pixel design system (self-hosted Silkscreen /
Pixelify Sans fonts, beveled 9-slice panels, inventory-slot evidence cards, a ruled-paper
dossier with page-flips), a draggable corkboard, a live interrogation stage with a
voiced suspect, procedural canvas art and rain FX, and a full client audio layer. The
built bundle is served as static files by the same `gradio.Server` that exposes the
`/api` routes - one process, no separate frontend host. ✓
### Targeted / in progress
- **Field Notes** - *"Write a blog post or report about your project."* Draft in
[`docs/FIELD_NOTES.md`](docs/FIELD_NOTES.md) - to be published on the Hub.
- **Sharing is Caring** - *"You shared your agent trace on the Hub for everyone to learn
from."* A captured interrogation/generation trace to be uploaded to the Hub.
- **Well-Tuned** - *"Your app uses a fine-tuned model you've published on Hugging Face."*
Not yet - the game runs on stock Qwen2.5-1.5B. Earning this badge would require fine-tuning
and publishing a model, which is out of scope for this submission unless pursued separately.
## Zero cloud AI APIs
- **No OpenAI, Anthropic, Google, ElevenLabs, Higgsfield, Midjourney, or any other hosted
AI API is ever called** - not for text, not for voice, not for images.
- The LLM is the in-process llama.cpp runtime. The voices are a local ONNX model. The pixel
art is procedural canvas. The music is a bundled CC-BY track.
- The open Qwen GGUF and Supertonic ONNX are **baked into the Docker image at build time**,
so the running container makes no AI network calls. `scripts/net_audit.py` proves zero
non-loopback connections during a full playthrough.
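
`scripts/net_audit.py` itself is not reproduced here, so the following is only a minimal sketch of how a socket guard of this kind could work, not the actual script. It assumes the guard wraps `socket.socket.connect`; the names `socket_guard` and `is_loopback` are hypothetical, and a fuller audit would likely also cover `connect_ex` and DNS lookups.

```python
import ipaddress
import socket
from contextlib import contextmanager


def is_loopback(host):
    """True for 127.0.0.0/8, ::1, and the literal name 'localhost'."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as non-loopback


@contextmanager
def socket_guard():
    """Refuse and record every non-loopback connect attempt (hypothetical helper)."""
    violations = []
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # AF_INET / AF_INET6 addresses are tuples whose first item is the host;
        # other address types (e.g. a Unix socket path) never leave the machine.
        if isinstance(address, tuple) and not is_loopback(str(address[0])):
            violations.append(address)
            raise ConnectionRefusedError(f"blocked non-loopback connect: {address}")
        return original_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield violations
    finally:
        socket.socket.connect = original_connect


# Demo: 192.0.2.1 is a reserved documentation address (TEST-NET-1). The guard
# rejects the attempt before any packet is sent.
with socket_guard() as violations:
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.connect(("192.0.2.1", 80))
    except ConnectionRefusedError:
        pass
    finally:
        probe.close()

print("non-loopback attempts recorded:", violations)
```

In an audit, the full playthrough would run inside the `with` block and the script would assert that `violations` is empty at the end.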
## Anti-cheat / fairness (why the game is solvable and the win is earned)
- The sealed solution (killer, true motive, key evidence) is **never sent to the client**
pre-verdict; it is read only inside `/api/run/{runId}/accuse`. Verified by anti-leak tests.
- Suspicion, evidence reactions, and the verdict are **server-authoritative** - the client
only displays them.
- Suspects **never confess**: the win is registered only when the player accuses correctly,
so nothing a suspect says can change the outcome (a jailbroken "just tell me who did it"
earns nothing).
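
The three rules above can be sketched as a server-side data model. This is an illustration under stated assumptions, not the Case Zero implementation: `SealedSolution`, `Run`, `public_view`, and `accuse` are hypothetical names, the suspect data is invented for the example, and revealing the solution after the verdict is an assumption (the document only states that it is never sent before the verdict).

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SealedSolution:
    killer: str
    motive: str
    key_evidence: str


@dataclass
class Run:
    run_id: str
    suspects: List[str]
    solution: SealedSolution       # stays on the server
    verdict: Optional[str] = None  # set only by accuse()


def public_view(run):
    """Client-facing state: built field by field, so the solution cannot leak."""
    return {
        "runId": run.run_id,
        "suspects": list(run.suspects),
        "verdict": run.verdict,
    }


def accuse(run, accused):
    """The only function that reads the sealed solution.

    The verdict depends on the accusation alone, never on dialogue text.
    """
    run.verdict = "solved" if accused == run.solution.killer else "failed"
    payload = public_view(run)
    # Assumption: the solution is revealed once the verdict is fixed.
    payload["solution"] = asdict(run.solution)
    return payload


# Hypothetical example data.
run = Run(
    run_id="demo-run",
    suspects=["Ada", "Bruno", "Celia"],
    solution=SealedSolution("Bruno", "a forged will", "the torn ledger page"),
)
before = public_view(run)
after = accuse(run, "Ada")
print(before)
print(after["verdict"])
```

Building the client view from an explicit allow-list of fields, rather than serializing the whole run and deleting keys, means a newly added server-side field stays private by default.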
## Submission checklist
- [x] Gradio app on a Hugging Face Space (CPU)
- [x] <= 32B total params (~1.6B)
- [x] Open-weights, self-run models only - zero cloud AI APIs
- [x] Custom (non-default) UI - pixel-art Preact SPA via `gradio.Server`
- [x] Off the Grid proof (`scripts/net_audit.py`)
- [ ] Short demo video
- [ ] Social-media post