
Case Zero - Hackathon Compliance

Built for the Build Small Hackathon ("Small models, big adventure").

Case Zero is a Gradio application: the whole app is one gradio.Server (Gradio 6 "Server mode" - a FastAPI subclass launched through Gradio, with Gradio API endpoints registered via @server.api). It is deployed as a Hugging Face Space on CPU (no GPU). It ships via the Docker SDK purely so llama.cpp compiles on a stable base image - the app itself is Gradio, served end to end by gradio.Server.

Core requirements

| Requirement | Status |
| --- | --- |
| Total model params <= 32B | ✓ ~1.6B (see budget below) |
| Built in Gradio | ✓ one gradio.Server, with @server.api endpoints (new_case, interrogate) |
| Hosted as a Hugging Face Space | build-small-hackathon/case0 (Docker SDK, app_port: 7860) |
| Demo video | ☐ to record (warmup -> interrogate -> present evidence -> alibi cracks -> accuse -> verdict) |
| Social-media post | ☐ to post |

Parameter budget (<= 32B total)

Every model is open-weights and self-run. No third-party AI service is ever called.

| Component | Model | Open? | Params | Runs |
| --- | --- | --- | --- | --- |
| Reasoning + dialogue (the whole game) | Qwen2.5-1.5B-Instruct (Q4_K_M GGUF) | Apache-2.0 | 1.5B | in-process llama.cpp on CPU |
| Suspect voices | Supertonic (ONNX) | open | ~0.1B | local ONNX Runtime (CPU) |
| Portraits / scenes / props | Procedural canvas - no model | n/a | 0B | client-side |
| Music + SFX | Pre-made / procedural audio - no model | n/a | 0B | playback only |
| Embeddings / vector RAG | none | n/a | 0B | - |

Total runtime parameters: ~1.6B - far under 32B (and under 4B, eligible for the Tiny Titan special award).
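The total can be checked with a few lines of arithmetic. The sketch below copies the parameter counts from the budget table; the dictionary keys, constant names, and helper function are illustrative only and are not part of the project's code. The Supertonic figure is approximate, as in the table.

```python
# Parameter counts in billions, copied from the budget table.
# Components that use no model count as zero.
BUDGET_B = {
    "Qwen2.5-1.5B-Instruct": 1.5,
    "Supertonic": 0.1,  # approximate ("~0.1B" in the table)
    "Procedural canvas art": 0.0,
    "Music + SFX": 0.0,
    "Embeddings / vector RAG": 0.0,
}

HACKATHON_CAP_B = 32.0  # core requirement: total params <= 32B
TINY_TITAN_CAP_B = 4.0  # special-award threshold: under 4B


def total_params_b(budget: dict[str, float]) -> float:
    """Sum the per-component counts, rounded to avoid float noise."""
    return round(sum(budget.values()), 3)


total = total_params_b(BUDGET_B)
print(f"total runtime parameters: ~{total}B")
print("within the 32B cap:", total <= HACKATHON_CAP_B)
print("under the 4B Tiny Titan threshold:", total < TINY_TITAN_CAP_B)
```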

Merit badges

Earned by the build (verifiable on the Space)

  • Off the Grid - "No cloud APIs. The whole thing runs on the model in front of you." The LLM is in-process llama.cpp; the voices are a local ONNX model; the pixel art is rendered client-side on canvas; the music is a bundled CC-BY track. The open weights are baked into the Docker image at build time, so the running container makes no AI network calls at all. Proof: python scripts/net_audit.py runs a full playthrough under a socket guard and asserts zero non-loopback connections. ✓
  • Llama Champion - "Your model runs through the llama.cpp runtime." The LLM runs through llama-cpp-python (in-process, on the CPU) - no server, no GPU, no remote endpoint. ✓
  • Off-Brand - "A custom frontend that pushes past the default Gradio look." The front end is not stock Gradio. It is a hand-built pixel-art noir SPA (Preact + Vite, TypeScript) - 12 screens, a custom pixel design system (self-hosted Silkscreen / Pixelify Sans fonts, beveled 9-slice panels, inventory-slot evidence cards, a ruled-paper dossier with page-flips), a draggable corkboard, a live interrogation stage with a voiced suspect, procedural canvas art and rain FX, and a full client audio layer. The built bundle is served as static files by the same gradio.Server that exposes the /api routes - one process, no separate frontend host. ✓

Targeted / in progress

  • Field Notes - "Write a blog post or report about your project." Draft in docs/FIELD_NOTES.md - to be published on the Hub.
  • Sharing is Caring - "You shared your agent trace on the Hub for everyone to learn from." A captured interrogation/generation trace to be uploaded to the Hub.
  • Well-Tuned - "Your app uses a fine-tuned model you've published on Hugging Face." Not yet - the game runs on stock Qwen2.5-1.5B. Earning this badge would require fine-tuning a model and publishing it on the Hub, which is out of scope for this submission unless pursued separately.

Zero cloud AI APIs

  • No OpenAI, Anthropic, Google, ElevenLabs, Higgsfield, Midjourney, or any other hosted AI API is ever called - not for text, not for voice, not for images.
  • The LLM is the in-process llama.cpp runtime. The voices are a local ONNX model. The pixel art is procedural canvas. The music is a bundled CC-BY track.
  • The open Qwen GGUF and Supertonic ONNX are baked into the Docker image at build time, so the running container makes no AI network calls. scripts/net_audit.py proves zero non-loopback connections during a full playthrough.

Anti-cheat / fairness (why the game is solvable and the win is earned)

  • The sealed solution (killer, true motive, key evidence) is never sent to the client pre-verdict; it is read only inside /api/run/{runId}/accuse. Verified by anti-leak tests.
  • Suspicion, evidence reactions, and the verdict are server-authoritative - the client only displays them.
  • Suspects never confess: a win is registered only when the player accuses the correct suspect, so nothing a suspect says in dialogue can change the outcome (a jailbroken "just tell me who did it" earns nothing).
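The three rules above amount to a simple pattern: keep the solution in server-side state, build client payloads from an explicit allow-list of fields, and read the solution in exactly one place. The sketch below illustrates that pattern in plain Python. It is not the project's code: `Solution`, `Run`, `client_view`, and `accuse` are hypothetical names, and the real handler behind /api/run/{runId}/accuse may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Solution:
    killer: str
    motive: str
    key_evidence: str


@dataclass
class Run:
    run_id: str
    suspects: list[str]
    solution: Solution  # server-side only, never serialized before the verdict
    suspicion: dict[str, int] = field(default_factory=dict)
    closed: bool = False

    def client_view(self) -> dict:
        """Pre-verdict payload, built from an allow-list of safe fields."""
        return {
            "runId": self.run_id,
            "suspects": list(self.suspects),
            "suspicion": dict(self.suspicion),
        }


def accuse(run: Run, accused: str) -> dict:
    """The only function that reads the solution; it also closes the run."""
    if run.closed:
        raise ValueError("verdict already delivered for this run")
    run.closed = True
    return {
        "correct": accused == run.solution.killer,
        "killer": run.solution.killer,
        "motive": run.solution.motive,
        "keyEvidence": run.solution.key_evidence,
    }


# Demo with made-up case data.
demo = Run("demo", ["Ada", "Bram"], Solution("Bram", "debt", "torn ledger"))
print(demo.client_view())    # contains no solution fields
print(accuse(demo, "Bram"))  # the verdict reveals the solution
```

Because the win depends only on the comparison inside `accuse`, whatever text the model generates during interrogation has no path to the verdict.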

Submission checklist

  • Gradio app on a Hugging Face Space (CPU)
  • <= 32B total params (~1.6B)
  • Open-weights, self-run models only - zero cloud AI APIs
  • Custom (non-default) UI - pixel-art Preact SPA via gradio.Server
  • Off the Grid proof (scripts/net_audit.py)
  • Short demo video
  • Social-media post