---
title: The Apprentice
emoji: 🌲
colorFrom: indigo
colorTo: yellow
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
pinned: false
license: mit
short_description: Five oracles, five trials — branching pixel-art game.
tags:
  - track:wood
  - sponsor:modal
  - achievement:offgrid
  - achievement:welltuned
  - achievement:offbrand
  - achievement:llama
  - achievement:sharing
  - achievement:fieldnotes
  # Track
  - thousand-token-wood
  # Badge claims
  - well-tuned
  - off-brand
  - field-notes
  - sharing-is-caring
  - llama-champion
  - tiny-titan
  # Descriptive
  - branching-narrative
  - game
  - pixel-art
  - gradio
  - vllm
  - lora
  - qwen
  - bilingual
models:
  - Qwen/Qwen2.5-14B-Instruct
  - Qwen/Qwen2.5-1.5B-Instruct
  - AndrewRqy/oracles-wizard-14b-lora
  - AndrewRqy/oracles-wizard-1.5b-lora
---

# Acknowledgement

This app was built by AndrewRqy.

# The Apprentice — Build Small Hackathon

> A pixel-art branching fairy-tale. You inscribe five short oracles before the journey begins; an apprentice has to make every one of them save his life across five trials in a tree that converges on one of five distinct endings.

## The idea

You play the mentor. You write five short oracles into a parchment — any words at all: advice, gibberish, emoji, names, typos, whatever. After that you don't get to explain anything.

Your apprentice walks five trials, and at each one he draws ONE oracle at random. Whatever it says, a fine-tuned Qwen2.5-14B has to take it seriously enough to save his life. The game spans three humor modes (wild imagination / accidental trip / last-minute revelation), a 15-node branching tree that converges on one of five distinct endings, and six themes × two languages (English + įŽ€äŊ“中文).

The core joke: the player can write nonsense, but the world has to take it seriously.

## The tech

- **Frontend**: a single-file Gradio Blocks app (~5000 lines), wrapped in a custom Docker image.
  ~2000 lines of bespoke CSS make sure nothing on the page looks like default Gradio — Press Start 2P + VT323 fonts, NES-style sharp corners, hand-laid pixel-art panels.
- **Backend**: Qwen2.5-14B served via vLLM on a Modal-hosted L40S, with a custom-trained humor LoRA (rank 16, 23k examples, ~6.5h on H100, ~$22 of compute). The Gradio app talks to it via the OpenAI SDK.
- **Tiny Titan variant**: same 23k corpus trained into a Qwen2.5-1.5B LoRA — eligible for the ≤4B prize.
- **Llama Champion path**: the merged 14B exported to GGUF (Q4_K_M, 8.4 GB) and served via `llama-cpp-python`'s OpenAI-compatible server. `./run.sh --local-llama` swaps cloud for fully-local inference.
- **All art generated locally** via Klein-4B on a Modal H100, then chroma-keyed offline. ~105 pixel-art sprites. No FLUX, no commercial generators.

## Quick links

- **Track**: Thousand Token Wood
- **Stack**: Docker + Gradio + Modal-hosted vLLM + Qwen2.5-14B + custom humor LoRA
- **Languages**: English + įŽ€äŊ“中文
- **Demo video**: https://youtu.be/Ica9BgX5ZDk
- **Social post**: https://x.com/AndrewRenqy/status/2066549274930741648
- **Field notes (blog post)**: https://huggingface.co/blog/AndrewRqy/apprentice-blog-url
- **Field notes (repo)**: [`docs/FIELD_NOTES_apprentice.md`](../docs/FIELD_NOTES_apprentice.md)

> **Recommended for the best experience: run it locally in full mode.** The HF Space defaults to a stripped-down lean visual variant because of the bandwidth + cold-start constraints below. To see the parallax banner, parchment textures, scene landscapes, mentor/apprentice figures, animated trial scenes, and all the polish the way they were designed, clone the repo, drop the three Modal secrets into `.env.local`, and run `./run.sh --full`. See [Running it](#running-it) below for the full setup.
>
> **Note on loading time**: this Space ships ~100 pixel-art sprites + theme backdrops.
> HF Space's free CPU tier has slow egress bandwidth, so first paint of a fresh container can take a minute or two; subsequent page transitions are faster as the browser caches assets. The front-page dropdown lets you flip between **Lean** (small payload, fast loading, default on the Space) and **Full** (parallax banner, scene landscapes, all decorative PNGs — recommended only on a fast connection or once the Space is warm).
>
> **Note on LLM cold start**: the Modal-hosted LLM container scales to zero when idle to avoid 24/7 billing during the review period. The first LLM call after the container has been idle (~20 min) pays a **~60-120s cold start** while vLLM loads the 14B weights + the LoRA adapter onto an L40S GPU. To hide this from the player, the app fires a background warmup ping to the Modal endpoint at startup, so by the time you've finished inscribing five oracles (~2-5 min of typing), the container should already be warm. If you click "Let the journey begin" immediately on a cold Space, expect the first trial to wait an extra minute. Subsequent trials in the same session do not pay this cold-start cost.

## What's inside

- **Frontend** — Single-file Gradio app with a hand-authored pixel-art aesthetic. Press Start 2P + VT323 fonts, NES-style sharp corners, custom theme suppressing all default Gradio chrome.
- **Backend** — Qwen2.5-14B + custom humor LoRA (`AndrewRqy/oracles-wizard-14b-lora`) served via vLLM on Modal. Frontend talks to it through the OpenAI SDK.
- **Tiny Titan path** — Same 23k humor corpus trained into a Qwen2.5-1.5B LoRA (`AndrewRqy/oracles-wizard-1.5b-lora`). Eligible for the ≤4B prize.
- **Branching narrative** — Hand-authored 15-node story tree with 5 endings. Each fork at trials 2–4 is decided by an LLM call seeded with one of the player's oracles, so the path the apprentice walks is shaped by what was inscribed.
- **6 themes × 2 languages** — Fantasy, Space-Cowboy, Galactic-Light, Black-Land, Mistgate, Quiet-Years.
  Theme-neutral story nodes + per-theme vocabulary expansion at runtime.
- **All art generated locally** — ~105 pixel-art sprites via Klein-4B on a Modal H100, chroma-keyed offline. No FLUX, no commercial generators.

## How to play

1. **Inscribe** — pick a language, theme, visual mode, and narration length. Then write five short oracles. Any words; gibberish counts, emoji counts.
2. **Send-off** — the mentor seals the parchments. The apprentice leaves.
3. **Five trials** — at each obstacle, the apprentice draws ONE oracle. The model takes the obstacle + oracle and writes a ~200-word resolution in one of three humor modes (wild imagination / accidental trip / last-minute revelation).
4. **Boss** — trial 5 is the world's finale (dragon, warlord-king, etc.). Different paths through the tree lead to different bosses.
5. **Ending** — one of 5 distinct endings plays, each with a hand-authored framing (why the boss behaved as it did + what the apprentice carried home), expanded by the LLM into a 3-paragraph epilogue.
6. **Summary** — the story tree shows the path you walked lit gold; the four endings you didn't reach blur behind "???" for replay.

## Badge claims

| Badge | Why we claim it |
|---|---|
| đŸŽ¯ **Well-Tuned** | Qwen2.5-14B + a hand-distilled 23k-example humor LoRA (rank 16, ~6.5h on H100). Visibly steers all three humor modes; details in field notes. |
| 🎨 **Off-Brand** | ~2000 lines of bespoke CSS, Press Start 2P + VT323 fonts, custom pixel-art sprites, custom story-tree visualization, custom ending banner. No stock Gradio chrome reaches the page. |
| 📓 **Field Notes** | Blog post: https://huggingface.co/blog/AndrewRqy/apprentice-blog-url ⋅ Repo mirror: [`docs/FIELD_NOTES_apprentice.md`](../docs/FIELD_NOTES_apprentice.md) — a build diary covering what we designed and what broke. |
| 📡 **Sharing-is-Caring** | [`traces/sample/`](traces/sample/) — JSONL captures of every LLM call from a real playthrough (prompts, responses, latency, token usage, both requested and returned model id). LLM-call tracing is default-on; opt out with `ORACLES_TRACE_DISABLE=1`. |
| đŸĻ™ **Llama Champion** | The LoRA-merged Qwen2.5-14B is exported to GGUF (Q4_K_M, ~8.4 GB) via the conversion job in [`modal_backend/modal_gguf_convert.py`](../modal_backend/modal_gguf_convert.py) and runs locally through `llama-cpp-python`'s OpenAI-compatible server. Launch with `./run.sh --local-llama` — no Modal call required. |
| ⚡ **Tiny Titan** | Same 23k corpus trained into a Qwen2.5-1.5B LoRA (~$5.50, ~1.5h on H100). Eligible for the ≤4B prize. |

## Running it

Three environment variables go in HF Space → **Settings → Variables and secrets** (or `.env.local` for local runs):

```
MODAL_URL    = https://---serve.modal.run
MODAL_KEY    = wk-â€Ļ   (Modal proxy auth key)
MODAL_SECRET = ws-â€Ļ   (Modal proxy auth secret)
```

Locally:

```bash
./run.sh           # lean mode, default
./run.sh --full    # all visual assets enabled (recommended on fast connections)
```

If `MODAL_URL` is unset OR `ORACLES_FORCE_MOCK=1`, the app runs in **mock mode** — the UI still works, but narrations are hand-written placeholders.

## Repo layout

```
oracles_app/
├── app.py              # main Gradio file
├── Dockerfile          # HF Space Docker SDK entry
├── requirements.txt
├── oracles/            # state, LLM client, story graph, themes, i18n
├── prompts/            # LLM prompt templates
└── assets/sprites/     # ~105 chroma-keyed pixel-art PNGs
```

Dev-only dirs (`modal_backend/`, `scripts/`, `training/`, `tests/`, `lora-out/`) live on local disk but are `.gitignore`d from the Space upload.
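
For readers browsing `oracles/`, the mock-mode rule from "Running it" and the tracing opt-out from the badge table can be sketched as a small resolver. This is an illustrative sketch, not the app's actual code: `RuntimeConfig` and `resolve_runtime` are hypothetical names, and treating an empty or whitespace-only `MODAL_URL` as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical container for the two switches described in this README."""
    backend: str           # "mock" or "modal"
    tracing_enabled: bool  # LLM-call tracing is default-on


def resolve_runtime(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Derive the runtime mode from environment variables.

    Mock mode applies when MODAL_URL is unset OR ORACLES_FORCE_MOCK=1.
    Tracing stays on unless ORACLES_TRACE_DISABLE=1.
    """
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only MODAL_URL counts as unset.
    modal_url = env.get("MODAL_URL", "").strip()
    force_mock = env.get("ORACLES_FORCE_MOCK", "") == "1"
    backend = "mock" if (force_mock or not modal_url) else "modal"
    tracing_enabled = env.get("ORACLES_TRACE_DISABLE", "") != "1"
    return RuntimeConfig(backend=backend, tracing_enabled=tracing_enabled)


if __name__ == "__main__":
    # Prints the mode the current shell environment would select.
    print(resolve_runtime())
```

Passing a plain dict instead of reading `os.environ` keeps the rule easy to check in isolation, for example `resolve_runtime({"ORACLES_FORCE_MOCK": "1"})`.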

## Credits

- Base models — Qwen2.5-14B-Instruct + Qwen2.5-1.5B-Instruct (Alibaba)
- Distillation teacher — Claude Sonnet 4.5 (Anthropic) via OpenRouter
- Sprite generator — Klein-4B on Modal H100
- Pixel-art fonts — Press Start 2P + VT323 (Google Fonts)

Built for the **Build Small Hackathon — Thousand Token Wood track** (2026-06-15).