the-apprentice

Sleeping

App Files Files Community

the-apprentice / README.md

laoliu5280

update readme for compatibility (#1)

0a4cd0e 13 days ago

preview code

Raw

History Blame Contribute Delete

9.92 kB

	---
	title: The Apprentice
	emoji: 🌲
	colorFrom: indigo
	colorTo: yellow
	sdk: docker
	app_port: 7860
	suggested_hardware: cpu-basic
	pinned: false
	license: mit
	short_description: Five oracles, five trials — branching pixel-art game.
	tags:
	- track:wood
	- sponsor:modal
	- achievement:offgrid
	- achievement:welltuned
	- achievement:offbrand
	- achievement:llama
	- achievement:sharing
	- achievement:fieldnotes
	# Track
	- thousand-token-wood
	# Badge claims
	- well-tuned
	- off-brand
	- field-notes
	- sharing-is-caring
	- llama-champion
	- tiny-titan
	# Descriptive
	- branching-narrative
	- game
	- pixel-art
	- gradio
	- vllm
	- lora
	- qwen
	- bilingual
	models:
	- Qwen/Qwen2.5-14B-Instruct
	- Qwen/Qwen2.5-1.5B-Instruct
	- AndrewRqy/oracles-wizard-14b-lora
	- AndrewRqy/oracles-wizard-1.5b-lora
	---
	# Acknowledgement

	This app is built by AndrewRqy.

	# The Apprentice — Build Small Hackathon

	> A pixel-art branching fairy-tale. You inscribe five short oracles before the journey begins; an apprentice has to make every one of them save his life across five trials in a tree that converges on one of five distinct endings.

	## The idea

	You play the mentor. You write five short oracles into a parchment — any words at all: advice, gibberish, emoji, names, typos, whatever. After that you don't get to explain anything. Your apprentice walks five trials, and at each one he draws ONE oracle at random. Whatever it says, a fine-tuned Qwen2.5-14B has to take it seriously enough to save his life — three humor modes (wild imagination / accidental trip / last-minute revelation), a 15-node branching tree that converges on one of five distinct endings, and six themes × two languages (English + 简体中文). The core joke: the player can write nonsense, but the world has to take it seriously.

	## The tech

	- Frontend: a single-file Gradio Blocks app (~5000 lines), wrapped in a custom Docker image. ~2000 lines of bespoke CSS make sure nothing on the page looks like default Gradio — Press Start 2P + VT323 fonts, NES-style sharp corners, hand-laid pixel-art panels.
	- Backend: Qwen2.5-14B served via vLLM on a Modal-hosted L40S, with a custom-trained humor LoRA (rank 16, 23k examples, ~6.5h on H100, ~$22 of compute). The Gradio app talks to it via the OpenAI SDK.
	- Tiny Titan variant: same 23k corpus trained into a Qwen2.5-1.5B LoRA — eligible for the ≤4B prize.
	- Llama Champion path: the merged 14B exported to GGUF (Q4_K_M, 8.4 GB) and served via `llama-cpp-python`'s OpenAI-compatible server. `./run.sh --local-llama` swaps cloud for fully-local inference.
	- All art generated locally via Klein-4B on a Modal H100, then chroma-keyed offline. ~105 pixel-art sprites. No FLUX, no commercial generators.

	## Quick links

	- Track: Thousand Token Wood
	- Stack: Docker + Gradio + Modal-hosted vLLM + Qwen2.5-14B + custom humor LoRA
	- Languages: English + 简体中文
	- Demo video: https://youtu.be/Ica9BgX5ZDk
	- Social post: https://x.com/AndrewRenqy/status/2066549274930741648
	- Field notes (blog post): https://huggingface.co/blog/AndrewRqy/apprentice-blog-url
	- Field notes (repo): [`docs/FIELD_NOTES_apprentice.md`](../docs/FIELD_NOTES_apprentice.md)

	> Recommended for the best experience: run it locally in full mode. The HF Space defaults to a stripped-down lean visual variant because of the bandwidth + cold-start constraints below. To see the parallax banner, parchment textures, scene landscapes, mentor/apprentice figures, animated trial scenes, and all the polish the way they were designed, clone the repo, drop the three Modal secrets into `.env.local`, and run `./run.sh --full`. See [Running it](#running-it) below for the full setup.
	>
	> Note on loading time: this Space ships ~100 pixel-art sprites + theme backdrops. HF Space's free CPU tier has slow egress bandwidth, so first paint of a fresh container can take a minute or two; subsequent page transitions are faster as the browser caches assets. The front-page dropdown lets you flip between Lean (small payload, fast loading, default on the Space) and Full (parallax banner, scene landscapes, all decorative PNGs — recommended only on a fast connection or once the Space is warm).
	>
	> Note on LLM cold start: the Modal-hosted LLM container scales to zero when idle to avoid 24/7 billing during the review period. The first LLM call after the container has been idle (~20 min) pays a ~60-120s cold start while vLLM loads the 14B weights + the LoRA adapter onto an L40S GPU. To hide this from the player, the app fires a background warmup ping to the Modal endpoint at startup, so by the time you've finished inscribing five oracles (~2-5 min of typing), the container should already be warm. If you click "Let the journey begin" immediately on a cold Space, expect the first trial to wait an extra minute. Every subsequent trial in the same session is instant.

	## What's inside

	- Frontend — Single-file Gradio app with a hand-authored pixel-art aesthetic. Press Start 2P + VT323 fonts, NES-style sharp corners, custom theme suppressing all default Gradio chrome.
	- Backend — Qwen2.5-14B + custom humor LoRA (`AndrewRqy/oracles-wizard-14b-lora`) served via vLLM on Modal. Frontend talks to it through the OpenAI SDK.
	- Tiny Titan path — Same 23k humor corpus trained into a Qwen2.5-1.5B LoRA (`AndrewRqy/oracles-wizard-1.5b-lora`). Eligible for the ≤4B prize.
	- Branching narrative — Hand-authored 15-node story tree with 5 endings. Each fork at trials 2–4 is decided by an LLM call seeded with one of the player's oracles, so the path the apprentice walks is shaped by what was inscribed.
	- 6 themes × 2 languages — Fantasy, Space-Cowboy, Galactic-Light, Black-Land, Mistgate, Quiet-Years. Theme-neutral story nodes + per-theme vocabulary expansion at runtime.
	- All art generated locally — ~105 pixel-art sprites via Klein-4B on a Modal H100, chroma-keyed offline. No FLUX, no commercial generators.

	## How to play

	1. Inscribe — pick a language, theme, visual mode, and narration length. Then write five short oracles. Any words; gibberish counts, emoji counts.
	2. Send-off — the mentor seals the parchments. The apprentice leaves.
	3. Five trials — at each obstacle, the apprentice draws ONE oracle. The model takes the obstacle + oracle and writes a ~200-word resolution in one of three humor modes (wild imagination / accidental trip / last-minute revelation).
	4. Boss — trial 5 is the world's finale (dragon, warlord-king, etc.). Different paths through the tree lead to different bosses.
	5. Ending — one of 5 distinct endings plays, each with a hand-authored framing (why the boss behaved as it did + what the apprentice carried home), expanded by the LLM into a 3-paragraph epilogue.
	6. Summary — the story tree shows the path you walked lit gold; the four endings you didn't reach blur behind "???" for replay.

	## Badge claims

	\| Badge \| Why we claim it \|
	\|---\|---\|
	\| 🎯 Well-Tuned \| Qwen2.5-14B + a hand-distilled 23k-example humor LoRA (rank 16, ~6.5h on H100). Visibly steers all three humor modes; details in field notes. \|
	\| 🎨 Off-Brand \| ~2000 lines of bespoke CSS, Press Start 2P + VT323 fonts, hand-painted pixel-art sprites, custom story-tree visualization, custom ending banner. No stock Gradio chrome reaches the page. \|
	\| 📓 Field Notes \| Blog post: https://huggingface.co/blog/AndrewRqy/apprentice-blog-url ⋅ Repo mirror: [`docs/FIELD_NOTES_apprentice.md`](../docs/FIELD_NOTES_apprentice.md) — a build diary covering what we designed and what broke. \|
	\| 📡 Sharing-is-Caring \| [`traces/sample/`](traces/sample/) — JSONL captures of every LLM call from a real playthrough (prompts, responses, latency, token usage, both requested and returned model id). LLM-call tracing is default-on; opt out with `ORACLES_TRACE_DISABLE=1`. \|
	\| 🦙 Llama Champion \| The LoRA-merged Qwen2.5-14B is exported to GGUF (Q4_K_M, ~8.4 GB) via the conversion job in [`modal_backend/modal_gguf_convert.py`](../modal_backend/modal_gguf_convert.py) and runs locally through `llama-cpp-python`'s OpenAI-compatible server. Launch with `./run.sh --local-llama` — no Modal call required. \|
	\| ⚡ Tiny Titan \| Same 23k corpus trained into a Qwen2.5-1.5B LoRA (~$5.50, ~1.5h on H100). Eligible for the ≤4B prize. \|

	## Running it

	Three environment variables go in HF Space → Settings → Variables and secrets (or `.env.local` for local runs):

	```
	MODAL_URL = https://<workspace>--<app>-serve.modal.run
	MODAL_KEY = wk-… (Modal proxy auth key)
	MODAL_SECRET = ws-… (Modal proxy auth secret)
	```

	Locally:

	```bash
	./run.sh # lean mode, default
	./run.sh --full # all visual assets enabled (recommended on fast connections)
	```

	If `MODAL_URL` is unset OR `ORACLES_FORCE_MOCK=1`, the app runs in mock mode — the UI still works, but narrations are hand-written placeholders.

	## Repo layout

	```
	oracles_app/
	├── app.py # main Gradio file
	├── Dockerfile # HF Space Docker SDK entry
	├── requirements.txt
	├── oracles/ # state, LLM client, story graph, themes, i18n
	├── prompts/ # LLM prompt templates
	└── assets/sprites/ # ~105 chroma-keyed pixel-art PNGs
	```

	Dev-only dirs (`modal_backend/`, `scripts/`, `training/`, `tests/`, `lora-out/`) live on local disk but are `.gitignore`d from the Space upload.

	## Credits

	- Base model — Qwen2.5-14B-Instruct + Qwen2.5-1.5B-Instruct (Alibaba)
	- Distillation teacher — Claude Sonnet 4.5 (Anthropic) via OpenRouter
	- Sprite generator — Klein-4B (Anthropic) on Modal H100
	- Pixel-art fonts — Press Start 2P + VT323 (Google Fonts)

	Built for the Build Small Hackathon — Thousand Token Wood track (2026-06-15).