title: Case Lantern
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
models:
- lastmass/Qwen3.5-Medical-GSPO
🏮 Case Lantern
Short Demo Video:
Case Lantern is a fictional medical mystery game for the Build Small Hackathon. Players investigate a short Chinese case, request clues, avoid red herrings, and submit a diagnosis within six turns.
The experience uses lastmass/Qwen3.5-Medical-GSPO,
a small Chinese medical reasoning model with roughly 4.66B parameters, as the
game master and scorer. Inference runs locally via llama.cpp (GGUF Q4_K_M).
Track & Merit Badges
| Item | Detail |
|---|---|
| Track | An Adventure in Thousand Token Wood |
| AI role | Load-bearing game master, clue writer, and scoring judge |
| Constraint | Small model under 32B parameters |
| UI | Gradio Space with custom dark frontend |

| Badge | Status |
|---|---|
| 🏕️ Off the Grid (LOCAL-FIRST) | ✅ Model runs locally in the Space |
| 🎸 Well-Tuned (FINE-TUNED) | ✅ Uses fine-tuned model published on HF |
| 🦙 Llama Champion | ✅ Runs via llama.cpp runtime |
| 🎨 Off-Brand (CUSTOM UI) | ✅ Dark glassmorphism theme, custom CSS |
Safety framing
This is not a diagnosis or treatment tool. It only uses fictional cases for learning-oriented gameplay. Users are explicitly asked not to provide personal health information.
Deployment notes
The app is designed for free CPU Spaces on Hugging Face. It does not require a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first launch and cached.
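The download-and-cache step described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the helper names `fetch_model` and `load_llm` are hypothetical, the GGUF filename is a placeholder because this README does not state it, and it assumes the GGUF file is hosted in the same repository as the listed model.

```python
from pathlib import Path

REPO_ID = "lastmass/Qwen3.5-Medical-GSPO"
# Placeholder: the real GGUF filename is not given in this README.
GGUF_FILENAME = "model-Q4_K_M.gguf"


def fetch_model(repo_id=REPO_ID, filename=GGUF_FILENAME, download=None) -> Path:
    """Return a local path to the GGUF file.

    hf_hub_download stores the file in the local Hub cache, so only the
    first call on a fresh Space actually transfers the data.
    """
    if download is None:
        # Imported lazily so the UI can start even if the package is missing.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    return Path(download(repo_id=repo_id, filename=filename))


def load_llm(model_path: Path, n_ctx: int = 2048):
    """Load the model on CPU only (n_gpu_layers=0) with llama-cpp-python."""
    from llama_cpp import Llama
    return Llama(
        model_path=str(model_path),
        n_ctx=n_ctx,          # assumed context size; tune for the game prompts
        n_gpu_layers=0,       # keep every layer on the CPU
        verbose=False,
    )
```

The `download` parameter exists only so the function can be exercised without network access; in normal use it is left as `None`.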
If you deploy on ZeroGPU, keep the prebuilt CPU llama-cpp-python wheel.
The requirements.txt file uses the CPU wheel index
(llama-cpp-python/whl/cpu) plus --only-binary=llama-cpp-python, and pins to
the latest available prebuilt wheel in that index. This keeps the Space from
trying to compile llama.cpp from source. Do not use the CUDA wheel URL
(llama-cpp-python/whl/cu124) unless the Space image also provides CUDA runtime
libraries such as libcudart.so.12; otherwise model loading can fail when the
first button click triggers inference.
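One way to check up front whether a CUDA wheel could work is to probe for the runtime library before any inference is attempted. The sketch below is an assumption about how such a preflight check might look, not something the app is known to do; `cuda_runtime_available` is a hypothetical helper name.

```python
import ctypes


def cuda_runtime_available(lib_name: str = "libcudart.so.12") -> bool:
    """Return True if the dynamic loader can find and open the CUDA runtime."""
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Raised when the shared library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    if cuda_runtime_available():
        print("CUDA runtime found; a CUDA wheel may be usable.")
    else:
        print("CUDA runtime not found; stay on the CPU wheel.")
```

Running this once at startup surfaces a missing `libcudart.so.12` immediately instead of at the first button click.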
- Set DEMO_MODE=auto (default) to allow a graceful scripted fallback if the model cannot load.
- Set DEMO_MODE=true to skip model loading entirely (instant UI-only demo).
- Set DEMO_MODE=off if you want model-loading failures to surface immediately.
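The three DEMO_MODE settings above could be handled along these lines. This is a minimal sketch of the described behavior, not the code in app.py; `read_demo_mode`, `init_model`, and the `loader` callable are hypothetical names, and returning `None` to mean "use the scripted demo" is an assumption.

```python
import os

VALID_MODES = {"auto", "true", "off"}


def read_demo_mode(env=None) -> str:
    """Read DEMO_MODE from the environment, defaulting to 'auto'."""
    env = os.environ if env is None else env
    mode = env.get("DEMO_MODE", "auto").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"DEMO_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return mode


def init_model(mode: str, loader):
    """Return the loaded model, or None when the scripted demo should be used."""
    if mode == "true":
        return None              # skip model loading entirely
    try:
        return loader()
    except Exception:
        if mode == "off":
            raise                # surface the loading failure immediately
        return None              # 'auto': fall back to the scripted demo
```

A caller would then do something like `model = init_model(read_demo_mode(), loader)` and branch on whether `model` is `None`.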
Local run
pip install -r requirements.txt
DEMO_MODE=true python app.py
On Windows PowerShell:
$env:DEMO_MODE="true"
python app.py