---
title: Case Lantern
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
models:
- lastmass/Qwen3.5-Medical-GSPO
---
# 🏮 Case Lantern
Short demo video: https://youtu.be/Bf3t5Cq6XuA

Case Lantern is a fictional medical mystery game for the
[Build Small Hackathon](https://huggingface.co/build-small-hackathon).
Players investigate a short Chinese case, request clues, avoid red herrings, and
submit a diagnosis within six turns.
The experience uses [`lastmass/Qwen3.5-Medical-GSPO`](https://huggingface.co/lastmass/Qwen3.5-Medical-GSPO),
a small Chinese medical reasoning model with roughly 4.66B parameters, as the
game master and scorer. Inference runs locally via **llama.cpp** (GGUF Q4_K_M).
## Track & Merit Badges
| Item | Detail |
|------|--------|
| Track | An Adventure in Thousand Token Wood |
| AI role | Load-bearing game master, clue writer, and scoring judge |
| Constraint | Small model under 32B parameters |
| UI | Gradio Space with custom dark frontend |

| Badge | Status |
|-------|--------|
| 🏕️ Off the Grid (LOCAL-FIRST) | ✅ Model runs locally in the Space |
| 🎸 Well-Tuned (FINE-TUNED) | ✅ Uses fine-tuned model published on HF |
| 🦙 Llama Champion | ✅ Runs via llama.cpp runtime |
| 🎨 Off-Brand (CUSTOM UI) | ✅ Dark glassmorphism theme, custom CSS |
## Safety framing
This is not a diagnosis or treatment tool. It only uses fictional cases for
learning-oriented gameplay. Users are explicitly asked not to provide personal
health information.
## Deployment notes
The app is designed for **free CPU Spaces** on Hugging Face. It does not require
a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first
launch and cached.
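
The download-once-then-cache behaviour described above could look roughly like the following. This is a minimal sketch, not the code in `app.py`: `LazyModel` and `load_gguf` are hypothetical names, and the repo ID and filename in the last line are placeholders, since this README does not name the GGUF repository or file. It assumes `huggingface_hub` and `llama-cpp-python` are installed; both are imported lazily so the module itself loads without them.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Build a model on first use and reuse it afterwards (hypothetical helper)."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._model: Optional[Any] = None
        self._lock = threading.Lock()  # guards against two requests loading at once

    def get(self) -> Any:
        with self._lock:
            if self._model is None:
                self._model = self._factory()
            return self._model


def load_gguf(repo_id: str, filename: str, n_ctx: int = 2048) -> Any:
    """Fetch a GGUF file from the Hub (cached on disk after the first call) and open it on CPU."""
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    # n_gpu_layers=0 keeps every layer on the CPU, matching a free CPU Space.
    return Llama(model_path=model_path, n_ctx=n_ctx, n_gpu_layers=0)


# Placeholder repo and file names (assumptions); nothing is downloaded until .get() is called.
model = LazyModel(lambda: load_gguf("your-namespace/your-gguf-repo", "model-Q4_K_M.gguf"))
```

Deferring the load to the first `get()` call lets the UI start immediately, at the cost of a slow first request while the file downloads.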
If you deploy on **ZeroGPU**, keep the prebuilt CPU `llama-cpp-python` wheel.
The `requirements.txt` file uses the CPU wheel index
(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, and pins to
the latest available prebuilt wheel in that index. This keeps the Space from
trying to compile llama.cpp from source. Do not use the CUDA wheel URL
(`llama-cpp-python/whl/cu124`) unless the Space image also provides CUDA runtime
libraries such as `libcudart.so.12`; otherwise model loading can fail when the
first button click triggers inference.
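
One way to check whether the CUDA runtime mentioned above is present before considering the CUDA wheel is to ask the dynamic loader to open it. The sketch below uses only the standard library; `shared_library_loads` and `choose_wheel_variant` are hypothetical helper names, not part of this project. A successful load only shows that the library file can be opened. It does not show that a GPU is attached or usable.

```python
import ctypes


def shared_library_loads(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def choose_wheel_variant(cuda_runtime: str = "libcudart.so.12") -> str:
    """Suggest "cu124" only when the CUDA runtime is loadable, otherwise "cpu"."""
    return "cu124" if shared_library_loads(cuda_runtime) else "cpu"


if __name__ == "__main__":
    print(choose_wheel_variant())
```

Running this inside the Space image gives a quick signal: if it prints `cpu`, the CPU wheel is the safe choice.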
- Set `DEMO_MODE=auto` (default) to allow a graceful scripted fallback if the
model cannot load.
- Set `DEMO_MODE=true` to skip model loading entirely (instant UI-only demo).
- Set `DEMO_MODE=off` if you want model-loading failures to surface immediately.
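
The three `DEMO_MODE` settings above can be sketched as a small decision function. This illustrates the documented behaviour and is not the code in `app.py`: `read_demo_mode` and `start_backend` are hypothetical names, and rejecting unrecognised values with an error is an assumption, since the README does not say how they are handled.

```python
import os
from typing import Any, Callable, Mapping, Optional, Tuple

VALID_MODES = {"auto", "true", "off"}


def read_demo_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Read DEMO_MODE from the environment, defaulting to "auto"."""
    env = os.environ if env is None else env
    mode = env.get("DEMO_MODE", "auto").strip().lower()
    if mode not in VALID_MODES:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"DEMO_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return mode


def start_backend(mode: str, load_model: Callable[[], Any]) -> Tuple[str, Any]:
    """Return ("model", model) or ("scripted", None) according to the mode."""
    if mode == "true":
        return "scripted", None  # skip model loading entirely
    try:
        return "model", load_model()
    except Exception:
        if mode == "off":
            raise  # surface the loading failure immediately
        return "scripted", None  # "auto": fall back to the scripted demo
```

For example, `start_backend(read_demo_mode(), loader)` with a failing `loader` returns `("scripted", None)` under the default `auto` and re-raises the error under `off`.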
## Local run
```bash
pip install -r requirements.txt
DEMO_MODE=true python app.py
```
On Windows PowerShell:
```powershell
$env:DEMO_MODE="true"
python app.py
```