---
title: Case Lantern
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
models:
- lastmass/Qwen3.5-Medical-GSPO
---
# ๐Ÿฎ Case Lantern
Short Demo Video๏ผš
https://youtu.be/Bf3t5Cq6XuA
Case Lantern is a fictional medical mystery game for the
[Build Small Hackathon](https://huggingface.co/build-small-hackathon).
Players investigate a short Chinese case, request clues, avoid red herrings, and
submit a diagnosis within six turns.
The experience uses [`lastmass/Qwen3.5-Medical-GSPO`](https://huggingface.co/lastmass/Qwen3.5-Medical-GSPO),
a small Chinese medical reasoning model with roughly 4.66B parameters, as the
game master and scorer. Inference runs locally via **llama.cpp** (GGUF Q4_K_M).
## Track & Merit Badges
| Item | Detail |
|------|--------|
| Track | An Adventure in Thousand Token Wood |
| AI role | Load-bearing game master, clue writer, and scoring judge |
| Constraint | Small model under 32B parameters |
| UI | Gradio Space with custom dark frontend |

| Badge | Status |
|-------|--------|
| 🏕️ Off the Grid (LOCAL-FIRST) | ✅ Model runs locally in the Space |
| 🎸 Well-Tuned (FINE-TUNED) | ✅ Uses fine-tuned model published on HF |
| 🦙 Llama Champion | ✅ Runs via llama.cpp runtime |
| 🎨 Off-Brand (CUSTOM UI) | ✅ Dark glassmorphism theme, custom CSS |
## Safety framing
This is not a diagnosis or treatment tool. It only uses fictional cases for
learning-oriented gameplay. Users are explicitly asked not to provide personal
health information.
## Deployment notes
The app is designed for **free CPU Spaces** on Hugging Face. It does not require
a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first
launch and cached.
If you deploy on **ZeroGPU**, keep the prebuilt CPU `llama-cpp-python` wheel.
The `requirements.txt` file uses the CPU wheel index
(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, and pins to
the latest available prebuilt wheel in that index. This keeps the Space from
trying to compile llama.cpp from source. Do not use the CUDA wheel URL
(`llama-cpp-python/whl/cu124`) unless the Space image also provides CUDA runtime
libraries such as `libcudart.so.12`; otherwise model loading can fail when the
first button click triggers inference.
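
If you do want to try the CUDA wheel, a preflight check along these lines can show whether the runtime library is visible before any inference is attempted. This is a minimal sketch, not code from `app.py`: the function name `cuda_runtime_available` is hypothetical, and it assumes a Linux image where the library would be resolvable by the name `libcudart.so.12`.

```python
import ctypes


def cuda_runtime_available(lib_name: str = "libcudart.so.12") -> bool:
    """Return True if the dynamic loader can open the named CUDA runtime library.

    Hypothetical helper: it only checks that the shared library loads,
    not that a GPU is present or usable.
    """
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    if cuda_runtime_available():
        print("CUDA runtime found; the CUDA wheel may be usable.")
    else:
        print("CUDA runtime not found; keep the CPU wheel.")
```

Running it once at startup and logging the result makes a missing runtime visible in the Space logs rather than at the first button click.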
- Set `DEMO_MODE=auto` (default) to allow a graceful scripted fallback if the
model cannot load.
- Set `DEMO_MODE=true` to skip model loading entirely (instant UI-only demo).
- Set `DEMO_MODE=off` if you want model-loading failures to surface immediately.
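
The three settings above can be read as a small decision rule. The following is a minimal sketch of that rule, not the actual logic in `app.py`: `resolve_backend` and the `load_model` callable are hypothetical names, and rejecting unrecognised values is an assumption of this sketch.

```python
import os
from typing import Callable, Optional, Tuple


def resolve_backend(
    load_model: Callable[[], object],
    mode: Optional[str] = None,
) -> Tuple[str, Optional[object]]:
    """Pick the model or the scripted fallback based on DEMO_MODE.

    Returns ("model", llm) when the model is loaded, or ("scripted", None)
    when the scripted demo should be used instead.
    """
    if mode is None:
        mode = os.environ.get("DEMO_MODE", "auto")
    mode = mode.strip().lower()

    if mode == "true":
        # UI-only demo: never attempt to load the model.
        return "scripted", None
    if mode == "off":
        # No fallback: any loading exception propagates to the caller.
        return "model", load_model()
    if mode != "auto":
        # Assumption of this sketch: unknown values are treated as errors.
        raise ValueError(f"Unsupported DEMO_MODE value: {mode!r}")

    try:
        return "model", load_model()
    except Exception:
        # auto: fall back to the scripted demo if loading fails.
        return "scripted", None
```

With `DEMO_MODE=auto`, a loader that raises yields the scripted fallback; with `DEMO_MODE=off`, the same failure is raised to the caller so it shows up in the logs.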
## Local run
```bash
pip install -r requirements.txt
DEMO_MODE=true python app.py
```
On Windows PowerShell:
```powershell
$env:DEMO_MODE="true"
python app.py
```