---
title: Case Lantern
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
models:
  - lastmass/Qwen3.5-Medical-GSPO

---

# 🏮 Case Lantern

Short demo video:

https://youtu.be/Bf3t5Cq6XuA

Case Lantern is a fictional medical mystery game for the
[Build Small Hackathon](https://huggingface.co/build-small-hackathon).
Players investigate a short Chinese-language case, request clues, avoid red
herrings, and submit a diagnosis within six turns.

The experience uses [`lastmass/Qwen3.5-Medical-GSPO`](https://huggingface.co/lastmass/Qwen3.5-Medical-GSPO),
a small Chinese medical reasoning model with roughly 4.66B parameters, as the
game master and scorer. Inference runs locally via **llama.cpp** (GGUF Q4_K_M).

## Track & Merit Badges

| Item | Detail |
|------|--------|
| Track | An Adventure in Thousand Token Wood |
| AI role | Load-bearing game master, clue writer, and scoring judge |
| Constraint | Small model under 32B parameters |
| UI | Gradio Space with custom dark frontend |

| Badge | Status |
|-------|--------|
| 🏕️ Off the Grid (LOCAL-FIRST) | ✅ Model runs locally in the Space |
| 🎸 Well-Tuned (FINE-TUNED) | ✅ Uses a fine-tuned model published on the Hugging Face Hub |
| 🦙 Llama Champion | ✅ Runs on the llama.cpp runtime |
| 🎨 Off-Brand (CUSTOM UI) | ✅ Dark glassmorphism theme, custom CSS |

## Safety framing

This is not a diagnosis or treatment tool. It uses only fictional cases for
learning-oriented gameplay. Users are explicitly asked not to provide personal
health information.

## Deployment notes

The app is designed for **free CPU Spaces** on Hugging Face. It does not require
a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first
launch and cached.
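
The download-once-then-cache behavior could look roughly like the sketch below. It is not taken from `app.py`: the GGUF repo id, the filename, the `GGUF_REPO_ID` / `GGUF_FILENAME` variables, and the helper names are placeholders, and the `n_ctx` value is an arbitrary choice. It assumes `huggingface_hub` and `llama-cpp-python` are installed.

```python
import os
from functools import lru_cache

# Placeholders (assumptions): point these at the actual GGUF repo and file.
GGUF_REPO_ID = os.environ.get("GGUF_REPO_ID", "your-namespace/your-gguf-repo")
GGUF_FILENAME = os.environ.get("GGUF_FILENAME", "model-Q4_K_M.gguf")


def fetch_gguf(repo_id, filename, download=None):
    """Return a local path to the GGUF file.

    hf_hub_download stores files in the local Hub cache, so later calls
    return the cached copy instead of downloading again.
    """
    if download is None:
        # Imported lazily so the UI can start even if the package is missing.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    return download(repo_id=repo_id, filename=filename)


@lru_cache(maxsize=1)
def get_llm():
    """Load the model once per process and reuse it for every request."""
    from llama_cpp import Llama
    model_path = fetch_gguf(GGUF_REPO_ID, GGUF_FILENAME)
    return Llama(
        model_path=model_path,
        n_ctx=4096,                       # assumed context size
        n_threads=os.cpu_count() or 2,    # CPU-only inference
    )
```

Passing `download` explicitly makes the helper easy to exercise without network access.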

If you deploy on **ZeroGPU**, keep the prebuilt CPU `llama-cpp-python` wheel.
The `requirements.txt` file uses the CPU wheel index
(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, and pins to
the latest available prebuilt wheel in that index. This keeps the Space from
trying to compile llama.cpp from source. Do not use the CUDA wheel URL
(`llama-cpp-python/whl/cu124`) unless the Space image also provides CUDA runtime
libraries such as `libcudart.so.12`; otherwise model loading can fail when the
first button click triggers inference.
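
Before switching to a CUDA wheel, one way to check whether the image provides the CUDA runtime is to try loading the library directly. This is a minimal sketch using only the standard library; the function name `cuda_runtime_available` is hypothetical and not part of the project.

```python
import ctypes


def cuda_runtime_available(lib_name="libcudart.so.12"):
    """Return True if the named shared library can be loaded, else False."""
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Raised when the dynamic loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    if cuda_runtime_available():
        print("CUDA runtime found; a CUDA wheel may load.")
    else:
        print("CUDA runtime missing; keep the CPU wheel.")
```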

- Set `DEMO_MODE=auto` (default) to allow a graceful scripted fallback if the
  model cannot load.
- Set `DEMO_MODE=true` to skip model loading entirely (instant UI-only demo).
- Set `DEMO_MODE=off` if you want model-loading failures to surface immediately.
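
The three modes above can be sketched as follows. This illustrates the described behavior and is not the code in `app.py`: the function names, the `"scripted"` / `"model"` labels, and the choice to treat unrecognized values as `auto` are all assumptions.

```python
import os

VALID_MODES = {"auto", "true", "off"}


def resolve_demo_mode(env=None):
    """Read DEMO_MODE, defaulting to "auto" (assumed for unknown values too)."""
    env = os.environ if env is None else env
    mode = env.get("DEMO_MODE", "auto").strip().lower()
    return mode if mode in VALID_MODES else "auto"


def pick_backend(mode, load_model):
    """Return (backend_name, model) according to the demo mode.

    "true": never call load_model; use the scripted demo.
    "auto": try load_model; fall back to the scripted demo on failure.
    "off":  try load_model; let any failure propagate.
    """
    if mode == "true":
        return "scripted", None
    try:
        return "model", load_model()
    except Exception:
        if mode == "off":
            raise
        return "scripted", None
```

For example, `pick_backend(resolve_demo_mode(), load_model)` with a hypothetical `load_model` callable would select the backend once at startup.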

## Local run

```bash
pip install -r requirements.txt
DEMO_MODE=true python app.py
```

On Windows PowerShell:

```powershell
$env:DEMO_MODE="true"
python app.py
```