---
title: Case Lantern
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
models:
- lastmass/Qwen3.5-Medical-GSPO
---

# 🏮 Case Lantern

Short demo video:

https://youtu.be/Bf3t5Cq6XuA

Case Lantern is a fictional medical mystery game for the
[Build Small Hackathon](https://huggingface.co/build-small-hackathon).
Players investigate a short Chinese-language case, request clues, avoid red
herrings, and submit a diagnosis within six turns.

The experience uses [`lastmass/Qwen3.5-Medical-GSPO`](https://huggingface.co/lastmass/Qwen3.5-Medical-GSPO),
a small Chinese medical reasoning model with roughly 4.66B parameters, as the
game master and scorer. Inference runs locally via **llama.cpp** (GGUF Q4_K_M).

## Track & Merit Badges

| Item | Detail |
|------|--------|
| Track | An Adventure in Thousand Token Wood |
| AI role | Load-bearing game master, clue writer, and scoring judge |
| Constraint | Small model under 32B parameters |
| UI | Gradio Space with custom dark frontend |

| Badge | Status |
|-------|--------|
| Off the Grid (LOCAL-FIRST) | ✅ Model runs locally in the Space |
| 🎸 Well-Tuned (FINE-TUNED) | ✅ Uses a fine-tuned model published on the Hugging Face Hub |
| 🦙 Llama Champion | ✅ Runs via the llama.cpp runtime |
| 🎨 Off-Brand (CUSTOM UI) | ✅ Dark glassmorphism theme, custom CSS |

## Safety framing

This is not a diagnosis or treatment tool. It uses only fictional cases for
learning-oriented gameplay. Users are explicitly asked not to provide personal
health information.

## Deployment notes

The app is designed for **free CPU Spaces** on Hugging Face and does not require
a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first
launch and cached.

If you deploy on **ZeroGPU**, keep the prebuilt CPU `llama-cpp-python` wheel.
The `requirements.txt` file uses the CPU wheel index
(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, and pins to
the latest available prebuilt wheel in that index. This keeps the Space from
trying to compile llama.cpp from source. Do not use the CUDA wheel URL
(`llama-cpp-python/whl/cu124`) unless the Space image also provides CUDA runtime
libraries such as `libcudart.so.12`; otherwise model loading can fail when the
first button click triggers inference.

- Set `DEMO_MODE=auto` (default) to allow a graceful scripted fallback if the
  model cannot load.
- Set `DEMO_MODE=true` to skip model loading entirely (instant UI-only demo).
- Set `DEMO_MODE=off` if you want model-loading failures to surface immediately.
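
The three `DEMO_MODE` settings above can be read as a small decision rule. The following is a minimal sketch of that rule, not the code in `app.py`: the names `read_demo_mode` and `select_backend`, and the convention that `None` means "use the scripted demo", are assumptions made for illustration.

```python
import os
from typing import Callable, Mapping, Optional

# The three values documented above; anything else is rejected in this sketch.
VALID_MODES = {"auto", "true", "off"}


def read_demo_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the normalized DEMO_MODE value, defaulting to 'auto'."""
    env = os.environ if env is None else env
    mode = env.get("DEMO_MODE", "auto").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"DEMO_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}"
        )
    return mode


def select_backend(mode: str, load_model: Callable[[], object]) -> Optional[object]:
    """Return the loaded model, or None to signal the scripted demo path.

    `load_model` is a hypothetical zero-argument loader supplied by the caller.
    """
    if mode == "true":
        return None  # skip model loading entirely
    try:
        return load_model()
    except Exception:
        if mode == "off":
            raise  # surface the loading failure immediately
        return None  # "auto": fall back to the scripted demo
```

Passing the loader in as a callable keeps the rule testable without downloading the GGUF file; a real app would pass whatever function builds its llama.cpp model.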

## Local run

```bash
pip install -r requirements.txt
DEMO_MODE=true python app.py
```

On Windows PowerShell:

```powershell
$env:DEMO_MODE="true"
python app.py
```