---
title: Case Lantern
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
models:
  - lastmass/Qwen3.5-Medical-GSPO
---

# 🏮 Case Lantern

Short Demo Video: https://youtu.be/Bf3t5Cq6XuA

Case Lantern is a fictional medical mystery game for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon). Players investigate a short Chinese case, request clues, avoid red herrings, and submit a diagnosis within six turns.

The experience uses [`lastmass/Qwen3.5-Medical-GSPO`](https://huggingface.co/lastmass/Qwen3.5-Medical-GSPO), a small Chinese medical reasoning model with roughly 4.66B parameters, as the game master and scorer. Inference runs locally via **llama.cpp** (GGUF Q4_K_M).

## Track & Merit Badges

| Item | Detail |
|------|--------|
| Track | An Adventure in Thousand Token Wood |
| AI role | Load-bearing game master, clue writer, and scoring judge |
| Constraint | Small model under 32B parameters |
| UI | Gradio Space with custom dark frontend |

| Badge | Status |
|-------|--------|
| 🏕️ Off the Grid (LOCAL-FIRST) | ✅ Model runs locally in the Space |
| 🎸 Well-Tuned (FINE-TUNED) | ✅ Uses fine-tuned model published on HF |
| 🦙 Llama Champion | ✅ Runs via llama.cpp runtime |
| 🎨 Off-Brand (CUSTOM UI) | ✅ Dark glassmorphism theme, custom CSS |

## Safety framing

This is not a diagnosis or treatment tool. It only uses fictional cases for learning-oriented gameplay. Users are explicitly asked not to provide personal health information.

## Deployment notes

The app is designed for **free CPU Spaces** on Hugging Face. It does not require a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first launch and cached.

If you deploy on **ZeroGPU**, keep the prebuilt CPU `llama-cpp-python` wheel. The `requirements.txt` file uses the CPU wheel index (`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, and pins to the latest available prebuilt wheel in that index.
This keeps the Space from trying to compile llama.cpp from source. Do not use the CUDA wheel URL (`llama-cpp-python/whl/cu124`) unless the Space image also provides CUDA runtime libraries such as `libcudart.so.12`; otherwise model loading can fail when the first button click triggers inference.

- Set `DEMO_MODE=auto` (default) to allow a graceful scripted fallback if the model cannot load.
- Set `DEMO_MODE=true` to skip model loading entirely (instant UI-only demo).
- Set `DEMO_MODE=off` if you want model-loading failures to surface immediately.

## Local run

```bash
pip install -r requirements.txt
DEMO_MODE=true python app.py
```

On Windows PowerShell:

```powershell
$env:DEMO_MODE="true"
python app.py
```
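
For readers who want to see how the three `DEMO_MODE` values from the deployment notes might interact, here is a minimal sketch. It is not taken from `app.py`: the names `resolve_demo_mode`, `start_engine`, `load_model`, and `scripted_fallback` are hypothetical, and rejecting unknown values with an error is an assumption about what would be reasonable rather than documented behavior.

```python
import os

# The three values described in the deployment notes.
VALID_MODES = {"auto", "true", "off"}


def resolve_demo_mode(env=None):
    """Return the normalized DEMO_MODE value, defaulting to "auto".

    `env` is any mapping; it defaults to os.environ. Raising on unknown
    values is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    mode = (env.get("DEMO_MODE") or "auto").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported DEMO_MODE: {mode!r}")
    return mode


def start_engine(mode, load_model, scripted_fallback):
    """Choose between the real model and a scripted demo.

    `load_model` and `scripted_fallback` are hypothetical zero-argument
    callables supplied by the caller.
    """
    if mode == "true":
        # Skip model loading entirely (UI-only demo).
        return scripted_fallback()
    try:
        return load_model()
    except Exception:
        if mode == "off":
            # Let model-loading failures surface immediately.
            raise
        # mode == "auto": fall back to the scripted demo.
        return scripted_fallback()
```

Passing the loader and the fallback in as callables keeps the mode logic separate from llama.cpp itself, so the fallback path can be exercised without downloading the GGUF file.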