Case-Lantern / README.md
lastmass's picture
Update README.md
a755fdb verified

A newer version of the Gradio SDK is available: 6.17.3

Upgrade
metadata
title: Case Lantern
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
models:
  - lastmass/Qwen3.5-Medical-GSPO

🏮 Case Lantern

Short Demo Video:

https://youtu.be/Bf3t5Cq6XuA

Case Lantern is a fictional medical mystery game for the Build Small Hackathon. Players investigate a short Chinese case, request clues, avoid red herrings, and submit a diagnosis within six turns.

The experience uses lastmass/Qwen3.5-Medical-GSPO, a small Chinese medical reasoning model with roughly 4.66B parameters, as the game master and scorer. Inference runs locally via llama.cpp (GGUF Q4_K_M).

Track & Merit Badges

Item Detail
Track An Adventure in Thousand Token Wood
AI role Load-bearing game master, clue writer, and scoring judge
Constraint Small model under 32B parameters
UI Gradio Space with custom dark frontend
Badge Status
🏕️ Off the Grid (LOCAL-FIRST) ✅ Model runs locally in the Space
🎸 Well-Tuned (FINE-TUNED) ✅ Uses fine-tuned model published on HF
🦙 Llama Champion ✅ Runs via llama.cpp runtime
🎨 Off-Brand (CUSTOM UI) ✅ Dark glassmorphism theme, custom CSS

Safety framing

This is not a diagnosis or treatment tool. It only uses fictional cases for learning-oriented gameplay. Users are explicitly asked not to provide personal health information.

Deployment notes

The app is designed for free CPU Spaces on Hugging Face. It does not require a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first launch and cached.

If you deploy on ZeroGPU, keep the prebuilt CPU llama-cpp-python wheel. The requirements.txt file uses the CPU wheel index (llama-cpp-python/whl/cpu) plus --only-binary=llama-cpp-python, and pins to the latest available prebuilt wheel in that index. This keeps the Space from trying to compile llama.cpp from source. Do not use the CUDA wheel URL (llama-cpp-python/whl/cu124) unless the Space image also provides CUDA runtime libraries such as libcudart.so.12; otherwise model loading can fail when the first button click triggers inference.

  • Set DEMO_MODE=auto (default) to allow a graceful scripted fallback if the model cannot load.
  • Set DEMO_MODE=true to skip model loading entirely (instant UI-only demo).
  • Set DEMO_MODE=off if you want model-loading failures to surface immediately.

Local run

pip install -r requirements.txt
DEMO_MODE=true python app.py

On Windows PowerShell:

$env:DEMO_MODE="true"
python app.py