Spaces:

build-small-hackathon
/

Case-Lantern

Running

App Files Files Community

Case-Lantern / README.md

lastmass

Update README.md

a755fdb verified 5 days ago

preview code

raw

history blame contribute delete

2.75 kB

A newer version of the Gradio SDK is available: 6.17.3

Upgrade

metadata

title: Case Lantern
colorFrom: pink
colorTo: blue
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
models:
  - lastmass/Qwen3.5-Medical-GSPO

🏮 Case Lantern

Short Demo Video：

https://youtu.be/Bf3t5Cq6XuA

Case Lantern is a fictional medical mystery game for the Build Small Hackathon. Players investigate a short Chinese case, request clues, avoid red herrings, and submit a diagnosis within six turns.

The experience uses lastmass/Qwen3.5-Medical-GSPO, a small Chinese medical reasoning model with roughly 4.66B parameters, as the game master and scorer. Inference runs locally via llama.cpp (GGUF Q4_K_M).

Track & Merit Badges

Item	Detail
Track	An Adventure in Thousand Token Wood
AI role	Load-bearing game master, clue writer, and scoring judge
Constraint	Small model under 32B parameters
UI	Gradio Space with custom dark frontend

Badge	Status
🏕️ Off the Grid (LOCAL-FIRST)	✅ Model runs locally in the Space
🎸 Well-Tuned (FINE-TUNED)	✅ Uses fine-tuned model published on HF
🦙 Llama Champion	✅ Runs via llama.cpp runtime
🎨 Off-Brand (CUSTOM UI)	✅ Dark glassmorphism theme, custom CSS

Safety framing

This is not a diagnosis or treatment tool. It only uses fictional cases for learning-oriented gameplay. Users are explicitly asked not to provide personal health information.

Deployment notes

The app is designed for free CPU Spaces on Hugging Face. It does not require a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first launch and cached.

If you deploy on ZeroGPU, keep the prebuilt CPU llama-cpp-python wheel. The requirements.txt file uses the CPU wheel index (llama-cpp-python/whl/cpu) plus --only-binary=llama-cpp-python, and pins to the latest available prebuilt wheel in that index. This keeps the Space from trying to compile llama.cpp from source. Do not use the CUDA wheel URL (llama-cpp-python/whl/cu124) unless the Space image also provides CUDA runtime libraries such as libcudart.so.12; otherwise model loading can fail when the first button click triggers inference.

Set DEMO_MODE=auto (default) to allow a graceful scripted fallback if the model cannot load.
Set DEMO_MODE=true to skip model loading entirely (instant UI-only demo).
Set DEMO_MODE=off if you want model-loading failures to surface immediately.

Local run

pip install -r requirements.txt
DEMO_MODE=true python app.py

On Windows PowerShell:

$env:DEMO_MODE="true"
python app.py