Spaces:

build-small-hackathon
/

Case-Lantern

Running

App Files Files Community

Case-Lantern / README.md

lastmass

Update README.md

a755fdb verified 5 days ago

preview code

raw

history blame contribute delete

2.75 kB

	---
	title: Case Lantern
	colorFrom: pink
	colorTo: blue
	sdk: gradio
	sdk_version: 6.15.2
	app_file: app.py
	pinned: false
	license: apache-2.0
	models:
	- lastmass/Qwen3.5-Medical-GSPO

	---

	# 🏮 Case Lantern

	Short Demo Video：

	https://youtu.be/Bf3t5Cq6XuA

	Case Lantern is a fictional medical mystery game for the
	[Build Small Hackathon](https://huggingface.co/build-small-hackathon).
	Players investigate a short Chinese case, request clues, avoid red herrings, and
	submit a diagnosis within six turns.

	The experience uses [`lastmass/Qwen3.5-Medical-GSPO`](https://huggingface.co/lastmass/Qwen3.5-Medical-GSPO),
	a small Chinese medical reasoning model with roughly 4.66B parameters, as the
	game master and scorer. Inference runs locally via llama.cpp (GGUF Q4_K_M).

	## Track & Merit Badges

	\| Item \| Detail \|
	\|------\|--------\|
	\| Track \| An Adventure in Thousand Token Wood \|
	\| AI role \| Load-bearing game master, clue writer, and scoring judge \|
	\| Constraint \| Small model under 32B parameters \|
	\| UI \| Gradio Space with custom dark frontend \|

	\| Badge \| Status \|
	\|-------\|--------\|
	\| 🏕️ Off the Grid (LOCAL-FIRST) \| ✅ Model runs locally in the Space \|
	\| 🎸 Well-Tuned (FINE-TUNED) \| ✅ Uses fine-tuned model published on HF \|
	\| 🦙 Llama Champion \| ✅ Runs via llama.cpp runtime \|
	\| 🎨 Off-Brand (CUSTOM UI) \| ✅ Dark glassmorphism theme, custom CSS \|

	## Safety framing

	This is not a diagnosis or treatment tool. It only uses fictional cases for
	learning-oriented gameplay. Users are explicitly asked not to provide personal
	health information.

	## Deployment notes

	The app is designed for free CPU Spaces on Hugging Face. It does not require
	a GPU. The GGUF model (~2.78 GB, Q4_K_M) is downloaded from the Hub at first
	launch and cached.

	If you deploy on ZeroGPU, keep the prebuilt CPU `llama-cpp-python` wheel.
	The `requirements.txt` file uses the CPU wheel index
	(`llama-cpp-python/whl/cpu`) plus `--only-binary=llama-cpp-python`, and pins to
	the latest available prebuilt wheel in that index. This keeps the Space from
	trying to compile llama.cpp from source. Do not use the CUDA wheel URL
	(`llama-cpp-python/whl/cu124`) unless the Space image also provides CUDA runtime
	libraries such as `libcudart.so.12`; otherwise model loading can fail when the
	first button click triggers inference.

	- Set `DEMO_MODE=auto` (default) to allow a graceful scripted fallback if the
	model cannot load.
	- Set `DEMO_MODE=true` to skip model loading entirely (instant UI-only demo).
	- Set `DEMO_MODE=off` if you want model-loading failures to surface immediately.

	## Local run

	```bash
	pip install -r requirements.txt
	DEMO_MODE=true python app.py
	```

	On Windows PowerShell:

	```powershell
	$env:DEMO_MODE="true"
	python app.py
	```