Spaces:

build-small-hackathon
/

virtual-characters

Sleeping

App Files Files Community

virtual-characters / README.md

ShadowInk

Deploy Virtual Characters for Build Small Hackathon

005e075 verified 18 days ago

preview code

Raw

History Blame Contribute Delete

6.45 kB

	---
	title: Virtual Characters
	emoji: 🎭
	colorFrom: blue
	colorTo: pink
	sdk: gradio
	sdk_version: 6.17.3
	app_file: app.py
	hf_oauth: true
	tags:
	- build-small-hackathon
	- small-models
	- virtual-character
	- multimodal
	- tts
	- modal
	- gradio
	- "track:backyard"
	- "sponsor:modal"
	- "achievement:offbrand"
	- "achievement:fieldnotes"
	---

	# Virtual Characters

	Virtual Characters is a small-model, multimodal companion demo for the Build Small Hackathon. The current public experience focuses on one original character, 星萤, with a live stage, expression switching, image-aware chat, and optional TTS playback.

	The app is built as a Hugging Face Gradio Space. Heavy inference runs on Modal endpoints, while the Space handles UI state, character cards, model status, OAuth-backed workshop persistence, and the stage renderer.

	## Why It Fits Build Small

	- All active models are under 32B parameters. The main chat endpoint is configured around a 12B-class Gemma model served by vLLM on Modal. The TTS path uses lightweight speech models such as Chatterbox/Kokoro-class voices. The image-generation spike and workshop path evaluate Qwen-Image/FLUX-style models as optional asset-generation services, also below the 32B ceiling.
	- No 70B+ or giant hosted assistant is used as the core runtime. The project is designed around smaller specialized services: chat, voice, image generation, matting, and stage control.
	- Original/off-brand character design. 星萤 is an original sci-fi communication-room character, not a clone of a commercial character.
	- Field notes included. The repo keeps benchmark notes and implementation notes for Modal cold starts, warm latency, TTS, image generation, and character-pack feasibility.

	## What You Can Try

	- Chat with 星萤 through text and image uploads.
	- Watch the stage switch expressions and motions from model output tags.
	- Generate playable TTS replies when the Modal TTS endpoint is available.
	- Open the `角色工坊` tab to draft/import Tavern-style character cards and test the asset-generation workflow.
	- Check model status cards for LLM, TTS, and image generation. If a Modal endpoint is asleep, the UI tells users to wait for cold start/model loading instead of failing silently.

	## Runtime Architecture

	```text
	Hugging Face Space (Gradio)
	├─ chat UI, stage renderer, role card UI
	├─ HF OAuth login for workshop save/resume
	├─ model status checks
	└─ Modal endpoints
	├─ vLLM chat: 12B-class Gemma endpoint
	├─ TTS: Chatterbox/Kokoro-class voices
	└─ image generation spike/workshop: Qwen-Image / FLUX candidates
	```

	## Model Notes

	The current configured stack is intentionally small-model oriented:

	\| Capability \| Model / Service \| Size Policy \|
	\| --- \| --- \| --- \|
	\| Dialogue \| `google/gemma-4-12B-it` served through Modal vLLM \| 12B-class, below 32B \|
	\| TTS \| Chatterbox/Kokoro-class Modal services \| lightweight speech models, below 32B \|
	\| Character image spike \| `Qwen/Qwen-Image`, `Qwen/Qwen-Image-Edit`, `FLUX.1-schnell` candidates \| optional asset-generation services, below 32B target \|
	\| Background removal \| `rembg` CPU matting path \| local utility model, below 32B \|

	The app exposes status cards because Modal endpoints can sleep. A sleeping endpoint is expected during demos; click refresh or retry after cold start completes.

	## Project Structure

	- `app.py`: Gradio UI, tabs, chat, workshop wiring, and model status controls.
	- `src/character_registry.py`: built-in character registry. The public demo currently ships only 星萤.
	- `src/stage_driver.py`: HTML/CSS stage renderer and expression/motion asset selection.
	- `src/dialogue_engine.py`: vLLM/OpenAI-compatible streaming, stage tag parsing, and TTS event handling.
	- `src/character_workshop.py`: Tavern JSON/form draft import, HF-login scoped save/resume, generation packaging, and install flow.
	- `src/model_status.py`: LLM/TTS/image-generation health checks.
	- `modal_apps/`: Modal deployment scripts for chat, TTS, and image-generation spikes.
	- `CHARACTER_GENERATION_SPIKE.md`: risk validation report for the character-generation pipeline.
	- `BENCHMARK_RESULTS.md`: Modal latency and deployment notes.

	## Local Development

	```powershell
	python -m pip install -r requirements.txt
	.\scripts\restart_gradio_background.ps1
	```

	Default local URL:

	```text
	http://127.0.0.1:7864
	```

	The restart script launches Gradio in the background and prints the PID, URL, stdout log path, stderr log path, LLM endpoint, and TTS endpoint.

	Mock mode avoids remote Modal calls:

	```powershell
	.\scripts\start_gradio_background.ps1 -Mock
	```

	Override endpoints when needed:

	```powershell
	.\scripts\start_gradio_background.ps1 -VllmUrl "https://your-vllm-endpoint.modal.run"
	.\scripts\restart_gradio_background.ps1 -TtsUrl "https://your-tts-endpoint.modal.run"
	```

	## Character Assets

	The public build includes one built-in character:

	```text
	assets/characters/star/
	assets/backgrounds/communication_room.png
	```

	Supported expression slots:

	```text
	idle, listening, thinking, worried, smile, happy
	```

	Motion-specific assets:

	```text
	assets/characters/star/talk.png
	assets/characters/star/focus.png
	```

	When the model emits `motion=talk` or `motion=focus`, `src/stage_driver.py` uses the dedicated action sprite. Other motions fall back to the current expression image.

	## Character Workshop

	The workshop is deliberately separated from the chat tab. Users can:

	1. Import a Tavern-style JSON card or fill a form.
	2. Generate four independent main-visual candidates.
	3. Select one candidate.
	4. Generate eight expression/action slots and one background.
	5. Run background removal and package assets for the stage driver.
	6. Install the generated character locally into the runtime registry.

	On Hugging Face Spaces, generation and install actions require HF OAuth login so each user's runs can be saved and resumed separately.

	## Modal Character Generation Spike

	The image-generation flow is still marked as an MVP/workshop path, not the core dependency for chatting. The spike scripts can be run separately:

	```powershell
	python scripts/run_character_generation_spike.py list-models
	python scripts/run_character_generation_spike.py modal-health
	python scripts/run_character_generation_spike.py modal-probe --candidate qwen_image --batch-size 4 --confirm-gpu
	```

	For full notes, see `CHARACTER_GENERATION_SPIKE.md`.