virtual-characters / README.md
ShadowInk's picture
Deploy Virtual Characters for Build Small Hackathon
005e075 verified
|
Raw
History Blame Contribute Delete
6.45 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade
metadata
title: Virtual Characters
emoji: 🎭
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.17.3
app_file: app.py
hf_oauth: true
tags:
  - build-small-hackathon
  - small-models
  - virtual-character
  - multimodal
  - tts
  - modal
  - gradio
  - track:backyard
  - sponsor:modal
  - achievement:offbrand
  - achievement:fieldnotes

Virtual Characters

Virtual Characters is a small-model, multimodal companion demo for the Build Small Hackathon. The current public experience focuses on one original character, 星萀, with a live stage, expression switching, image-aware chat, and optional TTS playback.

The app is built as a Hugging Face Gradio Space. Heavy inference runs on Modal endpoints, while the Space handles UI state, character cards, model status, OAuth-backed workshop persistence, and the stage renderer.

Why It Fits Build Small

  • All active models are under 32B parameters. The main chat endpoint is configured around a 12B-class Gemma model served by vLLM on Modal. The TTS path uses lightweight speech models such as Chatterbox/Kokoro-class voices. The image-generation spike and workshop path evaluate Qwen-Image/FLUX-style models as optional asset-generation services, also below the 32B ceiling.
  • No 70B+ or giant hosted assistant is used as the core runtime. The project is designed around smaller specialized services: chat, voice, image generation, matting, and stage control.
  • Original/off-brand character design. 星萀 is an original sci-fi communication-room character, not a clone of a commercial character.
  • Field notes included. The repo keeps benchmark notes and implementation notes for Modal cold starts, warm latency, TTS, image generation, and character-pack feasibility.

What You Can Try

  • Chat with 星萀 through text and image uploads.
  • Watch the stage switch expressions and motions from model output tags.
  • Generate playable TTS replies when the Modal TTS endpoint is available.
  • Open the 角色ε·₯坊 tab to draft/import Tavern-style character cards and test the asset-generation workflow.
  • Check model status cards for LLM, TTS, and image generation. If a Modal endpoint is asleep, the UI tells users to wait for cold start/model loading instead of failing silently.

Runtime Architecture

Hugging Face Space (Gradio)
  β”œβ”€ chat UI, stage renderer, role card UI
  β”œβ”€ HF OAuth login for workshop save/resume
  β”œβ”€ model status checks
  └─ Modal endpoints
       β”œβ”€ vLLM chat: 12B-class Gemma endpoint
       β”œβ”€ TTS: Chatterbox/Kokoro-class voices
       └─ image generation spike/workshop: Qwen-Image / FLUX candidates

Model Notes

The current configured stack is intentionally small-model oriented:

Capability Model / Service Size Policy
Dialogue google/gemma-4-12B-it served through Modal vLLM 12B-class, below 32B
TTS Chatterbox/Kokoro-class Modal services lightweight speech models, below 32B
Character image spike Qwen/Qwen-Image, Qwen/Qwen-Image-Edit, FLUX.1-schnell candidates optional asset-generation services, below 32B target
Background removal rembg CPU matting path local utility model, below 32B

The app exposes status cards because Modal endpoints can sleep. A sleeping endpoint is expected during demos; click refresh or retry after cold start completes.

Project Structure

  • app.py: Gradio UI, tabs, chat, workshop wiring, and model status controls.
  • src/character_registry.py: built-in character registry. The public demo currently ships only 星萀.
  • src/stage_driver.py: HTML/CSS stage renderer and expression/motion asset selection.
  • src/dialogue_engine.py: vLLM/OpenAI-compatible streaming, stage tag parsing, and TTS event handling.
  • src/character_workshop.py: Tavern JSON/form draft import, HF-login scoped save/resume, generation packaging, and install flow.
  • src/model_status.py: LLM/TTS/image-generation health checks.
  • modal_apps/: Modal deployment scripts for chat, TTS, and image-generation spikes.
  • CHARACTER_GENERATION_SPIKE.md: risk validation report for the character-generation pipeline.
  • BENCHMARK_RESULTS.md: Modal latency and deployment notes.

Local Development

python -m pip install -r requirements.txt
.\scripts\restart_gradio_background.ps1

Default local URL:

http://127.0.0.1:7864

The restart script launches Gradio in the background and prints the PID, URL, stdout log path, stderr log path, LLM endpoint, and TTS endpoint.

Mock mode avoids remote Modal calls:

.\scripts\start_gradio_background.ps1 -Mock

Override endpoints when needed:

.\scripts\start_gradio_background.ps1 -VllmUrl "https://your-vllm-endpoint.modal.run"
.\scripts\restart_gradio_background.ps1 -TtsUrl "https://your-tts-endpoint.modal.run"

Character Assets

The public build includes one built-in character:

assets/characters/star/
assets/backgrounds/communication_room.png

Supported expression slots:

idle, listening, thinking, worried, smile, happy

Motion-specific assets:

assets/characters/star/talk.png
assets/characters/star/focus.png

When the model emits motion=talk or motion=focus, src/stage_driver.py uses the dedicated action sprite. Other motions fall back to the current expression image.

Character Workshop

The workshop is deliberately separated from the chat tab. Users can:

  1. Import a Tavern-style JSON card or fill a form.
  2. Generate four independent main-visual candidates.
  3. Select one candidate.
  4. Generate eight expression/action slots and one background.
  5. Run background removal and package assets for the stage driver.
  6. Install the generated character locally into the runtime registry.

On Hugging Face Spaces, generation and install actions require HF OAuth login so each user's runs can be saved and resumed separately.

Modal Character Generation Spike

The image-generation flow is still marked as an MVP/workshop path, not the core dependency for chatting. The spike scripts can be run separately:

python scripts/run_character_generation_spike.py list-models
python scripts/run_character_generation_spike.py modal-health
python scripts/run_character_generation_spike.py modal-probe --candidate qwen_image --batch-size 4 --confirm-gpu

For full notes, see CHARACTER_GENERATION_SPIKE.md.