---
title: Virtual Characters
emoji: π
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.17.3
app_file: app.py
hf_oauth: true
tags:
- build-small-hackathon
- small-models
- virtual-character
- multimodal
- tts
- modal
- gradio
- "track:backyard"
- "sponsor:modal"
- "achievement:offbrand"
- "achievement:fieldnotes"
---

# Virtual Characters

Virtual Characters is a small-model, multimodal companion demo for the Build Small Hackathon. The current public experience focuses on one original character, **ζθ€**, with a live stage, expression switching, image-aware chat, and optional TTS playback.

The app is built as a Hugging Face Gradio Space. Heavy inference runs on Modal endpoints, while the Space handles UI state, character cards, model status, OAuth-backed workshop persistence, and the stage renderer.

## Why It Fits Build Small

- **All active models are under 32B parameters.** The main chat endpoint is configured around a 12B-class Gemma model served by vLLM on Modal. The TTS path uses lightweight speech models such as Chatterbox/Kokoro-class voices. The image-generation spike and workshop path evaluate Qwen-Image/FLUX-style models as optional asset-generation services, also below the 32B ceiling.
- **No 70B+ model or giant hosted assistant serves as the core runtime.** The project is designed around smaller specialized services: chat, voice, image generation, matting, and stage control.
- **Original/off-brand character design.** ζθ€ is an original sci-fi communication-room character, not a clone of a commercial character.
- **Field notes included.** The repo keeps benchmark notes and implementation notes for Modal cold starts, warm latency, TTS, image generation, and character-pack feasibility.

## What You Can Try

- Chat with ζθ€ through text and image uploads.
- Watch the stage switch expressions and motions from model output tags.
- Generate playable TTS replies when the Modal TTS endpoint is available.
- Open the `角色工坊` (Character Workshop) tab to draft or import Tavern-style character cards and test the asset-generation workflow.
- Check model status cards for LLM, TTS, and image generation. If a Modal endpoint is asleep, the UI tells users to wait for cold start/model loading instead of failing silently.
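
The tag-driven stage switching mentioned above can be illustrated with a small parser. This is only a sketch: the real tag syntax lives in `src/dialogue_engine.py` and is not documented here, so the bracketed form `[expression=smile motion=talk]`, the `StageCue` class, and the `split_stage_tags` function are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed tag shape: [expression=smile motion=talk]. The app's real syntax may differ.
TAG_PATTERN = re.compile(r"\[((?:\s*\w+=\w+)+)\s*\]")

# Expression slots listed under "Character Assets".
EXPRESSIONS = {"idle", "listening", "thinking", "worried", "smile", "happy"}


@dataclass
class StageCue:
    expression: str = "idle"
    motion: Optional[str] = None


def split_stage_tags(reply: str) -> Tuple[str, StageCue]:
    """Strip stage tags from a model reply and return the last cue seen."""
    cue = StageCue()
    for match in TAG_PATTERN.finditer(reply):
        for pair in match.group(1).split():
            key, value = pair.split("=", 1)
            if key == "expression" and value in EXPRESSIONS:
                cue.expression = value  # unknown expressions are ignored
            elif key == "motion":
                cue.motion = value
    visible = TAG_PATTERN.sub("", reply)
    # Collapse runs of whitespace (including newlines) left behind by removed tags.
    return " ".join(visible.split()), cue
```

With this sketch, `split_stage_tags("[expression=smile motion=talk] Hello there!")` yields the visible text `Hello there!` and a cue the stage renderer could act on.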

## Runtime Architecture

```text
Hugging Face Space (Gradio)
├─ chat UI, stage renderer, role card UI
├─ HF OAuth login for workshop save/resume
├─ model status checks
└─ Modal endpoints
   ├─ vLLM chat: 12B-class Gemma endpoint
   ├─ TTS: Chatterbox/Kokoro-class voices
   └─ image generation spike/workshop: Qwen-Image / FLUX candidates
```

## Model Notes

The current configured stack is intentionally small-model oriented:

| Capability | Model / Service | Size Policy |
| --- | --- | --- |
| Dialogue | `google/gemma-4-12B-it` served through Modal vLLM | 12B-class, below 32B |
| TTS | Chatterbox/Kokoro-class Modal services | lightweight speech models, below 32B |
| Character image spike | `Qwen/Qwen-Image`, `Qwen/Qwen-Image-Edit`, `FLUX.1-schnell` candidates | optional asset-generation services, below 32B target |
| Background removal | `rembg` CPU matting path | local utility model, below 32B |

The app exposes status cards because Modal endpoints can sleep. A sleeping endpoint is expected during demos; click refresh or retry after cold start completes.
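
One way a status card could tell a sleeping endpoint apart from a broken one is sketched below. It is not the logic in `src/model_status.py`: the state names, the `http_probe` and `classify_endpoint` helpers, and the choice to treat timeouts and 502/503/504 responses as "waking" are assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses still carry a status code


def classify_endpoint(url: str, probe: Callable[[str], int] = http_probe) -> str:
    """Map a probe result to 'ready', 'waking', 'unreachable', or 'error'."""
    try:
        status = probe(url)
    except urllib.error.URLError as err:
        # A connect timeout is wrapped in URLError; treat it as a cold start.
        if isinstance(err.reason, socket.timeout):
            return "waking"
        return "unreachable"
    except socket.timeout:
        return "waking"  # read timeout while the container is still loading
    except OSError:
        return "unreachable"
    if status == 200:
        return "ready"
    if status in (502, 503, 504):
        return "waking"  # assumption: gateway errors mean the model is loading
    return "error"
```

The probe is injectable so the classification can be exercised without a live Modal endpoint; a UI could map "waking" to the "wait for cold start" message instead of an error.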

## Project Structure

- `app.py`: Gradio UI, tabs, chat, workshop wiring, and model status controls.
- `src/character_registry.py`: built-in character registry. The public demo currently ships only ζθ€.
- `src/stage_driver.py`: HTML/CSS stage renderer and expression/motion asset selection.
- `src/dialogue_engine.py`: vLLM/OpenAI-compatible streaming, stage tag parsing, and TTS event handling.
- `src/character_workshop.py`: Tavern JSON/form draft import, HF-login scoped save/resume, generation packaging, and install flow.
- `src/model_status.py`: LLM/TTS/image-generation health checks.
- `modal_apps/`: Modal deployment scripts for chat, TTS, and image-generation spikes.
- `CHARACTER_GENERATION_SPIKE.md`: risk validation report for the character-generation pipeline.
- `BENCHMARK_RESULTS.md`: Modal latency and deployment notes.

## Local Development

```powershell
python -m pip install -r requirements.txt
.\scripts\restart_gradio_background.ps1
```

Default local URL:

```text
http://127.0.0.1:7864
```

The restart script launches Gradio in the background and prints the PID, URL, stdout log path, stderr log path, LLM endpoint, and TTS endpoint.

Mock mode avoids remote Modal calls:

```powershell
.\scripts\start_gradio_background.ps1 -Mock
```

Override endpoints when needed:

```powershell
.\scripts\start_gradio_background.ps1 -VllmUrl "https://your-vllm-endpoint.modal.run"
.\scripts\restart_gradio_background.ps1 -TtsUrl "https://your-tts-endpoint.modal.run"
```

## Character Assets

The public build includes one built-in character:

```text
assets/characters/star/
assets/backgrounds/communication_room.png
```

Supported expression slots:

```text
idle, listening, thinking, worried, smile, happy
```

Motion-specific assets:

```text
assets/characters/star/talk.png
assets/characters/star/focus.png
```

When the model emits `motion=talk` or `motion=focus`, `src/stage_driver.py` uses the dedicated action sprite. Other motions fall back to the current expression image.
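
The fallback rule above can be written as a small pure function. This is a sketch rather than the code in `src/stage_driver.py`: the `select_sprite` name, the `<expression>.png` file naming for expression slots, and the fallback to `idle` for unknown expressions are assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional

CHARACTER_DIR = PurePosixPath("assets/characters/star")
EXPRESSIONS = ("idle", "listening", "thinking", "worried", "smile", "happy")
ACTION_SPRITES = ("talk", "focus")  # motions that ship a dedicated sprite


def select_sprite(expression: str, motion: Optional[str] = None) -> str:
    """Pick the sprite path for the current expression and motion."""
    if motion in ACTION_SPRITES:
        # Dedicated action sprite takes priority over the expression image.
        return str(CHARACTER_DIR / f"{motion}.png")
    if expression not in EXPRESSIONS:
        expression = "idle"  # assumed default for unknown expressions
    # Assumed naming: one PNG per expression slot.
    return str(CHARACTER_DIR / f"{expression}.png")
```

For example, `select_sprite("smile", "talk")` returns the talk sprite, while `select_sprite("smile", "wave")` falls back to the smile expression image.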

## Character Workshop

The workshop is deliberately separated from the chat tab. Users can:

1. Import a Tavern-style JSON card or fill a form.
2. Generate four independent main-visual candidates.
3. Select one candidate.
4. Generate eight expression/action slots and one background.
5. Run background removal and package assets for the stage driver.
6. Install the generated character locally into the runtime registry.
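
Step 1 of the workflow above could look roughly like the following. The sketch accepts both the flat Tavern card layout and the V2 layout that nests fields under a `data` key; the `normalize_tavern_card` helper and the decision to require only a non-empty `name` are assumptions, not the behavior of `src/character_workshop.py`.

```python
import json
from typing import Dict

# Core text fields shared by flat (V1) and nested (V2) Tavern cards.
CARD_FIELDS = (
    "name",
    "description",
    "personality",
    "scenario",
    "first_mes",
    "mes_example",
)


def normalize_tavern_card(raw_json: str) -> Dict[str, str]:
    """Parse a Tavern-style card and return a flat draft of its core fields."""
    card = json.loads(raw_json)
    if not isinstance(card, dict):
        raise ValueError("character card must be a JSON object")
    # V2 cards wrap the character fields in a "data" object.
    nested = card.get("data")
    data = nested if isinstance(nested, dict) else card
    draft = {field: str(data.get(field) or "").strip() for field in CARD_FIELDS}
    if not draft["name"]:
        raise ValueError("character card needs a non-empty name")
    return draft
```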

On Hugging Face Spaces, generation and install actions require HF OAuth login so each user's runs can be saved and resumed separately.
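
A simple way to keep runs separate per user is to key the storage path on the logged-in username. The sketch below assumes the app already has the username from the OAuth session (or `None` when logged out); the `workshop_runs` root, the `user_run_dir` helper, and the sanitizing rule are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

RUNS_ROOT = Path("workshop_runs")  # hypothetical storage root

_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def user_run_dir(username: Optional[str], run_id: str, root: Path = RUNS_ROOT) -> Path:
    """Return the directory where one user's workshop run is stored."""
    if not username:
        raise PermissionError("log in with Hugging Face to save or resume runs")
    # Replace anything outside a conservative character set, including dots
    # and slashes, so a crafted name cannot escape the root directory.
    safe_user = _UNSAFE.sub("_", username)
    safe_run = _UNSAFE.sub("_", run_id)
    return root / safe_user / safe_run
```

Raising on a missing username mirrors the login requirement: generation and install actions would stop before touching storage when nobody is signed in.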

## Modal Character Generation Spike

The image-generation flow is still marked as an MVP/workshop path, not the core dependency for chatting. The spike scripts can be run separately:

```powershell
python scripts/run_character_generation_spike.py list-models
python scripts/run_character_generation_spike.py modal-health
python scripts/run_character_generation_spike.py modal-probe --candidate qwen_image --batch-size 4 --confirm-gpu
```

For full notes, see `CHARACTER_GENERATION_SPIKE.md`.