Gemini Live Avatar - FAQ
Quick Start Guide
Prerequisites
- GPU: NVIDIA GPU with 11GB+ VRAM (recommended)
- Python: 3.10
- CUDA: 11.8
- OS: Windows/Linux
Installation
- Clone Repository
git clone https://github.com/Kedreamix/Linly-Talker.git
cd Linly-Talker
- Create Environment
conda create -n linly python=3.10
conda activate linly
- Install PyTorch
# CUDA 11.8
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
- Install Dependencies
conda install -q ffmpeg
pip install -r requirements_webui.txt
# MuseTalk dependencies
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
- Download Models
Download the required models from one of these sources:
- Baidu Netdisk (Password: linl)
- HuggingFace
- ModelScope
Required Models:
- MuseTalk models → Musetalk/models/
- SadTalker checkpoints → checkpoints/
- Face detection models → gfpgan/weights/
- Launch
python webui.py
Open http://localhost:7860 in your browser.
Common Issues
1. Installation Issues
Q: Microsoft Visual C++ 14.0 is required
A: Install Microsoft C++ Build Tools
Q: version GLIBCXX_3.4.* not found
A: Use Python 3.10 or downgrade libraries:
pip install pyopenjtalk==0.3.1
pip install opencc==1.1.1
Q: FFMPEG not found
A: Install via conda:
conda install -q ffmpeg
Or on Linux:
sudo apt install ffmpeg
2. Model & Weight Issues
Q: FileNotFoundError for model weights
A: Ensure models are in correct folders:
Linly-Talker/
├── checkpoints/
│   ├── mapping_00109-model.pth.tar (149MB)
│   ├── mapping_00229-model.pth.tar (149MB)
│   └── ...
├── Musetalk/
│   └── models/
│       ├── musetalk/
│       ├── dwpose/
│       └── ...
└── gfpgan/
    └── weights/
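The folder layout above can be checked with a few lines of Python before launching. This is a minimal sketch, not part of Linly-Talker: the function name `missing_model_dirs` is hypothetical, and the three folder names are simply the ones listed in this FAQ.

```python
from pathlib import Path

# Folder names taken from the layout in this FAQ; adjust if your checkout differs.
REQUIRED_DIRS = ("checkpoints", "Musetalk/models", "gfpgan/weights")


def missing_model_dirs(repo_root):
    """Return the required model folders that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in REQUIRED_DIRS:
        folder = root / rel
        # A folder that exists but holds nothing usually means the download never ran.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing
```

Calling `missing_model_dirs(".")` from the Linly-Talker directory returns an empty list when all three folders are populated. It does not validate the files inside them.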
Q: SadTalker Error: invalid load key, 'v'
A: Re-download mapping_*.pth.tar files (they should be 149MB each):
wget -c "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00109-model.pth.tar" -O checkpoints/mapping_00109-model.pth.tar
wget -c "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00229-model.pth.tar" -O checkpoints/mapping_00229-model.pth.tar
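To tell whether a checkpoint needs re-downloading, it can help to inspect the file first. An "invalid load key" error generally means the file is not a real checkpoint. One frequent case is a small text stub such as a Git LFS pointer, which begins with `version https://git-lfs`. The sketch below checks for that and for the 149MB size quoted above. The function name and the 10MB tolerance are assumptions made for illustration.

```python
from pathlib import Path

EXPECTED_MB = 149  # size quoted in this FAQ for each mapping_*.pth.tar file
LFS_PREFIX = b"version https://git-lfs"  # first bytes of a Git LFS pointer file


def diagnose_checkpoint(path, expected_mb=EXPECTED_MB, tolerance_mb=10):
    """Return 'missing', 'lfs-pointer', 'unexpected-size', or 'ok' for one file."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as fh:
        head = fh.read(len(LFS_PREFIX))
    if head == LFS_PREFIX:
        return "lfs-pointer"  # text stub, not the binary weights
    size_mb = p.stat().st_size / (1024 * 1024)
    if abs(size_mb - expected_mb) > tolerance_mb:  # tolerance is an assumed margin
        return "unexpected-size"  # likely a truncated download
    return "ok"
```

Any result other than `"ok"` for `checkpoints/mapping_00109-model.pth.tar` or `checkpoints/mapping_00229-model.pth.tar` suggests the file should be fetched again.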
Q: File is not a zip file (NLTK error)
A: Manually download nltk_data:
import nltk
print(nltk.data.path) # Find cache path
Download nltk_data from Quark Netdisk and place it in one of the printed paths.
3. Runtime Issues
Q: VRAM overflow / Out of Memory
A:
- Minimum: 6GB VRAM (SadTalker only)
- Recommended: 11GB+ VRAM (MuseTalk)
- Solution: Use lower resolution images or reduce batch size
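The VRAM thresholds above can be turned into a quick check. This is a sketch under stated assumptions: `pick_backend` and `detect_vram_gb` are hypothetical helpers, not Linly-Talker functions, and the detection step relies on PyTorch's `torch.cuda.get_device_properties`. The reported total is usually slightly below the card's nominal size, so the value is rounded to the nearest GB.

```python
def pick_backend(vram_gb):
    """Map available VRAM (GB) to the heaviest backend this FAQ says it can run."""
    if vram_gb >= 11:
        return "MuseTalk"
    if vram_gb >= 6:
        return "SadTalker"
    return None  # below the stated minimum


def detect_vram_gb():
    """Total VRAM of GPU 0 rounded to whole GB, or 0 if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return round(total_bytes / 1024**3)
```

For example, `pick_backend(detect_vram_gb())` returns `None` on a machine without a usable GPU, which matches the out-of-memory symptom described here.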
Q: GFPGANer is not defined
A: Install enhancement module:
pip install gfpgan
Q: Gradio Connection errored out
A:
- Check firewall settings
- Try a different port in webui.py:
demo.launch(server_port=7861)  # Change port
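Rather than guessing a port, a free one can be probed with the standard library. The helper below is an illustrative sketch (the name `first_free_port` is hypothetical); it tries to bind each candidate port on localhost and returns the first that succeeds.

```python
import socket


def first_free_port(start=7860, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound, else None."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken or not permitted; try the next one
            return port
    return None
```

The result can then be passed as `demo.launch(server_port=first_free_port())`. Another process could still claim the port between the probe and the launch, so treat this as a convenience, not a guarantee.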
Q: Avatar preparation fails
A:
- Use clear frontal face images/videos
- Recommended resolution: 512x512 to 1024x1024
- Supported formats: .jpg, .png, .mp4
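The format and resolution guidance above can be expressed as a small pre-check. This sketch assumes the caller already knows the image or video dimensions (for example from an image library); `avatar_input_problems` is a hypothetical name, and it cannot judge whether the face is clear or frontal.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".jpg", ".png", ".mp4"}  # formats listed in this FAQ
MIN_SIDE, MAX_SIDE = 512, 1024                 # recommended resolution range


def avatar_input_problems(filename, width, height):
    """Return a list of reasons the input falls outside the FAQ's recommendations."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append("unsupported format")
    if min(width, height) < MIN_SIDE:
        problems.append("below recommended resolution")
    if max(width, height) > MAX_SIDE:
        problems.append("above recommended resolution")
    return problems
```

An empty list means the file matches the stated format and size recommendations. Inputs outside the range may still work, since the resolution is only a recommendation.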
4. Gemini Live Specific Issues
Q: WebSocket connection fails
A:
- Verify the Railway bridge is running: wss://gemini-live-bridge-production.up.railway.app/ws
- Check your internet connection
- Ensure no firewall is blocking WebSocket connections
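A first diagnostic for the bridge is to check whether its host accepts a TCP and TLS connection at all. The sketch below uses only the standard library and does not perform the WebSocket handshake, so a `True` result shows the host is reachable, not that the `/ws` endpoint is healthy. The helper names are hypothetical.

```python
import socket
import ssl
from urllib.parse import urlparse

BRIDGE_URL = "wss://gemini-live-bridge-production.up.railway.app/ws"


def endpoint_of(ws_url):
    """Split a ws:// or wss:// URL into (host, port, use_tls)."""
    parts = urlparse(ws_url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {ws_url!r}")
    use_tls = parts.scheme == "wss"
    port = parts.port or (443 if use_tls else 80)  # default ports per scheme
    return parts.hostname, port, use_tls


def is_reachable(ws_url, timeout=5.0):
    """True if a TCP (and, for wss, TLS) connection to the host succeeds."""
    host, port, use_tls = endpoint_of(ws_url)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            if use_tls:
                context = ssl.create_default_context()
                with context.wrap_socket(raw, server_hostname=host):
                    pass  # TLS handshake completed
        return True
    except OSError:  # covers DNS failures, timeouts, refusals, and TLS errors
        return False
```

If `is_reachable(BRIDGE_URL)` returns `False`, the problem lies in the network path, a firewall, or the bridge being down, not in the browser or the avatar pipeline.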
Q: No audio playback
A:
- Check browser audio permissions
- Verify the speaker_output component has autoplay=True
- Test with a different browser (Chrome recommended)
Q: Avatar not lip-syncing
A:
- Click "Prepare Avatar" and wait for "Ready"
- Click "Connect to Gemini" and wait for "Connected"
- Ensure microphone permissions are granted
- Check audio buffer is receiving data
Q: High latency / Lag
A:
- Target: <1 second end-to-end
- Optimize:
- Use GPU (not CPU)
- Reduce image resolution
- Set return_frame_only=True in inference_streaming() for faster rendering
- Check network speed to the Railway bridge
5. Usage Tips
Q: How to use custom avatar?
A:
- Uncheck "Use Default Avatar"
- Upload your image/video (frontal face, clear features)
- Adjust "Mouth Position Fix" slider if needed
- Click "Prepare Avatar"
Q: How to adjust mouth position?
A: Use the "BBox Shift" slider:
- Positive values (+): Move mouth down
- Negative values (-): Move mouth up
- Default: 5
Q: Best practices for demo?
A:
- Preparation: Always prepare avatar before connecting
- Connection: Wait for "Connected" status
- Speaking: Speak clearly, natural pace
- Interruption: Gemini 2.5 Flash handles interruptions natively - try it!
- Quality: Use good microphone for best results
Performance Benchmarks
| Component | Latency | VRAM Usage |
|---|---|---|
| WebSocket (Railway) | ~50ms | 0GB |
| Gemini 2.5 Flash | ~200ms | 0GB (Cloud) |
| MuseTalk Inference | ~40ms/frame | 6-8GB |
| Audio Buffer | ~200ms | <1GB |
| Total End-to-End | ~500ms | 8-11GB |
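The total in the table can be reproduced by summing the stages. This is a worked-arithmetic sketch that assumes the four stages run one after another for a single frame, which the table does not state explicitly.

```python
# Per-stage latencies in milliseconds, copied from the benchmark table.
STAGES_MS = {
    "WebSocket (Railway)": 50,
    "Gemini 2.5 Flash": 200,
    "MuseTalk inference (per frame)": 40,
    "Audio buffer": 200,
}

# Sum of all stages: 490 ms, which the table rounds to ~500 ms.
total_ms = sum(STAGES_MS.values())

# 40 ms per frame caps rendering at 1000 / 40 = 25 frames per second.
max_fps = 1000 / STAGES_MS["MuseTalk inference (per frame)"]

# Margin left under the <1 second end-to-end target.
headroom_ms = 1000 - total_ms
```

Under that assumption, roughly half of the one-second budget remains as headroom for slower networks or larger images.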
System Requirements
Minimum
- GPU: 6GB VRAM
- RAM: 8GB
- CPU: 4 cores
- Network: 10 Mbps
Recommended
- GPU: 11GB+ VRAM (RTX 2080 Ti / RTX 3060 or better)
- RAM: 16GB
- CPU: 8 cores
- Network: 50 Mbps
Troubleshooting Checklist
Before reporting issues, verify:
- Python 3.10 installed
- CUDA 11.8 installed (for GPU)
- All model weights downloaded (check file sizes)
- Models in correct folder structure
- Dependencies installed (requirements_webui.txt)
- FFMPEG installed
- Sufficient VRAM available
- Railway bridge is accessible
- Firewall allows WebSocket connections
- Browser has microphone permissions
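Several of the checklist items can be verified from Python. The following is a minimal sketch: `environment_report` is a hypothetical helper, it covers only the Python version, FFMPEG, and CUDA items, and it reads the CUDA version PyTorch was built against (`torch.version.cuda`), not the system toolkit.

```python
import shutil
import sys


def environment_report(version_info=sys.version_info):
    """Return {check name: bool} for the checklist items that can be tested locally."""
    report = {
        "python_3_10": tuple(version_info[:2]) == (3, 10),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
        # CUDA version of the installed PyTorch build, e.g. "11.8" (None on CPU builds).
        report["cuda_11_8_build"] = torch.version.cuda == "11.8"
    except ImportError:
        report["cuda_available"] = False
        report["cuda_11_8_build"] = False
    return report
```

Printing `environment_report()` and including the output in an issue report covers the version details in one step.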
Getting Help
- Check this FAQ first
- Review error messages - most include hints
- Check model file sizes - incomplete downloads are common
- Try with default avatar - isolates custom image issues
- Report issues with:
- Full error message
- Python version
- GPU model
- Steps to reproduce
Links
- GitHub: Kedreamix/Linly-Talker
- Models: HuggingFace | ModelScope
- Railway Bridge: gemini-live-bridge
Last Updated: February 2026
Version: Gemini Live Integration v1.0