# Gemini Live Avatar - FAQ
## Quick Start Guide
### Prerequisites
- **GPU**: NVIDIA GPU with 11GB+ VRAM (recommended)
- **Python**: 3.10
- **CUDA**: 11.8
- **OS**: Windows/Linux
### Installation
1. **Clone Repository**
```bash
git clone https://github.com/Kedreamix/Linly-Talker.git
cd Linly-Talker
```
2. **Create Environment**
```bash
conda create -n linly python=3.10
conda activate linly
```
3. **Install PyTorch**
```bash
# CUDA 11.8
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```
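After installing PyTorch, it can help to confirm that the CUDA build was picked up before moving on. The following is a minimal sketch, not part of the project: `describe_torch_env` is a hypothetical helper name, and it only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_env():
    """Return a small dict describing the local PyTorch/CUDA setup."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a usable GPU is visible at runtime
    }
    # Avoid an ImportError if PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

With the install command above, `cuda_build` would be expected to read `11.8`; if `cuda_available` is `False`, the driver or the wheel index is the first thing to check.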
4. **Install Dependencies**
```bash
conda install -q ffmpeg
pip install -r requirements_webui.txt
# MuseTalk dependencies
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
```
5. **Download Models**
Download the required models from one of these sources:
- [Baidu Netdisk](https://pan.baidu.com/s/1eF13O-8wyw4B3MtesctQyg?pwd=linl) (Password: linl)
- [HuggingFace](https://huggingface.co/Kedreamix/Linly-Talker)
- [ModelScope](https://modelscope.cn/models/Kedreamix/Linly-Talker)
**Required Models:**
- MuseTalk models → `Musetalk/models/`
- SadTalker checkpoints → `checkpoints/`
- Face detection models → `gfpgan/weights/`
6. **Launch**
```bash
python webui.py
```
Open `http://localhost:7860` in your browser.
---
## Common Issues
### 1. Installation Issues
#### Q: `Microsoft Visual C++ 14.0 is required`
**A:** Install [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
#### Q: `version GLIBCXX_3.4.* not found`
**A:** Use Python 3.10, or downgrade the affected libraries:
```bash
pip install pyopenjtalk==0.3.1
pip install opencc==1.1.1
```
#### Q: FFMPEG not found
**A:** Install via conda:
```bash
conda install -q ffmpeg
```
Or on Linux:
```bash
sudo apt install ffmpeg
```
---
### 2. Model & Weight Issues
#### Q: `FileNotFoundError` for model weights
**A:** Ensure the models are in the correct folders:
```
Linly-Talker/
├── checkpoints/
│   ├── mapping_00109-model.pth.tar  (149MB)
│   ├── mapping_00229-model.pth.tar  (149MB)
│   └── ...
├── Musetalk/
│   └── models/
│       ├── musetalk/
│       ├── dwpose/
│       └── ...
└── gfpgan/
    └── weights/
```
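The layout above can be checked with a short script before launching. This is a sketch under assumptions: `missing_model_dirs` is a hypothetical helper, and it only verifies that the three folders named in this FAQ exist and are non-empty, not that every file inside them is complete.

```python
from pathlib import Path

# Folders named in this FAQ, relative to the Linly-Talker root.
REQUIRED_DIRS = ["checkpoints", "Musetalk/models", "gfpgan/weights"]


def missing_model_dirs(repo_root="."):
    """Return the required folders that are absent or empty."""
    missing = []
    for rel in REQUIRED_DIRS:
        folder = Path(repo_root) / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs(".")
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All model folders are present.")
```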
#### Q: `SadTalker Error: invalid load key, 'v'`
**A:** Re-download `mapping_*.pth.tar` files (they should be 149MB each):
```bash
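# Run from the Linly-Talker root; the URLs are quoted so the shell does not treat "&" as a background operator
wget -c "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00109-model.pth.tar" -O checkpoints/mapping_00109-model.pth.tar
wget -c "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00229-model.pth.tar" -O checkpoints/mapping_00229-model.pth.tar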
```
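An `invalid load key, 'v'` error usually means the loader opened a text file rather than a binary checkpoint. One likely cause (an assumption here, not confirmed by the project) is a Git LFS pointer file, which begins with the text `version https://git-lfs`. The sketch below uses a hypothetical helper, `diagnose_checkpoint`, and an arbitrary 100 MB threshold chosen only because it sits below the expected 149MB.

```python
from pathlib import Path

# First bytes of a Git LFS pointer file (assumed cause of the 'v' load key).
LFS_MAGIC = b"version https://git-lfs"


def diagnose_checkpoint(path, min_bytes=100 * 1024 * 1024):
    """Classify a checkpoint as 'missing', 'lfs-pointer', 'too-small', or 'ok'."""
    file = Path(path)
    if not file.is_file():
        return "missing"
    with file.open("rb") as fh:
        head = fh.read(len(LFS_MAGIC))
    if head == LFS_MAGIC:
        return "lfs-pointer"
    # The threshold is illustrative; the FAQ states about 149MB per file.
    if file.stat().st_size < min_bytes:
        return "too-small"
    return "ok"


if __name__ == "__main__":
    for name in ("mapping_00109-model.pth.tar", "mapping_00229-model.pth.tar"):
        print(name, "->", diagnose_checkpoint(Path("checkpoints") / name))
```

Any result other than `ok` suggests the file should be downloaded again.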
#### Q: `File is not a zip file` (NLTK error)
**A:** Manually download `nltk_data`:
```python
import nltk
print(nltk.data.path) # Find cache path
```
Download the data from [Quark Netdisk](https://pan.quark.cn/s/f48f5e35796b) and place it in one of the directories listed in `nltk.data.path`.
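The `File is not a zip file` message typically points to an archive that was only partially downloaded. As a rough sketch, the standard-library `zipfile.is_zipfile` can locate such archives under an `nltk_data` directory; `find_broken_zips` is a hypothetical helper name, and the default path in the usage section is an assumption.

```python
import zipfile
from pathlib import Path


def find_broken_zips(nltk_dir):
    """Return sorted paths of .zip files under nltk_dir that are not valid archives."""
    broken = []
    for archive in Path(nltk_dir).rglob("*.zip"):
        if not zipfile.is_zipfile(archive):
            broken.append(str(archive))
    return sorted(broken)


if __name__ == "__main__":
    # Assumed location; substitute a directory printed by nltk.data.path.
    target = Path.home() / "nltk_data"
    if target.is_dir():
        for path in find_broken_zips(target):
            print("Corrupt archive:", path)
    else:
        print("Directory not found:", target)
```

Deleting the reported files and copying in fresh ones is usually enough to clear the error.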
---
### 3. Runtime Issues
#### Q: VRAM overflow / Out of Memory
**A:**
- **Minimum**: 6GB VRAM (SadTalker only)
- **Recommended**: 11GB+ VRAM (MuseTalk)
- **Solution**: Use lower resolution images or reduce batch size
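To see which of the thresholds above a machine meets, the total GPU memory can be read from `nvidia-smi`. This is a sketch with hypothetical helper names (`total_vram_mib`, `supported_mode`); it rounds to the nearest GiB because drivers often report slightly less than the nominal capacity.

```python
import shutil
import subprocess


def total_vram_mib():
    """Return total memory in MiB for each GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        output = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    return [int(line) for line in output.splitlines() if line.strip().isdigit()]


def supported_mode(vram_mib):
    """Map a memory size to the tiers listed in this FAQ (6GB and 11GB)."""
    gib = round(vram_mib / 1024)
    if gib >= 11:
        return "MuseTalk"
    if gib >= 6:
        return "SadTalker only"
    return "below minimum"


if __name__ == "__main__":
    for index, mib in enumerate(total_vram_mib()):
        print(f"GPU {index}: {mib} MiB -> {supported_mode(mib)}")
```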
#### Q: `GFPGANer is not defined`
**A:** Install enhancement module:
```bash
pip install gfpgan
```
#### Q: `Gradio Connection errored out`
**A:**
- Check firewall settings
- Try a different port in `webui.py`:
```python
demo.launch(server_port=7861) # Change port
```
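If it is unclear which port is free, one option is to probe for one before launching. The sketch below uses only the standard `socket` module; `first_free_port` is a hypothetical helper, and the search range is an arbitrary choice.

```python
import socket


def first_free_port(start=7860, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is in use, try the next one
            return port
    raise RuntimeError(f"No free port found in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print("Free port:", first_free_port())
```

The result could then be passed as `demo.launch(server_port=first_free_port())`. Another process may still claim the port between the probe and the launch, so treat this as a convenience rather than a guarantee.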
#### Q: Avatar preparation fails
**A:**
- Use clear frontal face images/videos
- Recommended resolution: 512x512 to 1024x1024
- Supported formats: `.jpg`, `.png`, `.mp4`
---
### 4. Gemini Live Specific Issues
#### Q: WebSocket connection fails
**A:**
- Verify that the Railway bridge is running: `wss://gemini-live-bridge-production.up.railway.app/ws`
- Check your internet connection
- Ensure no firewall is blocking WebSocket connections
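A quick way to separate network problems from application problems is to test whether the bridge host accepts a TCP and TLS connection at all. The sketch below uses only the standard library and hypothetical helper names (`ws_endpoint`, `can_reach`). It does not perform the WebSocket handshake, so a `True` result only shows that the host is reachable, not that the bridge is healthy.

```python
import socket
import ssl
from urllib.parse import urlsplit


def ws_endpoint(url):
    """Split a ws:// or wss:// URL into (host, port, use_tls)."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"Not a WebSocket URL: {url}")
    use_tls = parts.scheme == "wss"
    port = parts.port or (443 if use_tls else 80)
    return parts.hostname, port, use_tls


def can_reach(url, timeout=5.0):
    """Return True if a TCP (and, for wss, TLS) connection can be opened."""
    host, port, use_tls = ws_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if use_tls:
                context = ssl.create_default_context()
                with context.wrap_socket(sock, server_hostname=host):
                    pass  # TLS handshake succeeded
        return True
    except OSError:  # covers DNS failures, timeouts, refusals, and TLS errors
        return False


if __name__ == "__main__":
    bridge = "wss://gemini-live-bridge-production.up.railway.app/ws"
    print("Reachable:", can_reach(bridge))
```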
#### Q: No audio playback
**A:**
- Check browser audio permissions
- Verify `speaker_output` component has `autoplay=True`
- Test with a different browser (Chrome recommended)
#### Q: Avatar not lip-syncing
**A:**
1. Click "🎭 Prepare Avatar" and wait for "✅ Ready"
2. Click "🔌 Connect to Gemini" and wait for "✅ Connected"
3. Ensure microphone permissions are granted
4. Check that the audio buffer is receiving data
#### Q: High latency / Lag
**A:**
- **Target**: <1 second end-to-end
- **Optimize**:
  - Use GPU (not CPU)
  - Reduce image resolution
  - Set `return_frame_only=True` in `inference_streaming()` for faster rendering
  - Check network speed to Railway bridge
---
### 5. Usage Tips
#### Q: How do I use a custom avatar?
**A:**
1. Uncheck "Use Default Avatar"
2. Upload your image/video (frontal face, clear features)
3. Adjust "Mouth Position Fix" slider if needed
4. Click "🎭 Prepare Avatar"
#### Q: How do I adjust the mouth position?
**A:** Use the "BBox Shift" slider:
- **Positive values** (+): Move mouth down
- **Negative values** (-): Move mouth up
- Default: 5
#### Q: What are the best practices for a demo?
**A:**
1. **Preparation**: Always prepare avatar before connecting
2. **Connection**: Wait for "✅ Connected" status
3. **Speaking**: Speak clearly at a natural pace
4. **Interruption**: Gemini 2.5 Flash handles interruptions natively - try it!
5. **Quality**: Use a good microphone for best results
---
## Performance Benchmarks
| Component | Latency | VRAM Usage |
|-----------|---------|------------|
| WebSocket (Railway) | ~50ms | 0GB |
| Gemini 2.5 Flash | ~200ms | 0GB (Cloud) |
| MuseTalk Inference | ~40ms/frame | 6-8GB |
| Audio Buffer | ~200ms | <1GB |
| **Total End-to-End** | **~500ms** | **8-11GB** |
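
The total in the table can be reproduced by adding the component latencies. This short sketch assumes the total counts a single frame of MuseTalk inference (the table does not say so explicitly) and also derives the frame rate implied by 40 ms per frame.

```python
# Latencies in milliseconds, copied from the table above.
BUDGET_MS = {
    "WebSocket (Railway)": 50,
    "Gemini 2.5 Flash": 200,
    "MuseTalk inference (one frame)": 40,  # assumption: one frame counted
    "Audio buffer": 200,
}

total_ms = sum(BUDGET_MS.values())
max_fps = 1000 / BUDGET_MS["MuseTalk inference (one frame)"]

if __name__ == "__main__":
    print(f"Sum of components: {total_ms} ms")   # 490 ms, listed as ~500ms
    print(f"Implied frame rate: {max_fps:.0f} fps")  # 25 fps
```

Under that assumption the budget leaves roughly half of the <1 second target as headroom for network jitter.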
---
## System Requirements
### Minimum
- GPU: 6GB VRAM
- RAM: 8GB
- CPU: 4 cores
- Network: 10 Mbps
### Recommended
- GPU: 11GB+ VRAM (RTX 2080 Ti / RTX 3060 or better)
- RAM: 16GB
- CPU: 8 cores
- Network: 50 Mbps
---
## Troubleshooting Checklist
Before reporting issues, verify:
- [ ] Python 3.10 installed
- [ ] CUDA 11.8 installed (for GPU)
- [ ] All model weights downloaded (check file sizes)
- [ ] Models in correct folder structure
- [ ] Dependencies installed (`requirements_webui.txt`)
- [ ] FFMPEG installed
- [ ] Sufficient VRAM available
- [ ] Railway bridge is accessible
- [ ] Firewall allows WebSocket connections
- [ ] Browser has microphone permissions
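
Several of the items above can be checked automatically. The following is a sketch, not a project script: `run_checks` is a hypothetical helper, it covers only the checks that need no third-party packages, and it treats the presence of `nvidia-smi` as a rough stand-in for a working NVIDIA driver.

```python
import shutil
import sys
from pathlib import Path


def run_checks(repo_root="."):
    """Return a dict mapping each automated checklist item to True or False."""
    root = Path(repo_root)
    return {
        "Python 3.10": sys.version_info[:2] == (3, 10),
        "FFMPEG on PATH": shutil.which("ffmpeg") is not None,
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
        "requirements_webui.txt present": (root / "requirements_webui.txt").is_file(),
        "checkpoints/ present": (root / "checkpoints").is_dir(),
    }


if __name__ == "__main__":
    for item, passed in run_checks(".").items():
        print(f"[{'x' if passed else ' '}] {item}")
```

Run it from the Linly-Talker root; unchecked items in the output point to the sections of this FAQ worth revisiting.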
---
## Getting Help
1. **Check this FAQ first**
2. **Review error messages** - most include hints
3. **Check model file sizes** - incomplete downloads are common
4. **Try with default avatar** - isolates custom image issues
5. **Report issues** with:
   - Full error message
   - Python version
   - GPU model
   - Steps to reproduce
---
## Links
- **GitHub**: [Kedreamix/Linly-Talker](https://github.com/Kedreamix/Linly-Talker)
- **Models**: [HuggingFace](https://huggingface.co/Kedreamix/Linly-Talker) | [ModelScope](https://modelscope.cn/models/Kedreamix/Linly-Talker)
- **Railway Bridge**: [gemini-live-bridge](https://gemini-live-bridge-production.up.railway.app)
---
**Last Updated**: February 2026
**Version**: Gemini Live Integration v1.0