# Gemini Live Avatar - FAQ

## Quick Start Guide

### Prerequisites
- **GPU**: NVIDIA GPU with 11GB+ VRAM (recommended)
- **Python**: 3.10
- **CUDA**: 11.8
- **OS**: Windows/Linux

### Installation

1. **Clone Repository**
```bash
git clone https://github.com/Kedreamix/Linly-Talker.git
cd Linly-Talker
```

2. **Create Environment**
```bash
conda create -n linly python=3.10
conda activate linly
```

3. **Install PyTorch**
```bash
# CUDA 11.8
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```
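
After this step, a quick check can confirm that PyTorch was installed and can see the GPU. The following is a minimal sketch: `describe_torch` is a hypothetical helper name, not part of the project, and it only reports what it finds rather than enforcing the versions listed above.

```python
import sys


def describe_torch() -> dict:
    """Collect Python, PyTorch, and CUDA details without failing if torch is missing."""
    info = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed in this environment
    info["torch"] = torch.__version__
    # CUDA version the wheel was built against; None for CPU-only builds
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` on a machine with an NVIDIA GPU, the usual suspects are a CPU-only wheel or a driver that is too old for the CUDA build.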

4. **Install Dependencies**
```bash
conda install -q ffmpeg
pip install -r requirements_webui.txt

# MuseTalk dependencies
pip install --no-cache-dir -U openmim
mim install mmengine 
mim install "mmcv>=2.0.1" 
mim install "mmdet>=3.1.0" 
mim install "mmpose>=1.1.0"
```

5. **Download Models**

Download the required models from one of these sources:
- [Baidu Netdisk](https://pan.baidu.com/s/1eF13O-8wyw4B3MtesctQyg?pwd=linl) (Password: linl)
- [HuggingFace](https://huggingface.co/Kedreamix/Linly-Talker)
- [ModelScope](https://modelscope.cn/models/Kedreamix/Linly-Talker)

**Required Models:**
- MuseTalk models → `Musetalk/models/`
- SadTalker checkpoints → `checkpoints/`
- Face detection models → `gfpgan/weights/`

6. **Launch**
```bash
python webui.py
```

Open `http://localhost:7860` in your browser.

---

## Common Issues

### 1. Installation Issues

#### Q: `Microsoft Visual C++ 14.0 is required`
**A:** Install [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)

#### Q: `version GLIBCXX_3.4.* not found`
**A:** Use Python 3.10, or pin these libraries to older versions:
```bash
pip install pyopenjtalk==0.3.1
pip install opencc==1.1.1
```

#### Q: FFMPEG not found
**A:** Install via conda:
```bash
conda install -q ffmpeg
```

Or on Linux:
```bash
sudo apt install ffmpeg
```
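
To confirm that the Python process can actually find FFmpeg, a check along these lines may help. It is a sketch using only the standard library; `find_ffmpeg` is a hypothetical helper name.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the path to the ffmpeg executable on PATH, or None if it is absent."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it with conda or apt")
    else:
        # `ffmpeg -version` prints a build banner; its first line holds the version
        result = subprocess.run([path, "-version"], capture_output=True, text=True)
        print(result.stdout.splitlines()[0] if result.stdout else path)
```

Run it inside the activated `linly` environment, since a conda-installed FFmpeg is only on `PATH` while that environment is active.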

---

### 2. Model & Weight Issues

#### Q: `FileNotFoundError` for model weights
**A:** Ensure the models are in the correct folders:
```
Linly-Talker/
├── checkpoints/
│   ├── mapping_00109-model.pth.tar (149MB)
│   ├── mapping_00229-model.pth.tar (149MB)
│   └── ...
├── Musetalk/
│   └── models/
│       ├── musetalk/
│       ├── dwpose/
│       └── ...
└── gfpgan/
    └── weights/
```
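
The layout above can be checked with a short script. This is a sketch that only tests the directories named in the tree; the entries elided with `...` are not covered, and `missing_model_dirs` is a hypothetical helper name.

```python
from pathlib import Path

# Directories named in the layout above, relative to the repository root
EXPECTED_DIRS = [
    "checkpoints",
    "Musetalk/models/musetalk",
    "Musetalk/models/dwpose",
    "gfpgan/weights",
]


def missing_model_dirs(repo_root):
    """Return the expected model directories that do not exist or are empty."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_DIRS:
        folder = root / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs(".")
    if problems:
        print(f"Missing or empty: {problems}")
    else:
        print("All listed model folders are present")
```

Run it from the `Linly-Talker/` root. A passing result only shows that the folders exist and are non-empty, not that every weight file inside them is complete.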

#### Q: `SadTalker Error: invalid load key, 'v'`
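**A:** The `mapping_*.pth.tar` files are incomplete or corrupt (each should be 149MB). Delete the existing copies, then re-download them from the repository root. Quote the URLs so the shell does not treat `&` as a background operator:
```bash
wget -c -O checkpoints/mapping_00109-model.pth.tar "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00109-model.pth.tar"
wget -c -O checkpoints/mapping_00229-model.pth.tar "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00229-model.pth.tar"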
```
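
One common cause of this error is that the file on disk is a small text file, such as a Git LFS pointer (which begins with `version https://git-lfs...`), rather than the binary weights. The sketch below classifies a checkpoint file accordingly. `inspect_checkpoint` is a hypothetical helper, the 149MB figure comes from this FAQ, and the 10MB tolerance is an assumption.

```python
from pathlib import Path

LFS_PREFIX = b"version https://git-lfs"  # first bytes of a Git LFS pointer file
EXPECTED_MB = 149   # size stated in this FAQ
TOLERANCE_MB = 10   # assumed slack; adjust as needed


def inspect_checkpoint(path):
    """Classify a checkpoint file as 'missing', 'lfs_pointer', 'wrong_size', or 'ok'."""
    file = Path(path)
    if not file.is_file():
        return "missing"
    with file.open("rb") as handle:
        if handle.read(len(LFS_PREFIX)) == LFS_PREFIX:
            return "lfs_pointer"
    size_mb = file.stat().st_size / (1024 * 1024)
    if abs(size_mb - EXPECTED_MB) > TOLERANCE_MB:
        return "wrong_size"
    return "ok"


if __name__ == "__main__":
    for name in ("mapping_00109-model.pth.tar", "mapping_00229-model.pth.tar"):
        print(name, "->", inspect_checkpoint(Path("checkpoints") / name))
```

Any result other than `ok` suggests the file should be deleted and downloaded again.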

#### Q: `File is not a zip file` (NLTK error)
**A:** Manually download `nltk_data`:
```python
import nltk
print(nltk.data.path)  # Find cache path
```
Download `nltk_data` from [Quark Netdisk](https://pan.quark.cn/s/f48f5e35796b) and place it in one of the printed cache paths.
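
To find which cached archive is broken, the NLTK search paths can be scanned for `.zip` files that fail validation. This is a sketch using the standard `zipfile` module; `corrupt_zips` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def corrupt_zips(directory):
    """Return .zip files under `directory` that are not valid zip archives."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.zip") if not zipfile.is_zipfile(p))


if __name__ == "__main__":
    try:
        import nltk
    except ImportError:
        print("nltk is not installed")
    else:
        # nltk.data.path is a list of directories that NLTK searches
        for data_dir in nltk.data.path:
            for bad in corrupt_zips(data_dir):
                print(f"Corrupt archive, delete and re-download: {bad}")
```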

---

### 3. Runtime Issues

#### Q: VRAM overflow / Out of Memory
**A:** 
- **Minimum**: 6GB VRAM (SadTalker only)
- **Recommended**: 11GB+ VRAM (MuseTalk)
- **Solution**: Use lower-resolution images or reduce the batch size
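
Before launching, it can help to see how much VRAM is actually free, since other processes may already hold some of it. The sketch below queries `nvidia-smi` and parses its CSV output; `parse_gpu_rows` is a hypothetical helper name, and values are in MiB as reported by the tool.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(output):
    """Parse nvidia-smi CSV rows into (name, total_mib, used_mib) tuples."""
    rows = []
    for line in output.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        rows.append((name, int(total), int(used)))
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        result = subprocess.run(QUERY, capture_output=True, text=True)
        try:
            rows = parse_gpu_rows(result.stdout)
        except ValueError:
            rows = []  # unexpected output, for example "[N/A]" fields
        for name, total, used in rows:
            print(f"{name}: {total - used} MiB free of {total} MiB")
```

Compare the free figure against the 6GB minimum and 11GB recommendation listed above.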

#### Q: `GFPGANer is not defined`
**A:** Install enhancement module:
```bash
pip install gfpgan
```

#### Q: `Gradio Connection errored out`
**A:** 
- Check firewall settings
- Try a different port in `webui.py`:
```python
demo.launch(server_port=7861)  # Change port
```
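
If it is unclear which ports are free, a helper like the one below can probe for one before launching. It is a sketch using only the standard library; `find_free_port` is a hypothetical helper name and the range of 20 ports is an arbitrary choice.

```python
import socket


def find_free_port(start=7860, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound, else None."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted
            return port
    return None


if __name__ == "__main__":
    print(f"First free port: {find_free_port()}")
```

The returned value can then be passed as `server_port` to `demo.launch()`. Another process could still claim the port between the probe and the launch, so treat the result as a hint.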

#### Q: Avatar preparation fails
**A:**
- Use clear frontal face images/videos
- Recommended resolution: 512x512 to 1024x1024
- Supported formats: `.jpg`, `.png`, `.mp4`

---

### 4. Gemini Live Specific Issues

#### Q: WebSocket connection fails
**A:** 
- Verify that the Railway bridge is running: `wss://gemini-live-bridge-production.up.railway.app/ws`
- Check your internet connection
- Ensure no firewall is blocking WebSocket connections
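
A basic reachability test can separate network problems from application problems. The sketch below uses only the standard library to open a TCP connection to the bridge host. It does not perform the TLS or WebSocket handshake, so success only shows that the host and port are reachable. `host_and_port` and `tcp_reachable` are hypothetical helper names.

```python
import socket
from urllib.parse import urlsplit

BRIDGE_URL = "wss://gemini-live-bridge-production.up.railway.app/ws"


def host_and_port(ws_url):
    """Extract (host, port) from a ws:// or wss:// URL, applying the default ports."""
    parts = urlsplit(ws_url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {ws_url}")
    default_port = 443 if parts.scheme == "wss" else 80
    return parts.hostname, parts.port or default_port


def tcp_reachable(ws_url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(ws_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # DNS failure, refused connection, or timeout
```

Calling `tcp_reachable(BRIDGE_URL)` from the machine that runs `webui.py` returns `False` when DNS, routing, or a firewall blocks the connection.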

#### Q: No audio playback
**A:** 
- Check browser audio permissions
- Verify that the `speaker_output` component has `autoplay=True`
- Test with a different browser (Chrome recommended)

#### Q: Avatar not lip-syncing
**A:**
1. Click "🎭 Prepare Avatar" and wait for "✅ Ready"
2. Click "🔌 Connect to Gemini" and wait for "✅ Connected"
3. Ensure microphone permissions are granted
4. Check that the audio buffer is receiving data

#### Q: High latency / Lag
**A:**
- **Target**: <1 second end-to-end
- **Optimize**:
  - Use GPU (not CPU)
  - Reduce image resolution
  - Set `return_frame_only=True` in `inference_streaming()` for faster rendering
  - Check network speed to the Railway bridge

---

### 5. Usage Tips

#### Q: How to use a custom avatar?
**A:**
1. Uncheck "Use Default Avatar"
2. Upload your image/video (frontal face, clear features)
3. Adjust the "Mouth Position Fix" (BBox Shift) slider if needed
4. Click "🎭 Prepare Avatar"

#### Q: How to adjust the mouth position?
**A:** Use the "BBox Shift" slider:
- **Positive values** (+): Move mouth down
- **Negative values** (-): Move mouth up
- Default: 5

#### Q: Best practices for a demo?
**A:**
1. **Preparation**: Always prepare avatar before connecting
2. **Connection**: Wait for "✅ Connected" status
3. **Speaking**: Speak clearly at a natural pace
4. **Interruption**: Gemini 2.5 Flash handles interruptions natively - try it!
5. **Quality**: Use a good microphone for best results

---

## Performance Benchmarks

| Component | Latency | VRAM Usage |
|-----------|---------|------------|
| WebSocket (Railway) | ~50ms | 0GB |
| Gemini 2.5 Flash | ~200ms | 0GB (Cloud) |
| MuseTalk Inference | ~40ms/frame | 6-8GB |
| Audio Buffer | ~200ms | <1GB |
| **Total End-to-End** | **~500ms** | **8-11GB** |
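
The total in the table can be reproduced by summing the stages. The sketch below assumes the stages run one after another and that a single MuseTalk frame counts toward the first visible response. Both are simplifying assumptions, not measurements.

```python
# Approximate per-stage latencies from the table above, in milliseconds
STAGES_MS = {
    "WebSocket (Railway)": 50,
    "Gemini 2.5 Flash": 200,
    "MuseTalk inference (one frame)": 40,
    "Audio buffer": 200,
}

total_ms = sum(STAGES_MS.values())
# At 40 ms per frame, rendering can sustain at most 1000 / 40 frames per second
max_fps = 1000 / STAGES_MS["MuseTalk inference (one frame)"]

print(f"Total: {total_ms} ms")               # 490 ms, rounded to ~500 ms in the table
print(f"Max frame rate: {max_fps:.0f} fps")  # 25 fps
```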

---

## System Requirements

### Minimum
- GPU: 6GB VRAM
- RAM: 8GB
- CPU: 4 cores
- Network: 10 Mbps

### Recommended
- GPU: 11GB+ VRAM (RTX 2080 Ti / RTX 3060 or better)
- RAM: 16GB
- CPU: 8 cores
- Network: 50 Mbps

---

## Troubleshooting Checklist

Before reporting issues, verify:

- [ ] Python 3.10 installed
- [ ] CUDA 11.8 installed (for GPU)
- [ ] All model weights downloaded (check file sizes)
- [ ] Models in correct folder structure
- [ ] Dependencies installed (`requirements_webui.txt`)
- [ ] FFMPEG installed
- [ ] Sufficient VRAM available
- [ ] Railway bridge is accessible
- [ ] Firewall allows WebSocket connections
- [ ] Browser has microphone permissions

---

## Getting Help

1. **Check this FAQ first**
2. **Review error messages** - most include hints
3. **Check model file sizes** - incomplete downloads are common
4. **Try with the default avatar** - isolates custom image issues
5. **Report issues** with:
   - Full error message
   - Python version
   - GPU model
   - Steps to reproduce
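
Most of the environment details requested above can be gathered with a short script and pasted into the report. This is a sketch: `collect_report` is a hypothetical helper name, and the GPU name is read from `nvidia-smi` when that tool is available.

```python
import platform
import shutil
import subprocess
import sys


def collect_report():
    """Gather Python version, OS, and GPU model into a dict for an issue report."""
    report = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "gpu": "unknown",
    }
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0 and result.stdout.strip():
            # Join multiple GPUs onto one line
            report["gpu"] = result.stdout.strip().replace("\n", "; ")
    return report


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```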

---

## Links

- **GitHub**: [Kedreamix/Linly-Talker](https://github.com/Kedreamix/Linly-Talker)
- **Models**: [HuggingFace](https://huggingface.co/Kedreamix/Linly-Talker) | [ModelScope](https://modelscope.cn/models/Kedreamix/Linly-Talker)
- **Railway Bridge**: [gemini-live-bridge](https://gemini-live-bridge-production.up.railway.app)

---

**Last Updated**: February 2026  
**Version**: Gemini Live Integration v1.0