	# Gemini Live Avatar - FAQ

	## Quick Start Guide

	### Prerequisites
	- GPU: NVIDIA GPU with 11GB+ VRAM (recommended)
	- Python: 3.10
	- CUDA: 11.8
	- OS: Windows/Linux

	### Installation

	1. Clone Repository
	```bash
	git clone https://github.com/Kedreamix/Linly-Talker.git
	cd Linly-Talker
	```

	2. Create Environment
	```bash
	conda create -n linly python=3.10
	conda activate linly
	```

	3. Install PyTorch
	```bash
	# CUDA 11.8
	pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
	```

	4. Install Dependencies
	```bash
	conda install -q ffmpeg
	pip install -r requirements_webui.txt

	# MuseTalk dependencies
	pip install --no-cache-dir -U openmim
	mim install mmengine
	mim install "mmcv>=2.0.1"
	mim install "mmdet>=3.1.0"
	mim install "mmpose>=1.1.0"
	```

	5. Download Models

	Download the required models from one of these sources:
	- [Baidu Netdisk](https://pan.baidu.com/s/1eF13O-8wyw4B3MtesctQyg?pwd=linl) (Password: linl)
	- [HuggingFace](https://huggingface.co/Kedreamix/Linly-Talker)
	- [ModelScope](https://modelscope.cn/models/Kedreamix/Linly-Talker)

	Required Models:
	- MuseTalk models → `Musetalk/models/`
	- SadTalker checkpoints → `checkpoints/`
	- Face detection models → `gfpgan/weights/`

	6. Launch
	```bash
	python webui.py
	```

	Open `http://localhost:7860` in your browser.
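
If the launch fails, one way to rule out environment problems is to confirm the interpreter version and the PyTorch build from steps 2 and 3. The following is a minimal sketch and not part of the repository; the helper names `python_version_ok` and `torch_cuda_summary` are made up for illustration.

```python
import importlib.util
import sys


def python_version_ok(version_info=sys.version_info, required=(3, 10)):
    """Return True when the interpreter's major.minor matches the required version."""
    return tuple(version_info[:2]) == tuple(required)


def torch_cuda_summary():
    """Report the installed PyTorch build, or note that PyTorch is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch  # imported lazily so the check still runs without PyTorch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,        # step 3 installs 2.0.1 (cu118 build)
        "cuda_build": torch.version.cuda,          # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print("Python 3.10:", python_version_ok())
print(torch_cuda_summary())
```

Run it inside the activated `linly` environment; `cuda_available: False` usually points at a driver or wheel mismatch rather than at the web UI itself.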

	---

	## Common Issues

	### 1. Installation Issues

	#### Q: `Microsoft Visual C++ 14.0 is required`
	A: Install [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)

	#### Q: `version GLIBCXX_3.4.* not found`
	A: Use Python 3.10, or pin these libraries to older versions:
	```bash
	pip install pyopenjtalk==0.3.1
	pip install opencc==1.1.1
	```

	#### Q: FFMPEG not found
	A: Install via conda:
	```bash
	conda install -q ffmpeg
	```

	Or on Linux:
	```bash
	sudo apt install ffmpeg
	```
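
To confirm that the Python process can actually see FFmpeg after either install, a small check along these lines may help. It is a sketch using only the standard library; the function names are illustrative.

```python
import shutil
import subprocess


def find_executable(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None when ffmpeg is not on PATH."""
    path = find_executable("ffmpeg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    return result.stdout.splitlines()[0] if result.stdout else None


print(ffmpeg_version_line() or "ffmpeg not found on PATH")
```

A conda-installed FFmpeg is only on `PATH` while the environment is active, so run the check from the same shell that launches `webui.py`.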

	---

	### 2. Model & Weight Issues

	#### Q: `FileNotFoundError` for model weights
	A: Ensure models are in correct folders:
	```
	Linly-Talker/
	├── checkpoints/
	│   ├── mapping_00109-model.pth.tar (149MB)
	│   ├── mapping_00229-model.pth.tar (149MB)
	│   └── ...
	├── Musetalk/
	│   └── models/
	│       ├── musetalk/
	│       ├── dwpose/
	│       └── ...
	└── gfpgan/
	    └── weights/
	```
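
The layout above can be checked programmatically. This sketch assumes it is run from the repository root and only tests that each folder exists and is non-empty; it does not validate individual weight files. `missing_model_dirs` is a hypothetical helper name.

```python
from pathlib import Path

# Folders taken from the layout above, relative to the repository root.
REQUIRED_DIRS = ("checkpoints", "Musetalk/models", "gfpgan/weights")


def missing_model_dirs(repo_root):
    """Return the required model folders that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in REQUIRED_DIRS:
        folder = root / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


print("Missing or empty:", missing_model_dirs("."))
```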

	#### Q: `SadTalker Error: invalid load key, 'v'`
	A: Re-download `mapping_*.pth.tar` files (they should be 149MB each):
	```bash
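	# Quote the URLs so the shell does not treat "&" as a background operator,
	# and use -O so each file is saved under its expected name.
	wget -c "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00109-model.pth.tar" -O checkpoints/mapping_00109-model.pth.tar
	wget -c "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00229-model.pth.tar" -O checkpoints/mapping_00229-model.pth.tar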
	```
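
This error commonly appears when the file on disk is a small text stub (for example a Git LFS pointer, which begins with `version https://git-lfs`) or a truncated download rather than the real checkpoint. The sketch below checks for both cases before you re-download; the 10 MB tolerance and the function name are assumptions made for illustration.

```python
from pathlib import Path

EXPECTED_MB = 149                          # size stated above for each mapping file
LFS_PREFIX = b"version https://git-lfs"    # first bytes of a Git LFS pointer stub


def diagnose_checkpoint(path, expected_mb=EXPECTED_MB, tolerance_mb=10):
    """Classify a checkpoint as 'ok', 'missing', 'lfs_pointer', or 'wrong_size'."""
    file = Path(path)
    if not file.is_file():
        return "missing"
    with file.open("rb") as handle:
        head = handle.read(len(LFS_PREFIX))
    if head == LFS_PREFIX:
        return "lfs_pointer"
    size_mb = file.stat().st_size / (1024 * 1024)
    if abs(size_mb - expected_mb) > tolerance_mb:
        return "wrong_size"
    return "ok"


for name in ("mapping_00109-model.pth.tar", "mapping_00229-model.pth.tar"):
    print(name, diagnose_checkpoint(Path("checkpoints") / name))
```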

	#### Q: `File is not a zip file` (NLTK error)
	A: Manually download `nltk_data`:
	```python
	import nltk
	print(nltk.data.path) # Find cache path
	```
	Download `nltk_data` from [Quark Netdisk](https://pan.quark.cn/s/f48f5e35796b) and place it in one of the paths printed above.
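
Before replacing the whole folder, it can be useful to see which archives are actually damaged. The following sketch scans the NLTK search paths for `.zip` files that fail the standard-library zip check; `corrupt_zip_files` is an illustrative name, and the scan is skipped if `nltk` is not importable.

```python
import zipfile
from pathlib import Path


def corrupt_zip_files(search_paths):
    """Return .zip files under the given folders that are not valid zip archives."""
    bad = []
    for base in search_paths:
        base = Path(base)
        if not base.is_dir():
            continue  # NLTK lists several candidate paths; most do not exist
        for candidate in sorted(base.rglob("*.zip")):
            if not zipfile.is_zipfile(candidate):
                bad.append(candidate)
    return bad


try:
    import nltk
except ImportError:
    nltk = None

if nltk is not None:
    for path in corrupt_zip_files(nltk.data.path):
        print("Corrupt archive:", path)
```

Deleting the reported files and copying in fresh ones is usually enough; intact archives can stay where they are.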

	---

	### 3. Runtime Issues

	#### Q: VRAM overflow / Out of Memory
	A:
	- Minimum: 6GB VRAM (SadTalker only)
	- Recommended: 11GB+ VRAM (MuseTalk)
	- Solution: Use lower resolution images or reduce batch size
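
As a rough illustration of the thresholds above, the sketch below maps a VRAM figure to the backends that should fit. It rounds to the nearest GB because an "11GB" card typically reports slightly less than 11 GiB; that rounding and both function names are assumptions, not project behavior.

```python
def supported_backends(vram_gb):
    """Map available VRAM (GB) to the backends the figures above suggest will fit."""
    vram = round(vram_gb)  # e.g. 10.76 GiB reported by an 11GB card counts as 11
    if vram >= 11:
        return ["SadTalker", "MuseTalk"]
    if vram >= 6:
        return ["SadTalker"]
    return []


def total_vram_gb(device_index=0):
    """Return total VRAM of a CUDA device in GiB, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory / 1024**3


detected = total_vram_gb()
if detected is None:
    print("No CUDA device detected")
else:
    print(f"{detected:.1f} GiB:", supported_backends(detected))
```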

	#### Q: `GFPGANer is not defined`
	A: Install enhancement module:
	```bash
	pip install gfpgan
	```

	#### Q: `Gradio Connection errored out`
	A:
	- Check firewall settings
	- Try a different port in `webui.py`:
	```python
	demo.launch(server_port=7861) # Change port
	```
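
Rather than guessing a port, you can probe for one that is free. This is a standard-library sketch; `port_is_free` and `first_free_port` are hypothetical helpers, not functions from Gradio or this project.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=7860, attempts=20):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    return None


print("Suggested port:", first_free_port())
```

The result can then be passed as `server_port` in the `demo.launch(...)` call shown above.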

	#### Q: Avatar preparation fails
	A:
	- Use clear frontal face images/videos
	- Recommended resolution: 512x512 to 1024x1024
	- Supported formats: `.jpg`, `.png`, `.mp4`
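
The format and resolution guidance above can be turned into a quick pre-check. This sketch takes the width and height as arguments (for an image they could come from Pillow's `Image.open(path).size`); it does not detect faces, and `avatar_problems` is an illustrative name.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".png", ".mp4"}  # formats listed above
MIN_SIDE, MAX_SIDE = 512, 1024                    # recommended range listed above


def avatar_problems(path, width, height):
    """Return a list of problems; an empty list means the input looks fine."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or 'no extension'}")
    for name, value in (("width", width), ("height", height)):
        if not MIN_SIDE <= value <= MAX_SIDE:
            problems.append(f"{name} {value}px outside {MIN_SIDE}-{MAX_SIDE}px")
    return problems


print(avatar_problems("face.png", 512, 512))
print(avatar_problems("face.gif", 2000, 512))
```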

	---

	### 4. Gemini Live Specific Issues

	#### Q: WebSocket connection fails
	A:
	- Verify Railway bridge is running: `wss://gemini-live-bridge-production.up.railway.app/ws`
	- Check internet connection
	- Ensure no firewall blocking WebSocket connections
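
A plain TCP probe can separate "the bridge host is unreachable" from "the WebSocket handshake fails". The sketch below uses only the standard library and does not perform a WebSocket handshake, so a successful result only shows that the host and port accept connections. The function names are illustrative.

```python
import socket
from urllib.parse import urlparse

BRIDGE_URL = "wss://gemini-live-bridge-production.up.railway.app/ws"  # from the answer above


def endpoint_of(ws_url):
    """Split a ws:// or wss:// URL into (host, port), applying the default port."""
    parsed = urlparse(ws_url)
    if parsed.scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {ws_url}")
    default_port = 443 if parsed.scheme == "wss" else 80
    return parsed.hostname, parsed.port or default_port


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(endpoint_of(BRIDGE_URL))
```

Calling `tcp_reachable(*endpoint_of(BRIDGE_URL))` from the machine that runs the web UI shows whether a firewall or network issue is blocking the connection before any Gemini-specific debugging.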

	#### Q: No audio playback
	A:
	- Check browser audio permissions
	- Verify `speaker_output` component has `autoplay=True`
	- Test with a different browser (Chrome recommended)

	#### Q: Avatar not lip-syncing
	A:
	1. Click "🎭 Prepare Avatar" and wait for "✅ Ready"
	2. Click "🔌 Connect to Gemini" and wait for "✅ Connected"
	3. Ensure microphone permissions are granted
	4. Check that the audio buffer is receiving data

	#### Q: High latency / Lag
	A:
	- Target: <1 second end-to-end
	- Optimize:
	  - Use GPU (not CPU)
	  - Reduce image resolution
	  - Set `return_frame_only=True` in `inference_streaming()` for faster rendering
	  - Check network speed to Railway bridge
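
To find out which stage is responsible for lag, it helps to time each one against the 1-second target. The following is a generic timing sketch; `LatencyLog` is a hypothetical helper, and the `time.sleep` call stands in for a real pipeline step.

```python
import time
from contextlib import contextmanager

TARGET_SECONDS = 1.0  # end-to-end target stated above


class LatencyLog:
    """Collect wall-clock durations for named pipeline stages."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and store the result under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[name] = time.perf_counter() - start

    def total(self):
        return sum(self.durations.values())

    def within_target(self, target=TARGET_SECONDS):
        return self.total() < target


log = LatencyLog()
with log.stage("inference"):
    time.sleep(0.01)  # stand-in for the real inference call
print(log.durations, "within target:", log.within_target())
```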

	---

	### 5. Usage Tips

	#### Q: How to use a custom avatar?
	A:
	1. Uncheck "Use Default Avatar"
	2. Upload your image/video (frontal face, clear features)
	3. Adjust "Mouth Position Fix" slider if needed
	4. Click "🎭 Prepare Avatar"

	#### Q: How to adjust mouth position?
	A: Use the "BBox Shift" slider:
	- Positive values (+): Move mouth down
	- Negative values (-): Move mouth up
	- Default: 5

	#### Q: Best practices for demo?
	A:
	1. Preparation: Always prepare avatar before connecting
	2. Connection: Wait for "✅ Connected" status
	3. Speaking: Speak clearly, natural pace
	4. Interruption: Gemini 2.5 Flash handles interruptions natively - try it!
	5. Quality: Use good microphone for best results

	---

	## Performance Benchmarks

	| Component | Latency | VRAM Usage |
	|-----------|---------|------------|
	| WebSocket (Railway) | ~50ms | 0GB |
	| Gemini 2.5 Flash | ~200ms | 0GB (Cloud) |
	| MuseTalk Inference | ~40ms/frame | 6-8GB |
	| Audio Buffer | ~200ms | <1GB |
	| Total End-to-End | ~500ms | 8-11GB |
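
The total in the table can be reproduced from the individual rows. This worked example assumes the stages run one after another and that a single MuseTalk frame is counted; neither assumption is stated in the table itself.

```python
# Per-stage latencies in milliseconds, copied from the table above.
STAGE_LATENCY_MS = {
    "WebSocket (Railway)": 50,
    "Gemini 2.5 Flash": 200,
    "MuseTalk Inference (one frame)": 40,
    "Audio Buffer": 200,
}


def end_to_end_ms(stages=STAGE_LATENCY_MS):
    """Sum the stage latencies, assuming the stages run sequentially."""
    return sum(stages.values())


def frames_per_second(ms_per_frame):
    """Convert a per-frame latency into the frame rate it can sustain."""
    return 1000 / ms_per_frame


print(end_to_end_ms())        # 490, which the table rounds to ~500ms
print(frames_per_second(40))  # 25.0
```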

	---

	## System Requirements

	### Minimum
	- GPU: 6GB VRAM
	- RAM: 8GB
	- CPU: 4 cores
	- Network: 10 Mbps

	### Recommended
	- GPU: 11GB+ VRAM (RTX 2080 Ti / RTX 3060 or better)
	- RAM: 16GB
	- CPU: 8 cores
	- Network: 50 Mbps

	---

	## Troubleshooting Checklist

	Before reporting issues, verify:

	- [ ] Python 3.10 installed
	- [ ] CUDA 11.8 installed (for GPU)
	- [ ] All model weights downloaded (check file sizes)
	- [ ] Models in correct folder structure
	- [ ] Dependencies installed (`requirements_webui.txt`)
	- [ ] FFMPEG installed
	- [ ] Sufficient VRAM available
	- [ ] Railway bridge is accessible
	- [ ] Firewall allows WebSocket connections
	- [ ] Browser has microphone permissions

	---

	## Getting Help

	1. Check this FAQ first
	2. Review error messages - most include hints
	3. Check model file sizes - incomplete downloads are common
	4. Try with default avatar - isolates custom image issues
	5. Report issues with:
	   - Full error message
	   - Python version
	   - GPU model
	   - Steps to reproduce
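
The version and GPU details for a report can be gathered automatically. This sketch relies on `nvidia-smi` being on `PATH` for the GPU line and falls back to "not detected" otherwise; the function names are illustrative.

```python
import platform
import shutil
import subprocess
import sys


def gpu_names():
    """Return GPU descriptions from nvidia-smi, or an empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def environment_report():
    """Build a short text block suitable for pasting into an issue."""
    gpus = gpu_names()
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.system()} {platform.release()}",
        "GPU: " + ("; ".join(gpus) if gpus else "not detected"),
    ]
    return "\n".join(lines)


print(environment_report())
```

Paste the output alongside the full error message and the steps to reproduce.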

	---

	## Links

	- GitHub: [Kedreamix/Linly-Talker](https://github.com/Kedreamix/Linly-Talker)
	- Models: [HuggingFace](https://huggingface.co/Kedreamix/Linly-Talker) | [ModelScope](https://modelscope.cn/models/Kedreamix/Linly-Talker)
	- Railway Bridge: [gemini-live-bridge](https://gemini-live-bridge-production.up.railway.app)

	---

	Last Updated: February 2026
	Version: Gemini Live Integration v1.0