Gemini Live Avatar - FAQ

Quick Start Guide

Prerequisites

GPU: NVIDIA GPU with 11GB+ VRAM (recommended)
Python: 3.10
CUDA: 11.8
OS: Windows/Linux

Installation

Clone Repository

git clone https://github.com/Kedreamix/Linly-Talker.git
cd Linly-Talker

Create Environment

conda create -n linly python=3.10
conda activate linly

Install PyTorch

# CUDA 11.8
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
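After installing PyTorch, it can help to confirm that the interpreter and the CUDA build match the prerequisites. The following is a minimal sketch, not part of the project: `environment_summary` is a hypothetical helper name, and it only reports what PyTorch itself exposes.

```python
import sys


def environment_summary() -> dict:
    """Report the Python version and, if PyTorch is importable, its CUDA status."""
    summary = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": sys.version_info[:2] == (3, 10),  # version named in Prerequisites
        "torch_installed": False,
    }
    try:
        import torch  # imported lazily so the check also runs before PyTorch is installed
    except ImportError:
        return summary

    summary["torch_installed"] = True
    summary["torch"] = torch.__version__
    summary["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    summary["cuda_available"] = torch.cuda.is_available()
    if summary["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        summary["gpu"] = props.name
        summary["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return summary


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False on a machine with an NVIDIA GPU, the installed wheel is probably a CPU-only build or the driver is too old for CUDA 11.8.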

Install Dependencies

conda install -q ffmpeg
pip install -r requirements_webui.txt

# MuseTalk dependencies
pip install --no-cache-dir -U openmim
mim install mmengine 
mim install "mmcv>=2.0.1" 
mim install "mmdet>=3.1.0" 
mim install "mmpose>=1.1.0"

Download Models

Download the required models from one of these sources:

Required Models:

MuseTalk models → Musetalk/models/
SadTalker checkpoints → checkpoints/
Face detection models → gfpgan/weights/

Launch

python webui.py

Open http://localhost:7860 in your browser.

Common Issues

1. Installation Issues

Q: `Microsoft Visual C++ 14.0 is required`

A: Install Microsoft C++ Build Tools

Q: `version GLIBCXX_3.4.* not found`

A: Use Python 3.10 or downgrade libraries:

pip install pyopenjtalk==0.3.1
pip install opencc==1.1.1

Q: FFMPEG not found

A: Install via conda:

conda install -q ffmpeg

Or on Linux:

sudo apt install ffmpeg
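To confirm that the install worked from the same environment that runs webui.py, a short check like the one below may be useful. It is a sketch only: `ffmpeg_version` is a hypothetical helper, and it assumes the binary is named `ffmpeg` on PATH.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```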

2. Model & Weight Issues

Q: `FileNotFoundError` for model weights

A: Ensure models are in correct folders:

Linly-Talker/
├── checkpoints/
│   ├── mapping_00109-model.pth.tar (149MB)
│   ├── mapping_00229-model.pth.tar (149MB)
│   └── ...
├── Musetalk/
│   └── models/
│       ├── musetalk/
│       ├── dwpose/
│       └── ...
└── gfpgan/
    └── weights/
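The layout above can be checked programmatically before launching. This is a minimal sketch under the assumption that only the three folders shown matter; `missing_model_dirs` and `EXPECTED_DIRS` are illustrative names, not part of Linly-Talker.

```python
from pathlib import Path

# Folders taken from the layout above; adjust if your checkout differs.
EXPECTED_DIRS = ("checkpoints", "Musetalk/models", "gfpgan/weights")


def missing_model_dirs(root: str) -> list[str]:
    """Return the expected model folders that are absent or empty under root."""
    missing = []
    for rel in EXPECTED_DIRS:
        folder = Path(root) / rel
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs(".")  # run from the Linly-Talker root
    print("All model folders present" if not problems else f"Missing or empty: {problems}")
```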

Q: `SadTalker Error: invalid load key, 'v'`

A: Re-download mapping_*.pth.tar files (they should be 149MB each):

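# Run from the Linly-Talker root. The URL must be quoted, otherwise the shell treats "&" as a command separator.
wget -c "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00109-model.pth.tar" -O checkpoints/mapping_00109-model.pth.tar
wget -c "https://modelscope.cn/api/v1/models/Kedreamix/Linly-Talker/repo?Revision=master&FilePath=checkpoints%2Fmapping_00229-model.pth.tar" -O checkpoints/mapping_00229-model.pth.tar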
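Before re-downloading, it may be worth checking what is actually on disk. One common cause of `invalid load key, 'v'` (an assumption here, not something this FAQ states) is that the file is a small Git LFS pointer, which is plain text beginning with `version`. The sketch below tests for that and for the stated 149MB size; `diagnose_checkpoint` and the 10MB tolerance are illustrative choices.

```python
from pathlib import Path

EXPECTED_MB = 149                       # size stated above for each mapping_*.pth.tar
TOLERANCE_MB = 10                       # assumed slack for MB/MiB rounding
LFS_MAGIC = b"version https://git-lfs"  # first bytes of a Git LFS pointer file


def diagnose_checkpoint(path: str) -> str:
    """Classify a checkpoint as 'missing', 'lfs-pointer', 'wrong-size' or 'ok'."""
    file = Path(path)
    if not file.is_file():
        return "missing"
    with file.open("rb") as handle:
        if handle.read(len(LFS_MAGIC)) == LFS_MAGIC:
            return "lfs-pointer"
    size_mb = file.stat().st_size / 1_000_000
    if abs(size_mb - EXPECTED_MB) > TOLERANCE_MB:
        return "wrong-size"
    return "ok"


if __name__ == "__main__":
    for name in ("mapping_00109-model.pth.tar", "mapping_00229-model.pth.tar"):
        print(name, "->", diagnose_checkpoint(f"checkpoints/{name}"))
```

Any result other than `ok` suggests the file should be downloaded again.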

Q: `File is not a zip file` (NLTK error)

A: Manually download nltk_data:

import nltk
print(nltk.data.path)  # Find cache path

Download nltk_data from Quark Netdisk and place it in one of the paths printed above.
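To see which archive triggers the error, the search paths can be scanned for files that have a .zip extension but are not valid zip archives (typically an interrupted download). A minimal sketch using only the standard library; `find_corrupt_zips` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def find_corrupt_zips(search_paths: list[str]) -> list[str]:
    """Return .zip files under the given folders that are not valid zip archives."""
    corrupt = []
    for base in search_paths:
        root = Path(base)
        if not root.is_dir():
            continue  # nltk.data.path usually lists some folders that do not exist
        for archive in sorted(root.rglob("*.zip")):
            if not zipfile.is_zipfile(str(archive)):
                corrupt.append(str(archive))
    return corrupt
```

With NLTK installed, call it as `find_corrupt_zips(nltk.data.path)` and delete or replace whatever it returns.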

3. Runtime Issues

Q: VRAM overflow / Out of Memory

Minimum: 6GB VRAM (SadTalker only)
Recommended: 11GB+ VRAM (MuseTalk)
Solution: Use lower resolution images or reduce batch size

Q: `GFPGANer is not defined`

A: Install enhancement module:

pip install gfpgan

Q: `Gradio Connection errored out`

Check firewall settings
Try different port in webui.py:

demo.launch(server_port=7861)  # Change port
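If it is unclear whether the port is already taken, a free one can be found before launching. The sketch below tries to bind each candidate port locally; `port_is_free` and `first_free_port` are illustrative helpers, not Gradio functions.

```python
import socket
from typing import Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the port can be bound on host, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start: int = 7860, attempts: int = 20) -> Optional[int]:
    """Return the first bindable port in [start, start + attempts), or None."""
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port):
            return port
    return None
```

The result can then be passed on, for example `demo.launch(server_port=first_free_port())`.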

Q: Avatar preparation fails

Use clear frontal face images/videos
Recommended resolution: 512x512 to 1024x1024
Supported formats: .jpg, .png, .mp4
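The format and resolution guidance above can be turned into a quick pre-check. This is a sketch only: `avatar_warnings` is a hypothetical helper, the caller supplies the dimensions (for example from an image library), and it cannot judge whether the face is frontal or clear.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".png", ".mp4"}  # formats listed above
MIN_SIDE, MAX_SIDE = 512, 1024                   # recommended range listed above


def avatar_warnings(path: str, width: int, height: int) -> list[str]:
    """Return human-readable warnings for an avatar file; an empty list means it looks fine."""
    warnings = []
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        warnings.append(f"unsupported format: {ext or '(none)'}")
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            warnings.append(f"{name} {side}px is outside {MIN_SIDE}-{MAX_SIDE}px")
    return warnings
```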

4. Gemini Live Specific Issues

Q: WebSocket connection fails

Verify Railway bridge is running: wss://gemini-live-bridge-production.up.railway.app/ws
Check internet connection
Ensure no firewall blocking WebSocket connections
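A first step in narrowing this down is to test whether the bridge host accepts TCP connections at all. The sketch below parses the WebSocket URL and attempts a plain TCP connect. A success does not prove the WebSocket handshake works, only that DNS, routing and the firewall allow the connection. `websocket_endpoint` and `can_reach` are illustrative names.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_PORTS = {"ws": 80, "wss": 443}


def websocket_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from a ws:// or wss:// URL, applying the default port."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"not a WebSocket URL: {url!r}")
    return parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]


def can_reach(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = websocket_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    bridge = "wss://gemini-live-bridge-production.up.railway.app/ws"
    print("reachable" if can_reach(bridge) else "not reachable")
```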

Q: No audio playback

Check browser audio permissions
Verify speaker_output component has autoplay=True
Test with different browser (Chrome recommended)

Q: Avatar not lip-syncing

Click "🎭 Prepare Avatar" and wait for "✅ Ready"
Click "🔌 Connect to Gemini" and wait for "✅ Connected"
Ensure microphone permissions are granted
Check audio buffer is receiving data

Q: High latency / Lag

Target: <1 second end-to-end
Optimize:
- Use GPU (not CPU)
- Reduce image resolution
- Set return_frame_only=True in inference_streaming() for faster rendering
- Check network speed to Railway bridge

5. Usage Tips

Q: How to use custom avatar?

Uncheck "Use Default Avatar"
Upload your image/video (frontal face, clear features)
Adjust "Mouth Position Fix" slider if needed
Click "🎭 Prepare Avatar"

Q: How to adjust mouth position?

A: Use the "BBox Shift" slider:

Positive values (+): Move mouth down
Negative values (-): Move mouth up
Default: 5

Q: Best practices for demo?

Preparation: Always prepare avatar before connecting
Connection: Wait for "✅ Connected" status
Speaking: Speak clearly, natural pace
Interruption: Gemini 2.5 Flash handles interruptions natively - try it!
Quality: Use good microphone for best results

Performance Benchmarks

| Component | Latency | VRAM Usage |
|---|---|---|
| WebSocket (Railway) | ~50ms | 0GB |
| Gemini 2.5 Flash | ~200ms | 0GB (Cloud) |
| MuseTalk Inference | ~40ms/frame | 6-8GB |
| Audio Buffer | ~200ms | <1GB |
| Total End-to-End | ~500ms | 8-11GB |
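As a rough cross-check of the figures above, the per-component latencies can be summed and compared with the <1 second target, and the per-frame inference time converted to a frame rate. The numbers are copied from the table; treating one MuseTalk frame as the inference share of the total is an assumption of this sketch.

```python
# Per-component latency in milliseconds, copied from the table above.
COMPONENT_LATENCY_MS = {
    "WebSocket (Railway)": 50,
    "Gemini 2.5 Flash": 200,
    "MuseTalk Inference (one frame)": 40,
    "Audio Buffer": 200,
}
TARGET_MS = 1000  # "<1 second end-to-end" target from the latency FAQ entry


def total_latency_ms(components: dict[str, float]) -> float:
    """Sum the component latencies, assuming they occur strictly in sequence."""
    return sum(components.values())


def max_frame_rate(ms_per_frame: float) -> float:
    """Upper bound on frames per second given the per-frame inference time."""
    return 1000.0 / ms_per_frame


if __name__ == "__main__":
    total = total_latency_ms(COMPONENT_LATENCY_MS)
    print(f"total: {total} ms, within target: {total < TARGET_MS}")
    print(f"max frame rate: {max_frame_rate(40)} fps")
```

The sum comes to 490 ms, consistent with the ~500ms total, and 40ms per frame caps rendering at 25 frames per second.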

System Requirements

Minimum

GPU: 6GB VRAM
RAM: 8GB
CPU: 4 cores
Network: 10 Mbps

Troubleshooting Checklist

Before reporting issues, verify:

Python 3.10 installed
CUDA 11.8 installed (for GPU)
All model weights downloaded (check file sizes)
Models in correct folder structure
Dependencies installed (requirements_webui.txt)
FFMPEG installed
Sufficient VRAM available
Railway bridge is accessible
Firewall allows WebSocket connections
Browser has microphone permissions

Getting Help

Check this FAQ first
Review error messages - most include hints
Check model file sizes - incomplete downloads are common
Try with default avatar - isolates custom image issues
Report issues with:
- Full error message
- Python version
- GPU model
- Steps to reproduce
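The version details requested above can be gathered automatically. This is a minimal sketch: `gpu_names` and `issue_report` are hypothetical helpers, and GPU detection relies on `nvidia-smi` being on PATH.

```python
import platform
import shutil
import subprocess
import sys


def gpu_names() -> list[str]:
    """Return GPU names and memory reported by nvidia-smi, or [] if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def issue_report() -> str:
    """Build a short text block to paste into an issue alongside the error message."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.system()} {platform.release()}",
        f"GPU: {'; '.join(gpu_names()) or 'not detected'}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(issue_report())
```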