# Speech-X Setup Guide

Step-by-step install for the `avatar` conda environment. Adapted from the MuseTalk official docs for Python 3.12.

Or run the automated script from the repo root:
```bash
bash setup/setup.sh          # Linux / macOS
.\setup\setup.ps1            # Windows (PowerShell)
```

## Stage 1: Create Environment
```bash
conda create -n avatar python=3.12
conda activate avatar
```

## Stage 2: Install PyTorch
PyTorch 2.0.1 does not support Python 3.12, so this guide pins PyTorch 2.5.1 built against CUDA 12.4 (the `cu124` index):
```bash
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
```
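
To confirm the install before moving on, a short check like the one below can help. It is a minimal sketch, not part of the repo: the `describe_torch` helper is a hypothetical name, and it only reports what it finds rather than enforcing the versions pinned above.

```python
import importlib.util
import sys


def describe_torch():
    """Return a small report about the Python and PyTorch installation."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # torch is not installed in this environment
    import torch

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `cuda_available` prints `False` on a GPU machine, the wheel may not match the installed driver, or a CPU-only build may have been picked up from a different index.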

## Stage 3: Install MMLab Packages
These packages are critical for MuseTalk. Install them one at a time:
```bash
pip install --no-cache-dir -U openmim
mim install mmengine
mim install mmcv==2.2.0
# If above fails, try: pip install mmcv-lite==2.2.0
mim install mmdet==3.3.0
# Note: mmpose not required — not present in the reference env
```

## Stage 4: Install MuseTalk Dependencies
These come from MuseTalk's official `requirements.txt`:
```bash
pip install diffusers==0.30.2
pip install accelerate==0.28.0
pip install numpy==2.4.2
pip install opencv-python==4.13.0.92
pip install soundfile==0.12.1
pip install transformers==4.39.2
pip install huggingface-hub==0.36.2
pip install librosa==0.10.2
pip install einops==0.8.1

pip install gdown
pip install requests
pip install imageio==2.34.0
pip install imageio-ffmpeg

pip install omegaconf==2.3.0
pip install ffmpeg-python
pip install moviepy
```

## Stage 5: Install Additional Dependencies
These are needed for this speech-to-video project:
```bash
# Quote each specifier: an unquoted ">" is treated by the shell as output redirection
pip install "fastapi>=0.115.0"
pip install "uvicorn[standard]>=0.30.0"
pip install "pydantic>=2.10.0"
pip install "python-dotenv>=1.0.1"
pip install "livekit>=0.10.0"
pip install "livekit-agents>=0.8.0"
pip install "kokoro-onnx>=0.5.0"
pip install "scipy>=1.13.0"
pip install "faster-whisper>=1.0.0"
pip install "sse-starlette>=2.0.0"
pip install "onnxruntime>=1.24.0"
pip install "sounddevice>=0.5.0"
pip install "tqdm>=4.65.0"
pip install "pyyaml>=6.0.0"
pip install "aiohttp>=3.9.0"
pip install "httpx>=0.27.0"
pip install "safetensors>=0.4.0"
pip install "pillow>=10.0.0"
```
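
After a long run of `pip install` commands it can be useful to see which minimum versions were actually met. The following is a rough sketch using only the standard library; `parse_version`, `at_least`, and `check_minimums` are hypothetical helpers, and the version comparison is deliberately simplified (it looks only at the leading integer of each dotted component, so pre-release tags are ignored).

```python
from importlib import metadata
import re


def parse_version(text):
    """Turn '2.10.0' or '1.24.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimums(minimums):
    """Map each distribution name to 'ok', 'too old (x)', or 'missing'."""
    results = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            results[name] = "missing"
            continue
        if at_least(installed, minimum):
            results[name] = "ok"
        else:
            results[name] = f"too old ({installed})"
    return results


if __name__ == "__main__":
    # A few of the minimums from the Stage 5 list.
    wanted = {"fastapi": "0.115.0", "pydantic": "2.10.0", "scipy": "1.13.0"}
    for name, status in check_minimums(wanted).items():
        print(f"{name}: {status}")
```

Extend the `wanted` dictionary with the remaining packages as needed. For strict PEP 440 comparison, the third-party `packaging` library is the more robust choice.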

## Quick Test
```bash
# Test MuseTalk imports
python -c "
import sys
sys.path.insert(0, 'backend')
from musetalk.processor import *
print('MuseTalk OK')
"

# Test TTS import
python -c "
import sys
sys.path.insert(0, 'backend')
from tts.kokoro_tts import KokoroTTS
print('KokoroTTS OK')
"
```
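
The two checks above can also be folded into one script that reports every failure instead of stopping at the first. This is a sketch under the same assumptions as the commands above (run from the repo root, modules under `backend`); `try_imports` is a hypothetical helper, not something the repo provides.

```python
import importlib
import sys


def try_imports(modules, extra_path=None):
    """Import each module and return {name: 'OK' or the error message}."""
    if extra_path and extra_path not in sys.path:
        sys.path.insert(0, extra_path)
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = "OK"
        except Exception as exc:  # report any import-time failure, not just ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    report = try_imports(["musetalk.processor", "tts.kokoro_tts"], extra_path="backend")
    for name, status in report.items():
        print(f"{name}: {status}")
```

Catching the broad `Exception` is intentional here: a missing CUDA library or model file can surface at import time as something other than `ImportError`.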

## Avatar Creation

Run this once per avatar before starting the server. The script reads model paths from `backend/config.py` and writes the generated assets to `backend/avatars/<name>/`.

**Single portrait image:**
```bash
conda activate avatar
python setup/avatar_creation.py --image frontend/public/Sophy.png --name sophy
```

**Talking-head video:**
```bash
python setup/avatar_creation.py --video /path/to/talking_head.mp4 --name harry_1
```

**Batch (multiple avatars at once):**
```bash
# Edit setup/avatars_config.yml first, then:
python setup/avatar_creation.py --config setup/avatars_config.yml
```

Options:

| Flag | Default | Description |
|------|---------|-------------|
| `--name` | required | Avatar folder name |
| `--frames` | `50` | Frame count for `--image` mode |
| `--bbox-shift` | `5` | Vertical bbox nudge (tune if face crop is off) |
| `--device` | `cuda` | `cuda` or `cpu` |
| `--overwrite` | off | Skip re-create prompt |
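
When scripting several runs without the YAML config, the flags in the table can be assembled programmatically. The sketch below only builds and prints the command lines; `build_command` is a hypothetical helper, and it assumes `--overwrite` is a boolean switch that takes no value, which the table suggests but does not state.

```python
import shlex
import sys


def build_command(name, image=None, video=None, frames=None,
                  bbox_shift=None, device=None, overwrite=False):
    """Build the argv list for one setup/avatar_creation.py run."""
    if (image is None) == (video is None):
        raise ValueError("pass exactly one of image or video")
    cmd = [sys.executable, "setup/avatar_creation.py", "--name", name]
    cmd += ["--image", image] if image else ["--video", video]
    # Optional flags are added only when set, so the script's defaults apply.
    if frames is not None:
        cmd += ["--frames", str(frames)]
    if bbox_shift is not None:
        cmd += ["--bbox-shift", str(bbox_shift)]
    if device is not None:
        cmd += ["--device", device]
    if overwrite:
        cmd.append("--overwrite")  # assumption: a switch with no argument
    return cmd


if __name__ == "__main__":
    jobs = [
        {"name": "sophy", "image": "frontend/public/Sophy.png"},
        {"name": "harry_1", "video": "/path/to/talking_head.mp4", "device": "cpu"},
    ]
    for job in jobs:
        print(shlex.join(build_command(**job)))
```

To execute the runs rather than print them, pass each list to `subprocess.run(cmd, check=True)` from the repo root with the `avatar` environment active.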