Last commit: 21328a8 by agkavin, "Reorganize setup files and update documentation" (13 days ago)

	# Speech-X Setup Guide

	Step-by-step installation of the `avatar` conda environment, adapted from the official MuseTalk instructions for Python 3.12.

	Alternatively, run the automated setup script from the repository root:
	```bash
	bash setup/setup.sh # Linux / macOS
	.\setup\setup.ps1 # Windows (PowerShell)
	```

	## Stage 1: Create Environment
	```bash
	conda create -n avatar python=3.12
	conda activate avatar
	```
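Before installing anything else, it can help to confirm that Stage 1 actually took effect. The sketch below is a hypothetical helper (`check_avatar_env` is not part of the repo). It relies on conda exporting `CONDA_DEFAULT_ENV` with the active environment's name, and on `python` resolving to that environment's interpreter.

```bash
# Hypothetical helper (not part of the repo): verify that the `avatar`
# env is active and that its `python` is 3.12.
check_avatar_env() {
    # conda sets CONDA_DEFAULT_ENV to the active environment's name
    if [ "${CONDA_DEFAULT_ENV:-}" != "avatar" ]; then
        echo "active env is '${CONDA_DEFAULT_ENV:-none}', expected 'avatar'" >&2
        return 1
    fi
    local ver
    # Ask the interpreter for its major.minor version, e.g. "3.12"
    ver=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 1
    if [ "$ver" != "3.12" ]; then
        echo "python is $ver, expected 3.12" >&2
        return 1
    fi
    echo "avatar env OK (python $ver)"
}
```

Run `check_avatar_env` right after `conda activate avatar`; a non-zero exit means a later `pip install` would land in the wrong interpreter.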

	## Stage 2: Install PyTorch
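	Python 3.12 needs a newer PyTorch than the 2.0.1 release used in the MuseTalk docs, which ships no Python 3.12 wheels. This guide pins 2.5.1, built for CUDA 12.4 (`cu124`):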
	```bash
	pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
	```
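To confirm that the CUDA build of PyTorch was picked up (and not a CPU-only wheel from the default index), a sketch like the following checks `torch.__version__` against 2.5 and reports `torch.cuda.is_available()`. Both helper names are hypothetical, and `version_ge` assumes a `sort` that supports `-V` (GNU coreutils does).

```bash
# Hypothetical helpers (not part of the repo).
# version_ge A B: succeed if version A >= version B, using `sort -V` ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_torch: require torch >= 2.5 and report whether CUDA is usable.
check_torch() {
    local out torch_ver cuda_ok
    # Prints e.g. "2.5.1+cu124 True"
    out=$(python -c 'import torch; print(torch.__version__, torch.cuda.is_available())') || return 1
    torch_ver=${out%% *}           # first field: "2.5.1+cu124"
    torch_ver=${torch_ver%%+*}     # drop the local build tag: "2.5.1"
    cuda_ok=${out##* }             # last field: "True" or "False"
    if ! version_ge "$torch_ver" 2.5.0; then
        echo "torch $torch_ver is older than 2.5" >&2
        return 1
    fi
    if [ "$cuda_ok" != "True" ]; then
        echo "warning: torch.cuda.is_available() is False" >&2
    fi
    echo "torch $torch_ver OK (cuda available: $cuda_ok)"
}
```

A warning about `is_available()` usually points to a driver problem or a CPU-only wheel rather than a Python issue.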

	## Stage 3: Install MMLab Packages
	These packages are critical for MuseTalk. Install them one at a time:
	```bash
	pip install --no-cache-dir -U openmim
	mim install mmengine
	mim install mmcv==2.2.0
	# If that fails, try the lite build without CUDA ops: pip install mmcv-lite==2.2.0
	mim install mmdet==3.3.0
	# mmpose is not required; it is not installed in the reference environment
	```
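Because each MMLab package is installed separately, an import check catches problems that `mim install` itself does not report. One known case: mmdet 3.3.0 checks the mmcv version at import time and rejects mmcv 2.2.0 with an `AssertionError`. If `import mmdet` fails that way, the version pair above is the likely cause, which may not matter if the project never imports mmdet. The helper below is a hypothetical sketch.

```bash
# Hypothetical helper: import each MMLab package with the env's Python and
# print its version; stops at the first package that fails to import.
check_mmlab() {
    local pkg
    for pkg in mmengine mmcv mmdet; do
        python -c "import $pkg; print('$pkg', $pkg.__version__)" || {
            echo "import $pkg failed" >&2
            return 1
        }
    done
}
```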

	## Stage 4: Install MuseTalk Dependencies
	Adapted from MuseTalk's official `requirements.txt`. Some pins (e.g. `numpy`) are newer than upstream so that they install on Python 3.12.
	```bash
	pip install diffusers==0.30.2
	pip install accelerate==0.28.0
	pip install numpy==2.4.2
	pip install opencv-python==4.13.0.92
	pip install soundfile==0.12.1
	pip install transformers==4.39.2
	pip install huggingface-hub==0.36.2
	pip install librosa==0.10.2
	pip install einops==0.8.1

	pip install gdown
	pip install requests
	pip install imageio==2.34.0
	pip install imageio-ffmpeg

	pip install omegaconf==2.3.0
	pip install ffmpeg-python
	pip install moviepy
	```
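The same pins can also go through a single `pip install -r`. This lets pip's resolver see every constraint at once, instead of allowing a later `pip install` line to move a package that an earlier line pinned. `write_musetalk_requirements` and `requirements-musetalk.txt` are hypothetical names, and the repo may already ship its own requirements file.

```bash
# Hypothetical helper: write the Stage 4 pins to a requirements file so they
# can be installed in one resolver pass. The default file name is an assumption.
write_musetalk_requirements() {
    local out=${1:-requirements-musetalk.txt}
    printf '%s\n' \
        "diffusers==0.30.2" \
        "accelerate==0.28.0" \
        "numpy==2.4.2" \
        "opencv-python==4.13.0.92" \
        "soundfile==0.12.1" \
        "transformers==4.39.2" \
        "huggingface-hub==0.36.2" \
        "librosa==0.10.2" \
        "einops==0.8.1" \
        "gdown" \
        "requests" \
        "imageio==2.34.0" \
        "imageio-ffmpeg" \
        "omegaconf==2.3.0" \
        "ffmpeg-python" \
        "moviepy" \
        > "$out"
}
```

Usage: `write_musetalk_requirements && pip install -r requirements-musetalk.txt`.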

	## Stage 5: Install Additional Dependencies
	Packages used by this speech-to-video project:
	```bash
	# Quote each requirement: unquoted, the shell parses '>' as output redirection
	pip install "fastapi>=0.115.0"
	pip install "uvicorn[standard]>=0.30.0"
	pip install "pydantic>=2.10.0"
	pip install "python-dotenv>=1.0.1"
	pip install "livekit>=0.10.0"
	pip install "livekit-agents>=0.8.0"
	pip install "kokoro-onnx>=0.5.0"
	pip install "scipy>=1.13.0"
	pip install "faster-whisper>=1.0.0"
	pip install "sse-starlette>=2.0.0"
	pip install "onnxruntime>=1.24.0"
	pip install "sounddevice>=0.5.0"
	pip install "tqdm>=4.65.0"
	pip install "pyyaml>=6.0.0"
	pip install "aiohttp>=3.9.0"
	pip install "httpx>=0.27.0"
	pip install "safetensors>=0.4.0"
	pip install "pillow>=10.0.0"
	```
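Version specifiers must be quoted in the shell. Unquoted, bash reads `pip install fastapi>=0.115.0` as `pip install fastapi` with its output redirected to a file named `=0.115.0`, so the constraint is dropped without any error. zsh additionally treats `[standard]` as a glob and aborts with "no matches found". The sketch below demonstrates the redirect in a scratch directory, then keeps the Stage 5 requirements in a bash array so one quoted expansion installs them all. `project_deps` and `install_project_deps` are hypothetical names.

```bash
# Demonstrate the pitfall in a throwaway directory: `echo` stands in for pip.
demo_dir=$(mktemp -d)
( cd "$demo_dir" && echo fastapi>=0.115.0 )   # writes "fastapi" into a file named "=0.115.0"
ls "$demo_dir"                                 # lists: =0.115.0

# Stage 5 requirements, each quoted so '>' and '[...]' reach pip unchanged.
project_deps=(
    "fastapi>=0.115.0"
    "uvicorn[standard]>=0.30.0"
    "pydantic>=2.10.0"
    "python-dotenv>=1.0.1"
    "livekit>=0.10.0"
    "livekit-agents>=0.8.0"
    "kokoro-onnx>=0.5.0"
    "scipy>=1.13.0"
    "faster-whisper>=1.0.0"
    "sse-starlette>=2.0.0"
    "onnxruntime>=1.24.0"
    "sounddevice>=0.5.0"
    "tqdm>=4.65.0"
    "pyyaml>=6.0.0"
    "aiohttp>=3.9.0"
    "httpx>=0.27.0"
    "safetensors>=0.4.0"
    "pillow>=10.0.0"
)

install_project_deps() {
    # "${project_deps[@]}" expands to one argument per element, quotes intact
    pip install "${project_deps[@]}"
}
```

Calling `install_project_deps` inside the activated `avatar` env installs all 18 packages in one pip invocation.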

	## Quick Test
	```bash
	# Run both checks from the repo root so that the relative 'backend' path resolves
	# Test MuseTalk imports
	python -c "
	import sys
	sys.path.insert(0, 'backend')
	from musetalk.processor import *
	print('MuseTalk OK')
	"

	# Test TTS import
	python -c "
	import sys
	sys.path.insert(0, 'backend')
	from tts.kokoro_tts import KokoroTTS
	print('KokoroTTS OK')
	"
	```
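When several import checks are needed, a small wrapper keeps the output to one line per check. This is a hypothetical sketch (`check_import` is not in the repo). It sets `PYTHONPATH=backend` instead of calling `sys.path.insert`, which similarly makes `backend/` importable, and it assumes it runs from the repository root. Tracebacks stay on stderr, so a failure still shows its cause.

```bash
# Hypothetical helper: run one Python import statement with ./backend on the
# module search path and report "<label> OK" or "<label> FAILED".
check_import() {
    local stmt=$1 label=$2
    if PYTHONPATH="backend${PYTHONPATH:+:$PYTHONPATH}" python -c "$stmt"; then
        echo "$label OK"
    else
        echo "$label FAILED"
        return 1
    fi
}
```

Usage: `check_import "from tts.kokoro_tts import KokoroTTS" KokoroTTS`.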

	## Avatar Creation

	Run this once per avatar before starting the server. The script reads model paths from `backend/config.py`
	and writes the avatar assets to `backend/avatars/<name>/`.

	Single portrait image:
	```bash
	conda activate avatar
	python setup/avatar_creation.py --image frontend/public/Sophy.png --name sophy
	```
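For a folder of portraits, a loop over single-image runs is one alternative to the YAML batch mode. This sketch uses only the `--image` and `--name` flags shown here. The naming rule (lower-cased file name without its extension, so `Sophy.png` becomes `sophy`) is an assumption, and both helpers are hypothetical.

```bash
# Hypothetical helper: derive an avatar name from an image path,
# e.g. frontend/public/Sophy.png -> sophy
avatar_name_from_path() {
    local base=${1##*/}    # strip leading directories
    base=${base%.*}        # strip the last extension
    printf '%s\n' "$base" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: create one avatar per image argument; stops on the first failure.
create_avatars_from_images() {
    local img
    for img in "$@"; do
        python setup/avatar_creation.py --image "$img" --name "$(avatar_name_from_path "$img")" || return 1
    done
}
```

Usage: `create_avatars_from_images frontend/public/*.png`.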

	Talking-head video:
	```bash
	python setup/avatar_creation.py --video /path/to/talking_head.mp4 --name harry_1
	```

	Batch (multiple avatars at once):
	```bash
	# Edit setup/avatars_config.yml first, then:
	python setup/avatar_creation.py --config setup/avatars_config.yml
	```

	Options:

	| Flag | Default | Description |
	|------|---------|-------------|
	| `--name` | required | Avatar folder name under `backend/avatars/` |
	| `--frames` | `50` | Frame count in `--image` mode |
	| `--bbox-shift` | `5` | Vertical bounding-box shift; tune if the face crop is off |
	| `--device` | `cuda` | `cuda` or `cpu` |
	| `--overwrite` | off | Skip the re-create prompt for an existing avatar |
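Because `--bbox-shift` often needs a few trials, one option is to build one avatar per candidate value and compare the crops under `backend/avatars/`. This hypothetical sweep reuses the flags from the table. The `_bs<value>` name suffix is an assumption, and `--overwrite` lets the sweep be re-run without prompts.

```bash
# Hypothetical helper: sweep_bbox_shift IMAGE NAME VALUE...
# Creates one avatar per --bbox-shift value, named e.g. sophy_bs5.
sweep_bbox_shift() {
    local image=$1 name=$2
    shift 2
    local shift_val
    for shift_val in "$@"; do
        python setup/avatar_creation.py --image "$image" --name "${name}_bs${shift_val}" \
            --bbox-shift "$shift_val" --overwrite || return 1
    done
}
```

Usage: `sweep_bbox_shift frontend/public/Sophy.png sophy 0 5 10`, then keep the value whose crop looks right and delete the other folders.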