# LLaMA-Omni2 Voice Assistant Setup Guide
This guide provides comprehensive instructions for reproducing the exact environment and setup for the LLaMA-Omni2 voice assistant with CosyVoice2 integration.
## Prerequisites
- Ubuntu/Linux system with CUDA-capable GPU
- CUDA 12.1 or higher installed
- Miniconda or Anaconda installed
- At least 16GB RAM and 20GB free disk space
- Python 3.10
## Environment Setup Options
### Option 1: Using Conda Environment File (Recommended)
```bash
# Create environment from comprehensive yml file
conda env create -f environment-comprehensive.yml
# Activate the environment
conda activate gsva-python310
```
### Option 2: Using Frozen Requirements
```bash
# Create a new conda environment
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310
# Install from frozen requirements
pip install -r requirements-frozen-new.txt
```
### Option 3: Manual Setup Using Script
```bash
# Run the complete setup script
bash script.sh
```
## Detailed Manual Setup
### 1. Create and Activate Conda Environment
```bash
source /home/azureuser/miniconda3/etc/profile.d/conda.sh  # adjust to your Miniconda install path
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310
```
### 2. Install Basic Dependencies
```bash
pip install Cython numpy==1.26.4
pip install packaging wheel setuptools==69.5.1
```
### 3. Install the Package
```bash
# Install in development mode
pip install -e .
```
### 4. Install Core Dependencies
```bash
# Essential packages
pip install huggingface_hub==0.25.1
pip install uvicorn openai-whisper fastapi
pip install hf_transfer ninja
# Gradio for web interface
pip install gradio==5.3.0 gradio_client==1.4.2
```
### 5. Setup CUDA Environment
```bash
# Point /usr/local/cuda at the installed toolkit (adjust cuda-12.6 to your version)
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-12.6 /usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```
### 6. Install PyTorch with CUDA Support
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
### 7. Install Flash Attention
```bash
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```
### 8. Install Transformers and Audio Libraries
```bash
# Specific version for LLaMA-Omni2 compatibility
pip install transformers==4.43.4
# Audio processing libraries
pip install matcha-tts --no-build-isolation
pip install git+https://github.com/FunAudioLLM/CosyVoice.git
# Additional dependencies
pip install conformer onnxruntime hyperpyyaml==1.2.2 ruamel.yaml
```
## Model Downloads
### 1. Download LLaMA-Omni2 Model
```bash
mkdir -p models
huggingface-cli download ICTNLP/LLaMA-Omni2-3B --local-dir models/LLaMA-Omni2-3B
```
### 2. Download CosyVoice2 Model
```bash
mkdir -p models/cosyvoice2
python -c "
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='FunAudioLLM/CosyVoice2-0.5B',
    local_dir='models/cosyvoice2',
    local_dir_use_symlinks=False,
)
"
```
### 3. Fix CosyVoice Configuration
```bash
# Create backup
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice2.yaml.backup
# Copy to expected filename
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice.yaml
# Remove the mix_ratio parameter, which causes a config load error with this setup
grep -v "mix_ratio" models/cosyvoice2/cosyvoice.yaml > models/cosyvoice2/cosyvoice_fixed.yaml
mv models/cosyvoice2/cosyvoice_fixed.yaml models/cosyvoice2/cosyvoice.yaml
```
## Running the Services
### 1. Start Controller
```bash
nohup python -m llama_omni2.serve.controller \
--host 0.0.0.0 \
--port 10000 > controller.log 2>&1 &
```
### 2. Start Model Worker
```bash
nohup python -m llama_omni2.serve.model_worker \
--host 0.0.0.0 \
--controller http://localhost:10000 \
--port 40000 \
--worker http://localhost:40000 \
--model-path models/LLaMA-Omni2-3B \
--model-name LLaMA-Omni2-3B > worker.log 2>&1 &
```
### 3. Start Gradio Web Server
With CosyVoice2 vocoder:
```bash
python -m llama_omni2.serve.gradio_web_server \
--controller http://localhost:10000 \
--port 8000 \
--vocoder-dir models/cosyvoice2
```
Without vocoder (fallback):
```bash
python -m llama_omni2.serve.gradio_web_server \
--controller http://localhost:10000 \
--port 8000
```
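Once the three services are started, you can verify each one is listening before opening the browser. A minimal sketch, assuming the default ports used in the commands above (10000 for the controller, 40000 for the worker, 8000 for Gradio); adjust if you changed them:

```python
import socket

def service_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports taken from the launch commands in this guide
SERVICES = {"controller": 10000, "model_worker": 40000, "gradio_web_server": 8000}

def report(host: str = "localhost") -> dict:
    return {name: service_up(host, port) for name, port in SERVICES.items()}
```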
## Monitoring Services
```bash
# Check controller logs
tail -f controller.log
# Check model worker logs
tail -f worker.log
# Access web UI
# Open browser at http://localhost:8000
```
## Troubleshooting
### Common Issues
1. **CUDA not found**: Ensure the CUDA paths from Step 5 are exported in your current shell
2. **Flash attention build fails**: Use `MAX_JOBS=4` to limit parallel compilation and avoid exhausting memory
3. **CosyVoice mix_ratio error**: Follow the configuration fix steps above
4. **Port already in use**: Kill the existing processes (see below) or choose different ports
### Killing Services
```bash
# Find and kill Python processes
ps aux | grep python | grep -E "(controller|model_worker|gradio_web_server)" | awk '{print $2}' | xargs -r kill
```
## Project Structure
```
voiceagents/
├── llama_omni2/                  # Main application code
├── cosyvoice/                    # CosyVoice integration
├── models/                       # Downloaded models
│   ├── LLaMA-Omni2-3B/
│   └── cosyvoice2/
├── examples/                     # Sample audio files
├── script.sh                     # Setup script
├── pyproject.toml                # Project configuration
├── requirements-frozen-new.txt   # Frozen dependencies
├── environment-comprehensive.yml # Conda environment
└── SETUP_GUIDE.md                # This file
```
## Environment Variables
Set these in your `.bashrc` or `.zshrc`:
```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=~/.cache/huggingface
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0"
export MAX_JOBS=4
```
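To sanity-check that these variables are actually set in the shell that launches the services, a small Python helper (the variable names are taken from the export list above):

```python
import os

# Names from the export list in this guide
REQUIRED = ["PATH", "LD_LIBRARY_PATH", "HF_HUB_ENABLE_HF_TRANSFER",
            "HF_HOME", "TORCH_CUDA_ARCH_LIST", "MAX_JOBS"]

def missing_env(required=REQUIRED) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```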
## Version Information
- Python: 3.10
- PyTorch: 2.3.1
- Transformers: 4.43.4
- Gradio: 5.3.0
- CUDA: 12.1+
- CosyVoice2: 0.5B model
## Additional Notes
- The setup has been tested on Ubuntu with NVIDIA GPUs
- Ensure sufficient GPU memory (8GB+ recommended)
- For production deployment, consider using systemd services
- Regular backups of models and configurations are recommended
## Support
For issues or questions:
- Check the logs in `controller.log` and `worker.log`
- Ensure all dependencies are correctly installed
- Verify CUDA is properly configured
- Review the COSYVOICE2_CHANGES.md for model-specific details