# LLaMA-Omni2 Voice Assistant Setup Guide

This guide provides comprehensive instructions for reproducing the exact environment and setup for the LLaMA-Omni2 voice assistant with CosyVoice2 integration.

## Prerequisites

- Ubuntu/Linux system with a CUDA-capable GPU
- CUDA 12.1 or higher installed
- Miniconda or Anaconda installed
- At least 16GB RAM and 20GB free disk space
- Python 3.10

## Environment Setup Options

### Option 1: Using Conda Environment File (Recommended)

```bash
# Create environment from comprehensive yml file
conda env create -f environment-comprehensive.yml

# Activate the environment
conda activate gsva-python310
```

### Option 2: Using Frozen Requirements

```bash
# Create a new conda environment
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310

# Install from frozen requirements
pip install -r requirements-frozen-new.txt
```

### Option 3: Manual Setup Using Script

```bash
# Run the complete setup script
bash script.sh
```

## Detailed Manual Setup

### 1. Create and Activate Conda Environment

```bash
source /home/azureuser/miniconda3/etc/profile.d/conda.sh
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310
```

### 2. Install Basic Dependencies

```bash
pip install Cython numpy==1.26.4
pip install packaging wheel setuptools==69.5.1
```

### 3. Install the Package

```bash
# Install in development mode
pip install -e .
```

### 4. Install Core Dependencies

```bash
# Essential packages
pip install huggingface_hub==0.25.1
pip install uvicorn openai-whisper fastapi
pip install hf_transfer ninja

# Gradio for web interface
pip install gradio==5.3.0 gradio_client==1.4.2
```

### 5. Set Up CUDA Environment

```bash
# Link CUDA installation
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-12.6 /usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

### 6. Install PyTorch with CUDA Support

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

### 7. Install Flash Attention

```bash
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```

### 8. Install Transformers and Audio Libraries

```bash
# Specific version for LLaMA-Omni2 compatibility
pip install transformers==4.43.4

# Audio processing libraries
pip install matcha-tts --no-build-isolation
pip install git+https://github.com/FunAudioLLM/CosyVoice.git

# Additional dependencies
pip install conformer onnxruntime hyperpyyaml==1.2.2 ruamel.yaml
```

## Model Downloads

### 1. Download LLaMA-Omni2 Model

```bash
mkdir -p models
huggingface-cli download ICTNLP/LLaMA-Omni2-3B --local-dir models/LLaMA-Omni2-3B
```

### 2. Download CosyVoice2 Model

```bash
mkdir -p models/cosyvoice2
python -c "
from huggingface_hub import snapshot_download
import os

os.makedirs('models/cosyvoice2', exist_ok=True)
snapshot_download(
    repo_id='FunAudioLLM/CosyVoice2-0.5B',
    local_dir='models/cosyvoice2',
    local_dir_use_symlinks=False
)
"
```

### 3. Fix CosyVoice Configuration

```bash
# Create backup
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice2.yaml.backup

# Copy to expected filename
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice.yaml

# Remove problematic parameter
grep -v "mix_ratio" models/cosyvoice2/cosyvoice.yaml > models/cosyvoice2/cosyvoice_fixed.yaml
mv models/cosyvoice2/cosyvoice_fixed.yaml models/cosyvoice2/cosyvoice.yaml
```

## Running the Services

### 1. Start Controller

```bash
nohup python -m llama_omni2.serve.controller \
    --host 0.0.0.0 \
    --port 10000 > controller.log 2>&1 &
```

### 2. Start Model Worker

```bash
nohup python -m llama_omni2.serve.model_worker \
    --host 0.0.0.0 \
    --controller http://localhost:10000 \
    --port 40000 \
    --worker http://localhost:40000 \
    --model-path models/LLaMA-Omni2-3B \
    --model-name LLaMA-Omni2-3B > worker.log 2>&1 &
```
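The controller and model worker above run in the background via `nohup`, so a failed start is easy to miss. Before launching the web server, it can help to confirm that both ports are accepting connections. A minimal Python check (the ports come from the commands above; `localhost` is assumed):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports from the controller and model worker commands above
for name, port in [("controller", 10000), ("model worker", 40000)]:
    status = "up" if port_open("localhost", port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```

If either port reports "not reachable", check `controller.log` or `worker.log` before continuing.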
### 3. Start Gradio Web Server

With the CosyVoice2 vocoder:

```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000 \
    --vocoder-dir models/cosyvoice2
```

Without a vocoder (fallback):

```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000
```

## Monitoring Services

```bash
# Check controller logs
tail -f controller.log

# Check model worker logs
tail -f worker.log

# Access the web UI: open a browser at http://localhost:8000
```

## Troubleshooting

### Common Issues

1. **CUDA not found**: Ensure the CUDA paths are exported correctly
2. **Flash attention build fails**: Use `MAX_JOBS=4` to limit parallel compilation
3. **CosyVoice `mix_ratio` error**: Follow the configuration fix steps above
4. **Port already in use**: Kill the existing processes or use different ports

### Killing Services

```bash
# Find and kill the service processes
ps aux | grep python | grep -E "(controller|model_worker|gradio_web_server)" | awk '{print $2}' | xargs -r kill
```

## Project Structure

```
voiceagents/
├── llama_omni2/                  # Main application code
├── cosyvoice/                    # CosyVoice integration
├── models/                       # Downloaded models
│   ├── LLaMA-Omni2-3B/
│   └── cosyvoice2/
├── examples/                     # Sample audio files
├── script.sh                     # Setup script
├── pyproject.toml                # Project configuration
├── requirements-frozen-new.txt   # Frozen dependencies
├── environment-comprehensive.yml # Conda environment
└── SETUP_GUIDE.md                # This file
```

## Environment Variables

Set these in your `.bashrc` or `.zshrc`:

```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=~/.cache/huggingface
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0"
export MAX_JOBS=4
```

## Version Information

- Python: 3.10
- PyTorch: 2.3.1
- Transformers: 4.43.4
- Gradio: 5.3.0
- CUDA: 12.1+
- CosyVoice2: 0.5B model

## Additional Notes

- The setup has been tested on Ubuntu with NVIDIA GPUs
- Ensure sufficient GPU memory (8GB+ recommended)
- For production deployment, consider using systemd services
- Regular backups of models and configurations are recommended

## Support

For issues or questions:

- Check the logs in `controller.log` and `worker.log`
- Ensure all dependencies are correctly installed
- Verify CUDA is properly configured
- Review `COSYVOICE2_CHANGES.md` for model-specific details
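To make the "dependencies correctly installed" and "CUDA properly configured" checks concrete, a small sanity-check script can be run before starting the services. This is a sketch: the version pins and model directories are taken from the sections above, and the helper names (`check_package`, `check_dirs`) are illustrative, not part of the LLaMA-Omni2 codebase.

```python
import importlib
import pathlib

def check_package(pkg):
    """Return the installed version string, or None if the import fails."""
    try:
        return getattr(importlib.import_module(pkg), "__version__", "unknown")
    except ImportError:
        return None

def check_dirs(paths):
    """Return the subset of paths that do not exist yet."""
    return [p for p in paths if not pathlib.Path(p).is_dir()]

if __name__ == "__main__":
    # Version pins from the "Version Information" section of this guide
    for pkg, pinned in [("torch", "2.3.1"), ("transformers", "4.43.4"), ("gradio", "5.3.0")]:
        found = check_package(pkg)
        print(f"{pkg}: {found or 'NOT INSTALLED'} (guide pins {pinned})")

    # CUDA availability is only meaningful once torch is installed
    if check_package("torch"):
        import torch
        print("CUDA available:", torch.cuda.is_available())

    # Model directories from the "Model Downloads" section
    missing = check_dirs(["models/LLaMA-Omni2-3B", "models/cosyvoice2"])
    print("missing model dirs:", missing or "none")
```

Run it from the repository root; any "NOT INSTALLED" or missing-directory line points at the setup step to revisit.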