# LLaMA-Omni2 Voice Assistant Setup Guide

This guide provides comprehensive instructions for reproducing the exact environment and setup for the LLaMA-Omni2 voice assistant with CosyVoice2 integration.

## Prerequisites

- Ubuntu/Linux system with a CUDA-capable GPU
- CUDA 12.1 or higher installed
- Miniconda or Anaconda installed
- At least 16GB RAM and 20GB free disk space
- Python 3.10

## Environment Setup Options

### Option 1: Using Conda Environment File (Recommended)

```bash
# Create environment from comprehensive yml file
conda env create -f environment-comprehensive.yml

# Activate the environment
conda activate gsva-python310
```

### Option 2: Using Frozen Requirements

```bash
# Create a new conda environment
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310

# Install from frozen requirements
pip install -r requirements-frozen-new.txt
```

### Option 3: Manual Setup Using Script

```bash
# Run the complete setup script
bash script.sh
```

## Detailed Manual Setup

### 1. Create and Activate Conda Environment

```bash
source /home/azureuser/miniconda3/etc/profile.d/conda.sh
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310
```

### 2. Install Basic Dependencies

```bash
pip install Cython numpy==1.26.4
pip install packaging wheel setuptools==69.5.1
```

### 3. Install the Package

```bash
# Install in development mode
pip install -e .
```

### 4. Install Core Dependencies

```bash
# Essential packages
pip install huggingface_hub==0.25.1
pip install uvicorn openai-whisper fastapi
pip install hf_transfer ninja

# Gradio for web interface
pip install gradio==5.3.0 gradio_client==1.4.2
```

### 5. Set Up CUDA Environment

```bash
# Link CUDA installation
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-12.6 /usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

### 6. Install PyTorch with CUDA Support

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

### 7. Install Flash Attention

```bash
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```

### 8. Install Transformers and Audio Libraries

```bash
# Specific version for LLaMA-Omni2 compatibility
pip install transformers==4.43.4

# Audio processing libraries
pip install matcha-tts --no-build-isolation
pip install git+https://github.com/FunAudioLLM/CosyVoice.git

# Additional dependencies
pip install conformer onnxruntime hyperpyyaml==1.2.2 ruamel.yaml
```

## Model Downloads

### 1. Download LLaMA-Omni2 Model

```bash
mkdir -p models
huggingface-cli download ICTNLP/LLaMA-Omni2-3B --local-dir models/LLaMA-Omni2-3B
```

### 2. Download CosyVoice2 Model

```bash
mkdir -p models/cosyvoice2
python -c "
from huggingface_hub import snapshot_download
import os

os.makedirs('models/cosyvoice2', exist_ok=True)
snapshot_download(
    repo_id='FunAudioLLM/CosyVoice2-0.5B',
    local_dir='models/cosyvoice2',
    local_dir_use_symlinks=False
)
"
```

### 3. Fix CosyVoice Configuration

```bash
# Create backup
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice2.yaml.backup

# Copy to expected filename
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice.yaml

# Remove problematic parameter
grep -v "mix_ratio" models/cosyvoice2/cosyvoice.yaml > models/cosyvoice2/cosyvoice_fixed.yaml
mv models/cosyvoice2/cosyvoice_fixed.yaml models/cosyvoice2/cosyvoice.yaml
```

## Running the Services

### 1. Start Controller

```bash
nohup python -m llama_omni2.serve.controller \
    --host 0.0.0.0 \
    --port 10000 > controller.log 2>&1 &
```

### 2. Start Model Worker

```bash
nohup python -m llama_omni2.serve.model_worker \
    --host 0.0.0.0 \
    --controller http://localhost:10000 \
    --port 40000 \
    --worker http://localhost:40000 \
    --model-path models/LLaMA-Omni2-3B \
    --model-name LLaMA-Omni2-3B > worker.log 2>&1 &
```
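The controller and model worker above run in the background via `nohup`, so a failed start is easy to miss. Before launching the web server, it can help to confirm that both ports are accepting connections. A minimal Python check (the ports come from the commands above; `localhost` is assumed):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports from the controller and model worker commands above
for name, port in [("controller", 10000), ("model worker", 40000)]:
    status = "up" if port_open("localhost", port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```

If either port reports "not reachable", check `controller.log` or `worker.log` before continuing.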
### 3. Start Gradio Web Server

With the CosyVoice2 vocoder:

```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000 \
    --vocoder-dir models/cosyvoice2
```

Without a vocoder (fallback):

```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000
```

## Monitoring Services

```bash
# Check controller logs
tail -f controller.log

# Check model worker logs
tail -f worker.log

# Access the web UI: open a browser at http://localhost:8000
```

## Troubleshooting

### Common Issues

1. **CUDA not found**: Ensure the CUDA paths are exported correctly
2. **Flash attention build fails**: Use `MAX_JOBS=4` to limit parallel compilation
3. **CosyVoice `mix_ratio` error**: Follow the configuration fix steps above
4. **Port already in use**: Kill the existing processes or use different ports

### Killing Services

```bash
# Find and kill the service processes
ps aux | grep python | grep -E "(controller|model_worker|gradio_web_server)" | awk '{print $2}' | xargs -r kill
```

## Project Structure

```
voiceagents/
├── llama_omni2/                  # Main application code
├── cosyvoice/                    # CosyVoice integration
├── models/                       # Downloaded models
│   ├── LLaMA-Omni2-3B/
│   └── cosyvoice2/
├── examples/                     # Sample audio files
├── script.sh                     # Setup script
├── pyproject.toml                # Project configuration
├── requirements-frozen-new.txt   # Frozen dependencies
├── environment-comprehensive.yml # Conda environment
└── SETUP_GUIDE.md                # This file
```

## Environment Variables

Set these in your `.bashrc` or `.zshrc`:

```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=~/.cache/huggingface
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0"
export MAX_JOBS=4
```

## Version Information

- Python: 3.10
- PyTorch: 2.3.1
- Transformers: 4.43.4
- Gradio: 5.3.0
- CUDA: 12.1+
- CosyVoice2: 0.5B model

## Additional Notes

- The setup has been tested on Ubuntu with NVIDIA GPUs
- Ensure sufficient GPU memory (8GB+ recommended)
- For production deployment, consider using systemd services
- Regular backups of models and configurations are recommended

## Support

For issues or questions:

- Check the logs in `controller.log` and `worker.log`
- Ensure all dependencies are correctly installed
- Verify CUDA is properly configured
- Review `COSYVOICE2_CHANGES.md` for model-specific details
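To make the "dependencies correctly installed" and "CUDA properly configured" checks concrete, a small sanity-check script can be run before starting the services. This is a sketch: the version pins and model directories are taken from the sections above, and the helper names (`check_package`, `check_dirs`) are illustrative, not part of the LLaMA-Omni2 codebase.

```python
import importlib
import pathlib

def check_package(pkg):
    """Return the installed version string, or None if the import fails."""
    try:
        return getattr(importlib.import_module(pkg), "__version__", "unknown")
    except ImportError:
        return None

def check_dirs(paths):
    """Return the subset of paths that do not exist yet."""
    return [p for p in paths if not pathlib.Path(p).is_dir()]

if __name__ == "__main__":
    # Version pins from the "Version Information" section of this guide
    for pkg, pinned in [("torch", "2.3.1"), ("transformers", "4.43.4"), ("gradio", "5.3.0")]:
        found = check_package(pkg)
        print(f"{pkg}: {found or 'NOT INSTALLED'} (guide pins {pinned})")

    # CUDA availability is only meaningful once torch is installed
    if check_package("torch"):
        import torch
        print("CUDA available:", torch.cuda.is_available())

    # Model directories from the "Model Downloads" section
    missing = check_dirs(["models/LLaMA-Omni2-3B", "models/cosyvoice2"])
    print("missing model dirs:", missing or "none")
```

Run it from the repository root; any "NOT INSTALLED" or missing-directory line points at the setup step to revisit.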