# LLaMA-Omni2 Voice Assistant Setup Guide

This guide provides instructions for reproducing the environment and setup for the LLaMA-Omni2 voice assistant with CosyVoice2 integration.

## Prerequisites

- Ubuntu/Linux system with a CUDA-capable GPU
- CUDA 12.1 or later installed
- Miniconda or Anaconda installed
- At least 16 GB RAM and 20 GB of free disk space
- Python 3.10
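
Before creating the environment, it can help to confirm the basics programmatically. A minimal sketch (the `check_prereqs` helper and its thresholds are ours, not part of the project):

```python
import shutil
import sys


def check_prereqs(min_free_gb=20.0):
    """Return a list of prerequisite problems; an empty list means all checks passed."""
    problems = []
    # The project targets Python 3.10 specifically
    if sys.version_info[:2] != (3, 10):
        problems.append(
            f"Python 3.10 required, found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # The guide assumes at least 20 GB of free disk space
    free_gb = shutil.disk_usage(".").free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free, need {min_free_gb:.0f} GB")
    # A CUDA-capable GPU implies the NVIDIA driver tools are on PATH
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found (NVIDIA driver missing?)")
    return problems


if __name__ == "__main__":
    for problem in check_prereqs():
        print("WARNING:", problem)
```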

## Environment Setup Options

### Option 1: Using the Conda Environment File (Recommended)

```bash
# Create the environment from the comprehensive yml file
conda env create -f environment-comprehensive.yml

# Activate the environment
conda activate gsva-python310
```

### Option 2: Using Frozen Requirements

```bash
# Create a new conda environment
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310

# Install from frozen requirements
pip install -r requirements-frozen-new.txt
```

### Option 3: Manual Setup Using the Script

```bash
# Run the complete setup script
bash script.sh
```

## Detailed Manual Setup

### 1. Create and Activate the Conda Environment

```bash
# Adjust the conda path to match your installation
source /home/azureuser/miniconda3/etc/profile.d/conda.sh
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310
```

### 2. Install Basic Dependencies

```bash
pip install Cython numpy==1.26.4
pip install packaging wheel setuptools==69.5.1
```

### 3. Install the Package

```bash
# Install in development (editable) mode from the repository root
pip install -e .
```

### 4. Install Core Dependencies

```bash
# Essential packages
pip install huggingface_hub==0.25.1
pip install uvicorn openai-whisper fastapi
pip install hf_transfer ninja

# Gradio for the web interface
pip install gradio==5.3.0 gradio_client==1.4.2
```

### 5. Set Up the CUDA Environment

```bash
# Point /usr/local/cuda at the toolkit installed on your machine
# (12.6 here; any 12.x toolkit works with the cu121 PyTorch wheels below)
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-12.6 /usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

### 6. Install PyTorch with CUDA Support

```bash
# Pin torch to the tested version (see Version Information below)
pip install torch==2.3.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

### 7. Install Flash Attention

```bash
# MAX_JOBS limits parallel compilation so the build does not exhaust memory
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```

### 8. Install Transformers and Audio Libraries

```bash
# Specific version for LLaMA-Omni2 compatibility
pip install transformers==4.43.4

# Audio processing libraries
pip install matcha-tts --no-build-isolation
pip install git+https://github.com/FunAudioLLM/CosyVoice.git

# Additional dependencies
pip install conformer onnxruntime hyperpyyaml==1.2.2 ruamel.yaml
```

## Model Downloads

### 1. Download the LLaMA-Omni2 Model

```bash
mkdir -p models
huggingface-cli download ICTNLP/LLaMA-Omni2-3B --local-dir models/LLaMA-Omni2-3B
```

### 2. Download the CosyVoice2 Model

```bash
mkdir -p models/cosyvoice2
python -c "
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='FunAudioLLM/CosyVoice2-0.5B',
    local_dir='models/cosyvoice2',
    local_dir_use_symlinks=False
)
"
```
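
Interrupted downloads can leave a partial snapshot behind. A quick sanity-check sketch (the `missing_files` helper and the file names below are illustrative, not the models' actual manifests):

```python
from pathlib import Path


def missing_files(model_dir, expected):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


# Example: verify a few files commonly found in Hugging Face snapshots
for name in missing_files("models/LLaMA-Omni2-3B", ["config.json", "tokenizer_config.json"]):
    print("missing:", name)
```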

### 3. Fix the CosyVoice Configuration

```bash
# Create a backup
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice2.yaml.backup

# Copy to the expected filename
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice.yaml

# Remove the problematic parameter
grep -v "mix_ratio" models/cosyvoice2/cosyvoice.yaml > models/cosyvoice2/cosyvoice_fixed.yaml
mv models/cosyvoice2/cosyvoice_fixed.yaml models/cosyvoice2/cosyvoice.yaml
```
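
Note that `grep -v` drops only the lines that literally contain `mix_ratio`, which works here but would corrupt the file if the value ever spanned multiple lines. A stdlib-only alternative that also removes an indented block is sketched below (the `drop_yaml_key` helper is ours, and it removes any key with that name, not just top-level ones):

```python
def drop_yaml_key(yaml_text, key):
    """Remove a key and any lines indented beneath it from YAML text."""
    out = []
    skipping = False
    key_indent = 0
    for line in yaml_text.splitlines(keepends=True):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if skipping:
            if stripped and indent <= key_indent:
                skipping = False  # back at the key's level: stop skipping
            else:
                continue  # still inside the removed block
        if stripped.startswith(key + ":"):
            skipping = True
            key_indent = indent
            continue
        out.append(line)
    return "".join(out)


# Usage: filter the downloaded config in place
# path = "models/cosyvoice2/cosyvoice.yaml"
# text = open(path).read()
# open(path, "w").write(drop_yaml_key(text, "mix_ratio"))
```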

## Running the Services

### 1. Start the Controller

```bash
nohup python -m llama_omni2.serve.controller \
    --host 0.0.0.0 \
    --port 10000 > controller.log 2>&1 &
```

### 2. Start the Model Worker

```bash
nohup python -m llama_omni2.serve.model_worker \
    --host 0.0.0.0 \
    --controller http://localhost:10000 \
    --port 40000 \
    --worker http://localhost:40000 \
    --model-path models/LLaMA-Omni2-3B \
    --model-name LLaMA-Omni2-3B > worker.log 2>&1 &
```
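
The worker registers with the controller at startup, so the controller should be reachable before the worker launches. A small helper to block until a TCP port accepts connections (the `wait_for_port` name is ours, not part of the project):

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0):
    """Poll until host:port accepts a TCP connection, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry shortly
    return False


# Usage: wait for the controller before starting the worker
# if wait_for_port("localhost", 10000):
#     ...launch the model worker...
```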

### 3. Start the Gradio Web Server

With the CosyVoice2 vocoder:
```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000 \
    --vocoder-dir models/cosyvoice2
```

Without the vocoder (fallback):
```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000
```

## Monitoring Services

```bash
# Check controller logs
tail -f controller.log

# Check model worker logs
tail -f worker.log

# Access the web UI by opening a browser at http://localhost:8000
```

## Troubleshooting

### Common Issues

1. **CUDA not found**: Ensure the CUDA paths are exported correctly
2. **Flash attention build fails**: Use `MAX_JOBS=4` to limit parallel compilation
3. **CosyVoice mix_ratio error**: Follow the configuration fix steps above
4. **Port already in use**: Kill the existing processes or use different ports

### Killing Services

```bash
# Find and kill the service processes
ps aux | grep python | grep -E "(controller|model_worker|gradio_web_server)" | awk '{print $2}' | xargs -r kill
```

## Project Structure

```
voiceagents/
├── llama_omni2/                    # Main application code
├── cosyvoice/                      # CosyVoice integration
├── models/                         # Downloaded models
│   ├── LLaMA-Omni2-3B/
│   └── cosyvoice2/
├── examples/                       # Sample audio files
├── script.sh                       # Setup script
├── pyproject.toml                  # Project configuration
├── requirements-frozen-new.txt     # Frozen dependencies
├── environment-comprehensive.yml   # Conda environment
└── SETUP_GUIDE.md                  # This file
```

## Environment Variables

Set these in your `.bashrc` or `.zshrc`:

```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=~/.cache/huggingface
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0"
export MAX_JOBS=4
```

## Version Information

- Python: 3.10
- PyTorch: 2.3.1
- Transformers: 4.43.4
- Gradio: 5.3.0
- CUDA: 12.1+
- CosyVoice2: 0.5B model
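
To confirm the environment matches these versions, installed package versions can be queried from Python. A sketch using only the standard library (the `get_version` helper is ours):

```python
from importlib import metadata


def get_version(package):
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


for pkg in ("torch", "transformers", "gradio", "huggingface_hub"):
    print(f"{pkg}: {get_version(pkg)}")
```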

## Additional Notes

- The setup has been tested on Ubuntu with NVIDIA GPUs
- Ensure sufficient GPU memory (8 GB or more recommended)
- For production deployment, consider using systemd services
- Regular backups of models and configurations are recommended

## Support

For issues or questions:
- Check the logs in `controller.log` and `worker.log`
- Ensure all dependencies are correctly installed
- Verify that CUDA is properly configured
- Review COSYVOICE2_CHANGES.md for model-specific details