# LLaMA-Omni2 Voice Assistant Setup Guide

This guide provides instructions for reproducing the environment and setup for the LLaMA-Omni2 voice assistant with CosyVoice2 integration.

## Prerequisites

- Ubuntu/Linux system with a CUDA-capable GPU
- CUDA 12.1 or later installed
- Miniconda or Anaconda installed
- At least 16 GB RAM and 20 GB of free disk space
- Python 3.10
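
Before creating the environment, it can help to confirm the basics programmatically. A minimal sketch (the `check_prereqs` helper and its thresholds are ours, not part of the project):

```python
import shutil
import sys


def check_prereqs(min_free_gb=20.0):
    """Return a list of prerequisite problems; an empty list means all checks passed."""
    problems = []
    # The project targets Python 3.10 specifically
    if sys.version_info[:2] != (3, 10):
        problems.append(
            f"Python 3.10 required, found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # The guide assumes at least 20 GB of free disk space
    free_gb = shutil.disk_usage(".").free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free, need {min_free_gb:.0f} GB")
    # A CUDA-capable GPU implies the NVIDIA driver tools are on PATH
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found (NVIDIA driver missing?)")
    return problems


if __name__ == "__main__":
    for problem in check_prereqs():
        print("WARNING:", problem)
```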

## Environment Setup Options

### Option 1: Using the Conda Environment File (Recommended)

```bash
# Create the environment from the comprehensive yml file
conda env create -f environment-comprehensive.yml

# Activate the environment
conda activate gsva-python310
```

### Option 2: Using Frozen Requirements

```bash
# Create a new conda environment
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310

# Install from frozen requirements
pip install -r requirements-frozen-new.txt
```

### Option 3: Manual Setup Using the Script

```bash
# Run the complete setup script
bash script.sh
```

## Detailed Manual Setup

### 1. Create and Activate the Conda Environment

```bash
# Adjust the conda path to match your installation
source /home/azureuser/miniconda3/etc/profile.d/conda.sh
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310
```

### 2. Install Basic Dependencies

```bash
pip install Cython numpy==1.26.4
pip install packaging wheel setuptools==69.5.1
```

### 3. Install the Package

```bash
# Install in development (editable) mode from the repository root
pip install -e .
```

### 4. Install Core Dependencies

```bash
# Essential packages
pip install huggingface_hub==0.25.1
pip install uvicorn openai-whisper fastapi
pip install hf_transfer ninja

# Gradio for the web interface
pip install gradio==5.3.0 gradio_client==1.4.2
```

### 5. Set Up the CUDA Environment

```bash
# Point /usr/local/cuda at the toolkit installed on your machine
# (12.6 here; any 12.x toolkit works with the cu121 PyTorch wheels below)
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-12.6 /usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

### 6. Install PyTorch with CUDA Support

```bash
# Pin torch to the tested version (see Version Information below)
pip install torch==2.3.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

### 7. Install Flash Attention

```bash
# MAX_JOBS limits parallel compilation so the build does not exhaust memory
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```

### 8. Install Transformers and Audio Libraries

```bash
# Specific version for LLaMA-Omni2 compatibility
pip install transformers==4.43.4

# Audio processing libraries
pip install matcha-tts --no-build-isolation
pip install git+https://github.com/FunAudioLLM/CosyVoice.git

# Additional dependencies
pip install conformer onnxruntime hyperpyyaml==1.2.2 ruamel.yaml
```

## Model Downloads

### 1. Download the LLaMA-Omni2 Model

```bash
mkdir -p models
huggingface-cli download ICTNLP/LLaMA-Omni2-3B --local-dir models/LLaMA-Omni2-3B
```

### 2. Download the CosyVoice2 Model

```bash
mkdir -p models/cosyvoice2
python -c "
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='FunAudioLLM/CosyVoice2-0.5B',
    local_dir='models/cosyvoice2',
    local_dir_use_symlinks=False
)
"
```
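
Interrupted downloads can leave a partial snapshot behind. A quick sanity-check sketch (the `missing_files` helper and the file names below are illustrative, not the models' actual manifests):

```python
from pathlib import Path


def missing_files(model_dir, expected):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


# Example: verify a few files commonly found in Hugging Face snapshots
for name in missing_files("models/LLaMA-Omni2-3B", ["config.json", "tokenizer_config.json"]):
    print("missing:", name)
```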

### 3. Fix the CosyVoice Configuration

```bash
# Create a backup
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice2.yaml.backup

# Copy to the expected filename
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice.yaml

# Remove the problematic parameter
grep -v "mix_ratio" models/cosyvoice2/cosyvoice.yaml > models/cosyvoice2/cosyvoice_fixed.yaml
mv models/cosyvoice2/cosyvoice_fixed.yaml models/cosyvoice2/cosyvoice.yaml
```
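
Note that `grep -v` drops only the lines that literally contain `mix_ratio`, which works here but would corrupt the file if the value ever spanned multiple lines. A stdlib-only alternative that also removes an indented block is sketched below (the `drop_yaml_key` helper is ours, and it removes any key with that name, not just top-level ones):

```python
def drop_yaml_key(yaml_text, key):
    """Remove a key and any lines indented beneath it from YAML text."""
    out = []
    skipping = False
    key_indent = 0
    for line in yaml_text.splitlines(keepends=True):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if skipping:
            if stripped and indent <= key_indent:
                skipping = False  # back at the key's level: stop skipping
            else:
                continue  # still inside the removed block
        if stripped.startswith(key + ":"):
            skipping = True
            key_indent = indent
            continue
        out.append(line)
    return "".join(out)


# Usage: filter the downloaded config in place
# path = "models/cosyvoice2/cosyvoice.yaml"
# text = open(path).read()
# open(path, "w").write(drop_yaml_key(text, "mix_ratio"))
```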

## Running the Services

### 1. Start the Controller

```bash
nohup python -m llama_omni2.serve.controller \
    --host 0.0.0.0 \
    --port 10000 > controller.log 2>&1 &
```

### 2. Start the Model Worker

```bash
nohup python -m llama_omni2.serve.model_worker \
    --host 0.0.0.0 \
    --controller http://localhost:10000 \
    --port 40000 \
    --worker http://localhost:40000 \
    --model-path models/LLaMA-Omni2-3B \
    --model-name LLaMA-Omni2-3B > worker.log 2>&1 &
```
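
The worker registers with the controller at startup, so the controller should be reachable before the worker launches. A small helper to block until a TCP port accepts connections (the `wait_for_port` name is ours, not part of the project):

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0):
    """Poll until host:port accepts a TCP connection, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry shortly
    return False


# Usage: wait for the controller before starting the worker
# if wait_for_port("localhost", 10000):
#     ...launch the model worker...
```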

### 3. Start the Gradio Web Server

With the CosyVoice2 vocoder:
```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000 \
    --vocoder-dir models/cosyvoice2
```

Without the vocoder (fallback):
```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000
```

## Monitoring Services

```bash
# Check controller logs
tail -f controller.log

# Check model worker logs
tail -f worker.log

# Access the web UI by opening a browser at http://localhost:8000
```

## Troubleshooting

### Common Issues

1. **CUDA not found**: Ensure the CUDA paths are exported correctly
2. **Flash attention build fails**: Use `MAX_JOBS=4` to limit parallel compilation
3. **CosyVoice mix_ratio error**: Follow the configuration fix steps above
4. **Port already in use**: Kill the existing processes or use different ports

### Killing Services

```bash
# Find and kill the service processes
ps aux | grep python | grep -E "(controller|model_worker|gradio_web_server)" | awk '{print $2}' | xargs -r kill
```

## Project Structure

```
voiceagents/
├── llama_omni2/                    # Main application code
├── cosyvoice/                      # CosyVoice integration
├── models/                         # Downloaded models
│   ├── LLaMA-Omni2-3B/
│   └── cosyvoice2/
├── examples/                       # Sample audio files
├── script.sh                       # Setup script
├── pyproject.toml                  # Project configuration
├── requirements-frozen-new.txt     # Frozen dependencies
├── environment-comprehensive.yml   # Conda environment
└── SETUP_GUIDE.md                  # This file
```

## Environment Variables

Set these in your `.bashrc` or `.zshrc`:

```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=~/.cache/huggingface
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0"
export MAX_JOBS=4
```

## Version Information

- Python: 3.10
- PyTorch: 2.3.1
- Transformers: 4.43.4
- Gradio: 5.3.0
- CUDA: 12.1+
- CosyVoice2: 0.5B model
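
To confirm the environment matches these versions, installed package versions can be queried from Python. A sketch using only the standard library (the `get_version` helper is ours):

```python
from importlib import metadata


def get_version(package):
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


for pkg in ("torch", "transformers", "gradio", "huggingface_hub"):
    print(f"{pkg}: {get_version(pkg)}")
```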

## Additional Notes

- The setup has been tested on Ubuntu with NVIDIA GPUs
- Ensure sufficient GPU memory (8 GB or more recommended)
- For production deployment, consider using systemd services
- Regular backups of models and configurations are recommended

## Support

For issues or questions:
- Check the logs in `controller.log` and `worker.log`
- Ensure all dependencies are correctly installed
- Verify that CUDA is properly configured
- Review COSYVOICE2_CHANGES.md for model-specific details