# LLaMA-Omni2 Voice Assistant Setup Guide

This guide provides comprehensive instructions for reproducing the exact environment and setup for the LLaMA-Omni2 voice assistant with CosyVoice2 integration.

## Prerequisites

- Ubuntu/Linux system with CUDA-capable GPU
- CUDA 12.1 or higher installed
- Miniconda or Anaconda installed
- At least 16GB RAM and 20GB free disk space
- Python 3.10

## Environment Setup Options

### Option 1: Using Conda Environment File (Recommended)

```bash
# Create environment from comprehensive yml file
conda env create -f environment-comprehensive.yml

# Activate the environment
conda activate gsva-python310
```

### Option 2: Using Frozen Requirements

```bash
# Create a new conda environment
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310

# Install from frozen requirements
pip install -r requirements-frozen-new.txt
```

### Option 3: Manual Setup Using Script

```bash
# Run the complete setup script
bash script.sh
```

## Detailed Manual Setup

### 1. Create and Activate Conda Environment

```bash
source /home/azureuser/miniconda3/etc/profile.d/conda.sh
conda create -n gsva-python310 python=3.10 -y
conda activate gsva-python310
```

### 2. Install Basic Dependencies

```bash
pip install Cython numpy==1.26.4
pip install packaging wheel setuptools==69.5.1
```

### 3. Install the Package

```bash
# Install in development mode
pip install -e .
```

### 4. Install Core Dependencies

```bash
# Essential packages
pip install huggingface_hub==0.25.1
pip install uvicorn openai-whisper fastapi
pip install hf_transfer ninja

# Gradio for web interface
pip install gradio==5.3.0 gradio_client==1.4.2
```

### 5. Setup CUDA Environment

```bash
# Point /usr/local/cuda at the installed toolkit (12.6 here; adjust to match your installed version)
sudo rm -rf /usr/local/cuda
sudo ln -s /usr/local/cuda-12.6 /usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

### 6. Install PyTorch with CUDA Support

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

### 7. Install Flash Attention

```bash
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```

### 8. Install Transformers and Audio Libraries

```bash
# Specific version for LLaMA-Omni2 compatibility
pip install transformers==4.43.4

# Audio processing libraries
pip install matcha-tts --no-build-isolation
pip install git+https://github.com/FunAudioLLM/CosyVoice.git

# Additional dependencies
pip install conformer onnxruntime hyperpyyaml==1.2.2 ruamel.yaml
```

## Model Downloads

### 1. Download LLaMA-Omni2 Model

```bash
mkdir -p models
huggingface-cli download ICTNLP/LLaMA-Omni2-3B --local-dir models/LLaMA-Omni2-3B
```

### 2. Download CosyVoice2 Model

```bash
mkdir -p models/cosyvoice2
python -c "
from huggingface_hub import snapshot_download
import os
os.makedirs('models/cosyvoice2', exist_ok=True)
snapshot_download(
    repo_id='FunAudioLLM/CosyVoice2-0.5B',
    local_dir='models/cosyvoice2',
    local_dir_use_symlinks=False
)
"
```

### 3. Fix CosyVoice Configuration

```bash
# Create backup
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice2.yaml.backup

# Copy to expected filename
cp models/cosyvoice2/cosyvoice2.yaml models/cosyvoice2/cosyvoice.yaml

# Remove problematic parameter
grep -v "mix_ratio" models/cosyvoice2/cosyvoice.yaml > models/cosyvoice2/cosyvoice_fixed.yaml
mv models/cosyvoice2/cosyvoice_fixed.yaml models/cosyvoice2/cosyvoice.yaml
```
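The same fix can be scripted in Python, which avoids shell quoting pitfalls and is easier to rerun after re-downloading the model. This is a minimal sketch assuming the file layout shown above; the helper name is illustrative, not part of the project:

```python
from pathlib import Path
import shutil

def fix_cosyvoice_config(model_dir: str = "models/cosyvoice2") -> Path:
    """Back up cosyvoice2.yaml, drop mix_ratio lines, save as cosyvoice.yaml."""
    d = Path(model_dir)
    src = d / "cosyvoice2.yaml"
    # Keep a backup of the original configuration before modifying anything.
    shutil.copy(src, d / "cosyvoice2.yaml.backup")
    # Remove every line mentioning the unsupported mix_ratio parameter and
    # write the result under the filename the loader expects.
    kept = [line for line in src.read_text().splitlines(keepends=True)
            if "mix_ratio" not in line]
    dst = d / "cosyvoice.yaml"
    dst.write_text("".join(kept))
    return dst
```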

## Running the Services

### 1. Start Controller

```bash
nohup python -m llama_omni2.serve.controller \
    --host 0.0.0.0 \
    --port 10000 > controller.log 2>&1 &
```

### 2. Start Model Worker

```bash
nohup python -m llama_omni2.serve.model_worker \
    --host 0.0.0.0 \
    --controller http://localhost:10000 \
    --port 40000 \
    --worker http://localhost:40000 \
    --model-path models/LLaMA-Omni2-3B \
    --model-name LLaMA-Omni2-3B > worker.log 2>&1 &
```

### 3. Start Gradio Web Server

With CosyVoice2 vocoder:
```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000 \
    --vocoder-dir models/cosyvoice2
```

Without vocoder (fallback):
```bash
python -m llama_omni2.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --port 8000
```

## Monitoring Services

```bash
# Check controller logs
tail -f controller.log

# Check model worker logs
tail -f worker.log

# Access web UI
# Open browser at http://localhost:8000
```
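Since the services start in the background, it can take a moment before they accept requests. The sketch below polls a URL until it answers, using only the standard library; the function name and the default ports from this guide are assumptions, not part of the project:

```python
import time
import urllib.error
import urllib.request

def wait_for_http(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until it answers any HTTP status, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=interval)
            return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not up yet; retry until the deadline
    return False
```

For example, `wait_for_http("http://localhost:8000", timeout=120)` blocks until the Gradio UI is reachable or two minutes pass.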

## Troubleshooting

### Common Issues

1. **CUDA not found**: Ensure CUDA paths are exported correctly
2. **Flash attention build fails**: Use `MAX_JOBS=4` to limit parallel compilation
3. **CosyVoice mix_ratio error**: Follow the configuration fix steps above
4. **Port already in use**: Kill existing processes or use different ports
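A quick way to diagnose the last two issues is to check the ports this guide uses and whether the CUDA toolkit is on `PATH`. This is a standard-library sketch; the port numbers assume the defaults from the commands above:

```python
import shutil
import socket

def port_in_use(port: int, host: str = "localhost") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0

# Ports used by the controller, model worker, and Gradio web server.
for port in (10000, 40000, 8000):
    print(f"port {port}: {'in use' if port_in_use(port) else 'free'}")

# nvcc being on PATH is a quick proxy for a correctly exported CUDA toolkit.
print("nvcc found:", shutil.which("nvcc") is not None)
```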

### Killing Services

```bash
# Find and kill Python processes
ps aux | grep python | grep -E "(controller|model_worker|gradio_web_server)" | awk '{print $2}' | xargs -r kill
```

## Project Structure

```
voiceagents/
β”œβ”€β”€ llama_omni2/          # Main application code
β”œβ”€β”€ cosyvoice/            # CosyVoice integration
β”œβ”€β”€ models/               # Downloaded models
β”‚   β”œβ”€β”€ LLaMA-Omni2-3B/
β”‚   └── cosyvoice2/
β”œβ”€β”€ examples/             # Sample audio files
β”œβ”€β”€ script.sh             # Setup script
β”œβ”€β”€ pyproject.toml        # Project configuration
β”œβ”€β”€ requirements-frozen-new.txt  # Frozen dependencies
β”œβ”€β”€ environment-comprehensive.yml # Conda environment
└── SETUP_GUIDE.md        # This file
```

## Environment Variables

Set these in your `.bashrc` or `.zshrc`:

```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export HF_HUB_ENABLE_HF_TRANSFER=1
export HF_HOME=~/.cache/huggingface
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6;8.9;9.0"
export MAX_JOBS=4
```
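To confirm the variables actually reached your shell session (a common pitfall after editing `.bashrc`), a small check like the following can help; the helper name is illustrative:

```python
import os

def missing_vars(names, env=os.environ):
    """Return the subset of names that are not set in the environment."""
    return [n for n in names if n not in env]

# Names taken from the export lines above.
expected = ["HF_HUB_ENABLE_HF_TRANSFER", "HF_HOME",
            "TORCH_CUDA_ARCH_LIST", "MAX_JOBS"]
print("missing variables:", missing_vars(expected) or "none")

# PATH should include the CUDA toolkit's bin directory.
print("cuda on PATH:", "/usr/local/cuda/bin" in os.environ.get("PATH", ""))
```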

## Version Information

- Python: 3.10
- PyTorch: 2.3.1
- Transformers: 4.43.4
- Gradio: 5.3.0
- CUDA: 12.1+
- CosyVoice2: 0.5B model

## Additional Notes

- The setup has been tested on Ubuntu with NVIDIA GPUs
- Ensure sufficient GPU memory (8GB+ recommended)
- For production deployment, consider using systemd services
- Regular backups of models and configurations are recommended

## Support

For issues or questions:
- Check the logs in controller.log, worker.log
- Ensure all dependencies are correctly installed
- Verify CUDA is properly configured
- Review the COSYVOICE2_CHANGES.md for model-specific details