---
description: Set up conda environment for speech-to-text fine-tuning
tags: [python, conda, stt, whisper, speech, ai, fine-tuning, project, gitignored]
---

You are helping the user set up a conda environment for speech-to-text (STT) fine-tuning.

## Process

1. **Create base environment**
   ```bash
   conda create -n stt-finetune python=3.11 -y
   conda activate stt-finetune
   ```

2. **Install PyTorch with ROCm**
   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
   ```

3. **Install Whisper and related libraries**
   ```bash
   pip install openai-whisper
   pip install faster-whisper  # Optimized inference
   pip install whisperx        # Word-level timestamps and diarization
   ```

4. **Install Hugging Face libraries**
   ```bash
   pip install transformers
   pip install datasets
   pip install accelerate
   pip install evaluate
   pip install peft           # For LoRA fine-tuning
   ```

5. **Install audio processing libraries**
   ```bash
   pip install librosa         # Audio analysis
   pip install soundfile       # Audio I/O
   pip install pydub           # Audio manipulation
   pip install sox             # Python wrapper for SoX (needs the sox binary installed)
   conda install -c conda-forge ffmpeg -y  # Audio conversion
   ```

6. **Install speech-specific tools**
   ```bash
   pip install jiwer          # Word Error Rate calculation
   pip install speechbrain    # Speech toolkit
   pip install pyannote.audio # Speaker diarization
   ```

7. **Install data processing tools**
   ```bash
   pip install pandas
   pip install numpy
   pip install scipy
   pip install matplotlib
   pip install seaborn        # Visualization
   ```

8. **Install monitoring and experimentation**
   ```bash
   pip install wandb          # Experiment tracking
   pip install tensorboard
   ```

9. **Install Jupyter for interactive work**
   ```bash
   conda install -c conda-forge jupyter jupyterlab ipywidgets -y
   ```

10. **Test installation**
    ```python
    import torch
    import whisper
    import librosa
    from transformers import WhisperProcessor, WhisperForConditionalGeneration

    print(f"PyTorch: {torch.__version__}")
    # ROCm builds of PyTorch also report GPUs through the torch.cuda API
    print(f"GPU available: {torch.cuda.is_available()}")
    print("All libraries imported successfully!")
    ```

11. **Suggest common datasets**
    - Common Voice (Mozilla)
    - LibriSpeech
    - TED-LIUM
    - Custom datasets

12. **Create example script**
    - Offer to create `~/scripts/whisper-finetune-example.py` with basic setup
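
As one possible starting point for the example script in step 12, the sketch below attaches LoRA adapters to a Whisper checkpoint using `transformers` and `peft` from step 4. The checkpoint name `openai/whisper-small`, the `LORA_SETTINGS` values, and the helper name `load_whisper_with_lora` are illustrative assumptions, not settings taken from this setup.

```python
# Sketch of ~/scripts/whisper-finetune-example.py (contents are assumptions).

# Assumed LoRA hyperparameters; tune these for your data and VRAM budget.
LORA_SETTINGS = {
    "r": 8,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "target_modules": ["q_proj", "v_proj"],  # attention projections in Whisper
}


def load_whisper_with_lora(model_name="openai/whisper-small"):
    """Load a Whisper checkpoint and wrap it with LoRA adapters."""
    # Imported lazily so the file can be read without the heavy dependencies.
    from peft import LoraConfig, get_peft_model
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)
    model = get_peft_model(model, LoraConfig(**LORA_SETTINGS))
    return processor, model
```

Calling `processor, model = load_whisper_with_lora()` downloads the checkpoint on first use; `model.print_trainable_parameters()` then shows how small the trainable share is with LoRA.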

## Output

Provide a summary showing:
- Environment name and setup status
- Installed libraries grouped by purpose
- GPU detection status
- Available VRAM for training
- Suggested datasets for fine-tuning
- Example commands for testing
- Links to documentation/tutorials
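
The library, GPU, and VRAM items in the summary can be gathered with a short script. This is a minimal sketch: the `LIBRARY_GROUPS` grouping and the helper names `installed_version`, `library_report`, and `gpu_summary` are hypothetical, and the package list is only a subset of what the steps above install.

```python
import importlib.util
from importlib import metadata

# Assumed grouping: a subset of the packages installed in the steps above.
LIBRARY_GROUPS = {
    "Whisper": ["openai-whisper", "faster-whisper", "whisperx"],
    "Hugging Face": ["transformers", "datasets", "accelerate", "evaluate", "peft"],
    "Audio": ["librosa", "soundfile", "pydub"],
    "Evaluation": ["jiwer"],
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def library_report(groups):
    """Map each group name to {package: version or None}."""
    return {
        group: {pkg: installed_version(pkg) for pkg in packages}
        for group, packages in groups.items()
    }


def gpu_summary():
    """Return the first GPU's name and total VRAM in GiB, or None if unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment
    import torch

    if not torch.cuda.is_available():  # also covers ROCm builds of PyTorch
        return None
    props = torch.cuda.get_device_properties(0)
    return {
        "name": props.name,
        "total_vram_gib": round(props.total_memory / 1024**3, 1),
    }


for group, packages in library_report(LIBRARY_GROUPS).items():
    print(group)
    for pkg, version in packages.items():
        print(f"  {pkg}: {version or 'not installed'}")

gpu = gpu_summary()
print(f"GPU: {gpu['name']} ({gpu['total_vram_gib']} GiB)" if gpu else "GPU: not detected")
```

Note that `total_memory` is the card's total VRAM, not the amount currently free, so the figure is an upper bound on what training can use.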