---
description: Set up conda environment for speech-to-text fine-tuning
tags: [python, conda, stt, whisper, speech, ai, fine-tuning, project, gitignored]
---

You are helping the user set up a conda environment for speech-to-text (STT) fine-tuning.

## Process

1. **Create base environment**
   ```bash
   conda create -n stt-finetune python=3.11 -y
   conda activate stt-finetune
   ```

2. **Install PyTorch with ROCm**
   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
   ```

3. **Install Whisper and related libraries**
   ```bash
   pip install openai-whisper
   pip install faster-whisper  # Optimized inference
   pip install whisperx        # Advanced features
   ```

4. **Install Hugging Face libraries**
   ```bash
   pip install transformers
   pip install datasets
   pip install accelerate
   pip install evaluate
   pip install peft           # For LoRA fine-tuning
   ```

5. **Install audio processing libraries**
   ```bash
   pip install librosa         # Audio analysis
   pip install soundfile       # Audio I/O
   pip install pydub           # Audio manipulation
   pip install sox             # Audio processing
   conda install -c conda-forge ffmpeg -y  # Audio conversion
   ```

6. **Install speech-specific tools**
   ```bash
   pip install jiwer          # Word Error Rate calculation
   pip install speechbrain    # Speech toolkit
   pip install pyannote.audio # Speaker diarization
   ```

7. **Install data processing tools**
   ```bash
   pip install pandas
   pip install numpy
   pip install scipy
   pip install matplotlib
   pip install seaborn        # Visualization
   ```

8. **Install monitoring and experimentation**
   ```bash
   pip install wandb          # Experiment tracking
   pip install tensorboard
   ```

9. **Install Jupyter for interactive work**
   ```bash
   conda install -c conda-forge jupyter jupyterlab ipywidgets -y
   ```

10. **Test installation**
   ```python
   import torch
   import whisper
   import librosa
   from transformers import WhisperProcessor, WhisperForConditionalGeneration

   print(f"PyTorch: {torch.__version__}")
   print(f"GPU available: {torch.cuda.is_available()}")
   print("All libraries imported successfully!")
   ```

11. **Suggest common datasets**
   - Common Voice (Mozilla)
   - LibriSpeech
   - TEDLIUM
   - Custom datasets

12. **Create example script**
   - Offer to create `~/scripts/whisper-finetune-example.py` with basic setup

## Output

Provide a summary showing:
- Environment name and setup status
- Installed libraries grouped by purpose
- GPU detection status
- Available VRAM for training
- Suggested datasets for fine-tuning
- Example commands for testing
- Links to documentation/tutorials