# πŸŽ™οΈ Voice Detection Model Trainer
This sub-project is dedicated to fine-tuning a custom AI Voice Detection model tailored to your specific audio samples and languages (Tamil, English, Hindi, Malayalam, Telugu).
## πŸ—οΈ Architecture
- **Base Model**: `facebook/wav2vec2-large-xlsr-53` (Multilingual)
- **Task**: Audio Classification (Binary: HUMAN vs AI_GENERATED)
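The architecture above could be wired up roughly as follows. This is a minimal sketch, not the contents of `train.py`: the helper names `build_label_maps` and `load_model` are hypothetical, the label order is an assumption, and it assumes the `transformers` class `AutoModelForAudioClassification` is used to attach a two-class head to the base checkpoint.

```python
BASE_MODEL = "facebook/wav2vec2-large-xlsr-53"
LABELS = ["HUMAN", "AI_GENERATED"]  # assumed order: 0 = HUMAN, 1 = AI_GENERATED


def build_label_maps(labels):
    """Return (label2id, id2label) dictionaries for a list of class names."""
    label2id = {name: idx for idx, name in enumerate(labels)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def load_model(base_model=BASE_MODEL, labels=LABELS):
    """Load the base checkpoint with a newly initialised classification head."""
    # Imported lazily so the label helpers work without transformers installed.
    from transformers import AutoModelForAudioClassification

    label2id, id2label = build_label_maps(labels)
    return AutoModelForAudioClassification.from_pretrained(
        base_model,
        num_labels=len(labels),
        label2id=label2id,
        id2label=id2label,
    )
```

Calling `load_model()` downloads the base checkpoint on first use. The classification head starts untrained, so predictions are not meaningful until fine-tuning has run.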
## πŸ“ Directory Structure
- `data/`: Put your training audio files here.
  - `real/`: Human voice samples.
  - `fake/`: AI-generated voice samples.
- `output/`: Fine-tuned model checkpoints will be saved here.
- `train.py`: Main fine-tuning script.
- `prepare_data.py`: Script to convert audio folders into Hugging Face datasets.
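To illustrate how this layout maps to labels, here is a minimal sketch that walks `data/real` and `data/fake` and pairs each audio file with a class name. It is not the actual `prepare_data.py`: the function name `collect_samples`, the folder-to-label mapping, and the list of audio extensions are all assumptions.

```python
from pathlib import Path

# Assumed folder-to-label mapping; the label names come from the task description.
FOLDER_LABELS = {"real": "HUMAN", "fake": "AI_GENERATED"}
# Assumed set of audio extensions; adjust to match your files.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def collect_samples(data_dir="data"):
    """Return (file_path, label) pairs found under data_dir, grouped by folder."""
    samples = []
    for folder, label in FOLDER_LABELS.items():
        class_dir = Path(data_dir) / folder
        if not class_dir.is_dir():
            continue  # skip a missing class folder instead of failing
        for path in sorted(class_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                samples.append((str(path), label))
    return samples
```

The resulting list of pairs is a convenient intermediate form that can then be turned into a Hugging Face dataset.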
## 🚀 Getting Started
1. **Collect Data**: More data generally means better accuracy. Aim for roughly 100 to 500 samples per category per language, and more where possible.
2. **Setup Environment**:
```bash
pip install transformers datasets torch torchaudio accelerate
```
3. **Run Training**:
```bash
python train.py
```
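Before running training, it can help to check that each category meets the suggested sample count. The sketch below is one way to do that. The function name `find_underfilled` and the default threshold are assumptions, and the check is not part of `train.py`.

```python
from collections import Counter

MIN_SAMPLES = 100  # lower end of the suggested 100-500 range


def find_underfilled(labels, expected=("HUMAN", "AI_GENERATED"), minimum=MIN_SAMPLES):
    """Return {category: count} for each expected category below `minimum`."""
    counts = Counter(labels)
    # Counter returns 0 for categories that never appear, so empty ones are reported too.
    return {name: counts[name] for name in expected if counts[name] < minimum}
```

For example, `find_underfilled(["HUMAN"] * 120 + ["AI_GENERATED"] * 30)` reports only the `AI_GENERATED` category. To check per language as well, the same function works with `(language, label)` tuples passed as both `labels` and `expected`.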
## 🔧 Why a Custom Model?
Public models (`mo-thecreator`, etc.) are trained on general datasets. A custom model fine-tuned on **your specific AI voices** (e.g., from the TTS engines you actually use) is likely to be more accurate for your use case, because it learns the artifacts of those particular engines.