# 🎙️ Voice Detection Model Trainer

This sub-project fine-tunes a custom AI voice detection model on your own audio samples and languages (Tamil, English, Hindi, Malayalam, Telugu).

## 🏗️ Architecture

- **Base Model**: `facebook/wav2vec2-large-xlsr-53` (Multilingual)
- **Task**: Audio Classification (Binary: HUMAN vs AI_GENERATED)

## 📁 Directory Structure

- `data/`: Put your training audio files here.
  - `real/`: Human voice samples.
  - `fake/`: AI-generated voice samples.
- `output/`: Fine-tuned model checkpoints are saved here.
- `train.py`: Main fine-tuning script.
- `prepare_data.py`: Script to convert the audio folders into Hugging Face datasets.

## 🚀 Getting Started

1. **Collect Data**: More data generally means better accuracy. Aim for at least 100-500 samples per category per language.
2. **Setup Environment**:
   ```bash
   pip install transformers datasets torch torchaudio accelerate
   ```
3. **Run Training**:
   ```bash
   python train.py
   ```

## 🔧 Why a Custom Model?

The public models (`mo-thecreator`, etc.) are trained on general datasets. A custom model fine-tuned on **your specific AI voices** (e.g., from the specific TTS engines you use) should be considerably more accurate for your use case.
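
## 🧪 Example: Labeling the Data Folders

The `data/real/` and `data/fake/` layout described above maps naturally onto the two classes. The following is a minimal sketch of how the folders could be scanned and labeled before building a dataset; it is not the actual contents of `prepare_data.py`. The helper names (`collect_samples`, `count_per_class`), the label ids (0 for HUMAN, 1 for AI_GENERATED), and the list of audio extensions are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed mapping from folder name to label id; the README only names the classes.
LABELS = {"real": 0, "fake": 1}
LABEL_NAMES = {0: "HUMAN", 1: "AI_GENERATED"}

# Assumed set of audio file extensions; adjust to match your samples.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def collect_samples(data_dir="data"):
    """Return (path, label_id) pairs for audio files under data_dir/real and data_dir/fake."""
    root = Path(data_dir)
    samples = []
    for folder, label_id in LABELS.items():
        class_dir = root / folder
        if not class_dir.is_dir():
            # Skip a missing class folder instead of raising.
            continue
        for path in sorted(class_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                samples.append((str(path), label_id))
    return samples


def count_per_class(samples):
    """Count samples per class name, useful for spotting class imbalance."""
    counts = {name: 0 for name in LABEL_NAMES.values()}
    for _, label_id in samples:
        counts[LABEL_NAMES[label_id]] += 1
    return counts


if __name__ == "__main__":
    samples = collect_samples("data")
    print(count_per_class(samples))
```

Running the script from the project root prints how many samples were found per class, which gives a quick check against the 100-500 samples per category target before starting a training run.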