Spaces:
Sleeping
Sleeping
| # ποΈ Voice Detection Model Trainer | |
| This sub-project is dedicated to fine-tuning a custom AI Voice Detection model tailored to your specific audio samples and languages (Tamil, English, Hindi, Malayalam, Telugu). | |
| ## ποΈ Architecture | |
| - **Base Model**: `facebook/wav2vec2-large-xlsr-53` (Multilingual) | |
| - **Task**: Audio Classification (Binary: HUMAN vs AI_GENERATED) | |
| ## π Directory Structure | |
| - `data/`: Put your training audio files here. | |
| - `real/`: Human voice samples. | |
| - `fake/`: AI generated voice samples. | |
| - `output/`: Fine-tuned model checkpoints will be saved here. | |
| - `train.py`: Main fine-tuning script. | |
| - `prepare_data.py`: Script to convert audio folders into Hugging Face datasets. | |
| ## π Getting Started | |
| 1. **Collect Data**: The more data you have, the better the accuracy. Aim for at least 100-500 samples per category per language. | |
| 2. **Setup Environment**: | |
| ```bash | |
| pip install transformers datasets torch torchaudio accelerate | |
| ``` | |
| 3. **Run Training**: | |
| ```bash | |
| python train.py | |
| ``` | |
| ## π§ Why a Custom Model? | |
| The public models (`mo-thecreator`, etc.) are trained on general datasets. A custom model fine-tuned on **your specific AI voices** (e.g., from specific TTS engines you use) will have much higher accuracy for your use case. | |