	# 🎙️ Voice Detection Model Trainer

	This sub-project fine-tunes a custom AI voice detection model on your own audio samples and languages (Tamil, English, Hindi, Malayalam, Telugu).

	## 🏗️ Architecture
	- Base Model: `facebook/wav2vec2-large-xlsr-53` (Multilingual)
	- Task: Audio Classification (Binary: HUMAN vs AI_GENERATED)
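
A minimal sketch of how the base model and the binary task might fit together, assuming the Hugging Face `transformers` auto classes are used. The label order (`HUMAN` = 0, `AI_GENERATED` = 1) and the helper name `load_model_and_extractor` are illustrative assumptions; the actual `train.py` may do this differently.

```python
# Assumed label order; the README names the two classes but not their ids.
LABELS = ["HUMAN", "AI_GENERATED"]
LABEL2ID = {name: idx for idx, name in enumerate(LABELS)}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}

BASE_MODEL = "facebook/wav2vec2-large-xlsr-53"


def load_model_and_extractor(base_model: str = BASE_MODEL):
    """Load the pretrained encoder with a new 2-class classification head.

    Hypothetical helper. The import is local so the label constants above
    can be used without transformers installed. The first call downloads
    the checkpoint from the Hugging Face Hub.
    """
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    feature_extractor = AutoFeatureExtractor.from_pretrained(base_model)
    model = AutoModelForAudioClassification.from_pretrained(
        base_model,
        num_labels=len(LABELS),
        label2id=LABEL2ID,
        id2label=ID2LABEL,
    )
    return model, feature_extractor
```

The classification head is newly initialized on top of the pretrained encoder, so the model only becomes useful for HUMAN vs AI_GENERATED prediction after fine-tuning.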

	## 📁 Directory Structure
	- `data/`: Put your training audio files here.
	  - `real/`: Human voice samples.
	  - `fake/`: AI-generated voice samples.
	- `output/`: Fine-tuned model checkpoints will be saved here.
	- `train.py`: Main fine-tuning script.
	- `prepare_data.py`: Script to convert audio folders into Hugging Face datasets.
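
As a rough illustration of the folder-to-dataset step, the sketch below walks the `real/` and `fake/` folders and builds labeled records, then optionally wraps them in a Hugging Face dataset. It assumes both folders sit under `data/`; the function names, the list of audio extensions, and the integer label ids are hypothetical and not taken from `prepare_data.py`.

```python
from pathlib import Path

# Assumed folder-to-id mapping (0 = HUMAN, 1 = AI_GENERATED).
FOLDER_TO_LABEL = {"real": 0, "fake": 1}
# Assumed set of audio formats; adjust to match your files.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def collect_examples(data_dir):
    """Return {"audio": path, "label": id} records, grouped by folder.

    Paths are sorted within each folder; non-audio files are skipped.
    """
    records = []
    for folder, label in FOLDER_TO_LABEL.items():
        class_dir = Path(data_dir) / folder
        if not class_dir.is_dir():
            continue
        for path in sorted(class_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                records.append({"audio": str(path), "label": label})
    return records


def to_hf_dataset(records, sampling_rate=16_000):
    """Wrap records in a datasets.Dataset that decodes audio on access.

    16 kHz matches the sampling rate wav2vec2 models were pretrained on.
    """
    from datasets import Audio, Dataset

    dataset = Dataset.from_list(records)
    return dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))
```

Calling `to_hf_dataset(collect_examples("data"))` would then give a dataset with an `audio` column and an integer `label` column.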

	## 🚀 Getting Started
	1. Collect Data: More data generally improves accuracy. Aim for at least 100 samples per category per language, and ideally 500 or more.
	2. Setup Environment:
	```bash
	pip install transformers datasets torch torchaudio accelerate
	```
	3. Run Training:
	```bash
	python train.py
	```
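
Before running the training step, it can help to check whether each category has enough files. The sketch below is a hypothetical pre-flight check, not part of the project scripts. It assumes `real/` and `fake/` live under `data/`, and it counts files per category only, since the layout described here does not encode language.

```python
from pathlib import Path

MIN_SAMPLES = 100  # lower bound suggested in step 1


def count_samples(data_dir, categories=("real", "fake")):
    """Count files in each category folder under data_dir (0 if missing)."""
    counts = {}
    for category in categories:
        class_dir = Path(data_dir) / category
        if class_dir.is_dir():
            counts[category] = sum(1 for p in class_dir.rglob("*") if p.is_file())
        else:
            counts[category] = 0
    return counts


def underfilled(counts, minimum=MIN_SAMPLES):
    """Return the categories with fewer than `minimum` files, sorted by name."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

For example, `underfilled(count_samples("data"))` returns the names of any category folders that still fall short of the suggested minimum.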

	## 🔧 Why a Custom Model?
	Public models (`mo-thecreator`, etc.) are trained on general-purpose datasets. A custom model fine-tuned on the specific AI voices you need to detect (e.g., output from the TTS engines you use) is likely to be more accurate for your use case.