πŸŽ™οΈ Voice Detection Model Trainer

This sub-project fine-tunes a custom AI voice detection model on your own audio samples and target languages (Tamil, English, Hindi, Malayalam, Telugu).

πŸ—οΈ Architecture

  • Base Model: facebook/wav2vec2-large-xlsr-53 (Multilingual)
  • Task: Audio Classification (Binary: HUMAN vs AI_GENERATED)
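
A minimal sketch of how this architecture might be instantiated with the Hugging Face `transformers` library. The label order (`HUMAN` = 0, `AI_GENERATED` = 1) and the helper name `load_base_model` are assumptions for illustration; the actual `train.py` may set things up differently.

```python
BASE_MODEL = "facebook/wav2vec2-large-xlsr-53"

# Assumed label order; the real train.py may map the classes differently.
LABELS = ["HUMAN", "AI_GENERATED"]
label2id = {name: idx for idx, name in enumerate(LABELS)}
id2label = {idx: name for name, idx in label2id.items()}


def load_base_model(model_name: str = BASE_MODEL):
    """Hypothetical helper: load the checkpoint with a new 2-class head."""
    # Imported lazily so the label maps above work without transformers installed.
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
    model = AutoModelForAudioClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        label2id=label2id,
        id2label=id2label,
    )
    return feature_extractor, model
```

Calling `load_base_model()` downloads the base checkpoint on first use. The classification head is newly initialized, so predictions are not meaningful until the model has been fine-tuned.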

πŸ“ Directory Structure

  • data/: Put your training audio files here.
    • real/: Human voice samples.
    • fake/: AI generated voice samples.
  • output/: Fine-tuned model checkpoints will be saved here.
  • train.py: Main fine-tuning script.
  • prepare_data.py: Script to convert audio folders into Hugging Face datasets.
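
As a rough illustration of what a script like `prepare_data.py` has to do with this layout, the sketch below walks `data/real` and `data/fake` and pairs each audio file with a label. The function name `build_manifest`, the accepted file extensions, and the folder-to-label mapping are assumptions, not the script's actual contents.

```python
from pathlib import Path

# Assumed mapping from folder name to class label.
FOLDER_TO_LABEL = {"real": "HUMAN", "fake": "AI_GENERATED"}
# Assumed set of audio extensions; adjust to match your files.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def build_manifest(data_dir):
    """Return {"path", "label"} records for every audio file under data_dir."""
    records = []
    for folder, label in FOLDER_TO_LABEL.items():
        class_dir = Path(data_dir) / folder
        if not class_dir.is_dir():
            continue  # a missing category folder is skipped, not treated as an error
        # Sort for a reproducible order within each category.
        for path in sorted(class_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                records.append({"path": str(path), "label": label})
    return records
```

A list like this could then be turned into a Hugging Face dataset, for example with `datasets.Dataset.from_list`, with the audio resampled to 16 kHz, which is the rate wav2vec2 models expect.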

🚀 Getting Started

  1. Collect Data: More data generally improves accuracy. Aim for at least 100 samples per category per language, and ideally 500 or more.
  2. Setup Environment:
    pip install transformers datasets torch torchaudio accelerate
    
  3. Run Training:
    python train.py
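
Before starting a training run, it can help to check the data-collection target from step 1. The sketch below counts audio files per category and reports which categories fall short. The helper names, the extension list, and the use of 100 as the threshold are assumptions; because the directory layout does not separate languages, it cannot check the per-language target.

```python
from collections import Counter
from pathlib import Path

MIN_SAMPLES = 100  # lower bound of the range suggested in step 1


def count_samples(data_dir, extensions=(".wav", ".mp3", ".flac")):
    """Count audio files in the real/ and fake/ folders under data_dir."""
    counts = Counter({"real": 0, "fake": 0})
    for category in list(counts):
        class_dir = Path(data_dir) / category
        if not class_dir.is_dir():
            continue  # a missing folder simply counts as zero samples
        for path in class_dir.rglob("*"):
            if path.is_file() and path.suffix.lower() in extensions:
                counts[category] += 1
    return counts


def underfilled(counts, minimum=MIN_SAMPLES):
    """Return the categories that have fewer than `minimum` samples."""
    return sorted(name for name, n in counts.items() if n < minimum)


if __name__ == "__main__":
    counts = count_samples("data")
    print(dict(counts), "below target:", underfilled(counts))
```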
    

🔧 Why a Custom Model?

Public models (mo-thecreator, etc.) are trained on general datasets. A custom model fine-tuned on the specific AI voices you encounter (e.g., output from the TTS engines you use) is likely to be noticeably more accurate for your use case.