voxguard-lora / README.md

gereon

metadata

language: en
license: mit
tags:
  - audio
  - deepfake-detection
  - lora
  - speech
base_model: Speech-Arena-2025/DF_Arena_1B_V_1
library_name: peft

VoxGuard LoRA - Deepfake Speech Detection

LoRA adapter for detecting AI-generated (deepfake) speech, fine-tuned on top of Speech-Arena-2025/DF_Arena_1B_V_1 (1.15B parameters).

Model Details


| Property | Value |
|---|---|
| Base model | Speech-Arena-2025/DF_Arena_1B_V_1 |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA config | r=8, alpha=16, dropout=0.1, target_modules="all-linear" |
| Trainable params | ~10M / 1.15B (0.86%) |
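
To make the LoRA settings above concrete, here is a minimal sketch of how rank and alpha translate into added parameters and update scaling for a single linear layer. The layer width of 1024 is a hypothetical value chosen for illustration; it is not taken from the base model. The `alpha / r` scaling is the standard LoRA convention and is assumed to apply here.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer of shape (d_out, d_in).

    LoRA learns two low-rank factors, A (r x d_in) and B (d_out x r),
    so the count is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


def lora_scaling(alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A (standard LoRA: alpha / r)."""
    return alpha / r


# Rank and alpha as listed in the model details.
R, ALPHA = 8, 16

# Hypothetical layer width, for illustration only (not read from the base model).
D_IN = D_OUT = 1024

added = lora_params(D_IN, D_OUT, R)
frozen = D_IN * D_OUT
print(f"scaling = {lora_scaling(ALPHA, R)}")
print(f"LoRA params per layer = {added} vs. {frozen} frozen weights")
```

Because `target_modules="all-linear"` adapts every linear layer, the total trainable count is the sum of this per-layer figure over all such layers in the base model.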

Training Data

- Real speech: LibriSpeech samples (280+ unique speakers across clean and other subsets)
- Fake speech: 10,000+ samples generated with Qwen3-TTS voice cloning via Replicate API
- Augmentation: Phone-call audio degradation (codec, noise, band-pass, clipping, reverb, packet loss)
- Dataset: gereon/voxguard-synthetic-speech
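
The phone-call degradation listed above could look roughly like the sketch below. This is not the augmentation code used for training. It is a simplified, dependency-free illustration covering three of the listed effects (noise, clipping, packet loss), and the function name `degrade` and all parameter values are hypothetical.

```python
import math
import random


def degrade(samples, sample_rate=16000, snr_db=15.0, clip_level=0.5,
            loss_prob=0.1, frame_ms=20, seed=0):
    """Apply noise, hard clipping, and packet loss to a mono waveform in [-1, 1].

    All defaults are illustrative assumptions, not the values used in training.
    """
    rng = random.Random(seed)

    # 1) Additive white noise at the requested signal-to-noise ratio.
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    out = [x + rng.gauss(0.0, noise_std) for x in samples]

    # 2) Hard clipping, as produced by an overdriven input stage.
    out = [max(-clip_level, min(clip_level, x)) for x in out]

    # 3) Packet loss: silence whole frames with probability loss_prob.
    frame_len = int(sample_rate * frame_ms / 1000)
    for start in range(0, len(out), frame_len):
        if rng.random() < loss_prob:
            end = min(start + frame_len, len(out))
            out[start:end] = [0.0] * (end - start)
    return out


# One second of a 440 Hz tone as stand-in input.
sr = 16000
tone = [0.8 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
degraded = degrade(tone, sample_rate=sr)
print(len(degraded), max(abs(x) for x in degraded))
```

Codec simulation, band-pass filtering, and reverb are omitted because they need a signal-processing library to do properly.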

Results

Augmented Model (root - recommended)

| Metric | Baseline | Val | Test |
|---|---|---|---|
| Accuracy | 77.5% | 99.2% | 100% |
| F1 | 0.794 | 0.992 | 1.000 |

Trained for 20 epochs (8.7 hrs) with phone-call audio augmentation on 6K samples.
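
For readers who want to reproduce the two reported metrics on their own predictions, here is a minimal sketch of how accuracy and F1 are computed for a binary task. The labels below are toy data, not results from this model, and treating `1` as the positive (fake) class is an assumption.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Toy labels for illustration only (1 = fake, 0 = real is an assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```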

Non-Augmented Model (`non-augmented/`)

| Metric | Baseline | Val | Test |
|---|---|---|---|
| Accuracy | 97.5% | 100% | 100% |
| F1 | 0.976 | 1.000 | 1.000 |

Early-stopped at epoch 14/20 (best at epoch 2) on 2K samples.

Note: The overfit is intentional. The goal is to learn the signatures of new deepfake models while preserving the base model's generalizability.
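
The early-stopping behaviour reported above (best at epoch 2, stopped at epoch 14) is consistent with patience-based stopping with a patience of 12 epochs. Both the patience value and the monitored quantity are inferred for illustration, not stated in this card. A minimal sketch with made-up validation losses:

```python
def early_stop(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran out of epochs before patience was exhausted.
    return best_epoch, len(val_losses)


# Made-up loss curve: improves until epoch 2, then plateaus for 18 epochs.
losses = [0.50, 0.10] + [0.20] * 18
print(early_stop(losses, patience=12))  # (2, 14)
```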

Usage

from peft import PeftModel
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load the frozen base model and its matching feature extractor.
base_model = AutoModelForAudioClassification.from_pretrained("Speech-Arena-2025/DF_Arena_1B_V_1")
feature_extractor = AutoFeatureExtractor.from_pretrained("Speech-Arena-2025/DF_Arena_1B_V_1")

# Attach the augmented (recommended) adapter from the repository root.
model = PeftModel.from_pretrained(base_model, "gereon/voxguard-lora")

# For the non-augmented variant:
# model = PeftModel.from_pretrained(base_model, "gereon/voxguard-lora", subfolder="non-augmented")
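
Once the model returns logits for a clip, they still have to be turned into a label and a confidence. The sketch below shows that post-processing step in plain Python. The label names and their index order are hypothetical. A real application should read the mapping from the model's config (`id2label`) rather than hard-coding it.

```python
import math

# Hypothetical label order; check the model config before relying on it.
LABELS = ("bonafide", "spoof")


def classify(logits, labels=LABELS):
    """Convert raw logits to (predicted_label, probabilities) via softmax."""
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Made-up logits for illustration only.
label, probs = classify([0.0, 2.0])
print(label, [round(p, 3) for p in probs])
```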

Related

Dataset: gereon/voxguard-synthetic-speech - 10K+ synthetic speech samples
Code: gereonelvers/voxguard
