	---
	language: en
	license: mit
	tags:
	- audio
	- deepfake-detection
	- lora
	- speech
	base_model: Speech-Arena-2025/DF_Arena_1B_V_1
	library_name: peft
	---

	# VoxGuard LoRA - Deepfake Speech Detection

	LoRA adapter for detecting AI-generated (deepfake) speech, fine-tuned on top of [Speech-Arena-2025/DF_Arena_1B_V_1](https://huggingface.co/Speech-Arena-2025/DF_Arena_1B_V_1) (1.15B parameters).

	## Model Details

	| | |
	|---|---|
	| Base model | Speech-Arena-2025/DF_Arena_1B_V_1 |
	| Method | LoRA (Low-Rank Adaptation) |
	| LoRA config | r=8, alpha=16, dropout=0.1, target_modules="all-linear" |
	| Trainable params | ~10M / 1.15B (0.86%) |
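
The trainable-parameter figure in the table follows from how LoRA works: each adapted linear layer gains two low-rank matrices, so the added parameter count is `r * (d_in + d_out)`. The sketch below only illustrates that arithmetic. The 1024x1024 layer shape is a hypothetical example, not a shape taken from the base model, and the helper names are made up for this illustration.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def trainable_percent(trainable: float, total: float) -> float:
    """Share of trainable parameters, in percent."""
    return 100.0 * trainable / total


# Hypothetical layer shape, for illustration only (not read from the base model).
per_layer = lora_params(d_in=1024, d_out=1024, r=8)

# Rounded figures from the table above: ~10M trainable out of 1.15B total.
pct = trainable_percent(10e6, 1.15e9)

print(per_layer)      # 16384
print(round(pct, 2))  # 0.87
```

With the rounded ~10M figure the ratio comes out near 0.87%; the 0.86% stated in the table corresponds to roughly 9.9M trainable parameters, which is consistent with "~10M".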

	## Training Data

	- Real speech: LibriSpeech samples (280+ unique speakers across clean and other subsets)
	- Fake speech: 10,000+ samples generated with [Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS) voice cloning via the Replicate API
	- Augmentation: Phone-call audio degradation (codec, noise, band-pass, clipping, reverb, packet loss)
	- Dataset: [gereon/voxguard-synthetic-speech](https://huggingface.co/datasets/gereon/voxguard-synthetic-speech)
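
To give a feel for what phone-call degradation can look like, here is a minimal standard-library sketch of three of the listed effects (noise, clipping, packet loss). It is not the augmentation pipeline used for this model. All function names, the noise level, the clip limit, the 160-sample packet length (20 ms at an assumed 8 kHz sample rate), and the loss probability are assumptions chosen for illustration.

```python
import math
import random


def add_noise(samples, sigma=0.01, rng=random):
    """Add zero-mean Gaussian noise to each sample."""
    return [s + rng.gauss(0.0, sigma) for s in samples]


def hard_clip(samples, limit=0.5):
    """Clamp samples to [-limit, limit], mimicking an overdriven channel."""
    return [max(-limit, min(limit, s)) for s in samples]


def drop_packets(samples, packet_len=160, loss_prob=0.05, rng=random):
    """Zero out whole packets at random to imitate packet loss."""
    out = list(samples)
    for start in range(0, len(out), packet_len):
        if rng.random() < loss_prob:
            end = min(start + packet_len, len(out))
            out[start:end] = [0.0] * (end - start)
    return out


def degrade(samples, seed=0):
    """Chain the three effects with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    noisy = add_noise(samples, rng=rng)
    clipped = hard_clip(noisy)
    return drop_packets(clipped, rng=rng)


# Usage: degrade 0.2 s of a 440 Hz tone at an assumed 8 kHz sample rate.
tone = [0.9 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(1600)]
degraded = degrade(tone, seed=1)
```

Codec simulation, band-pass filtering, and reverb are left out because they need a signal-processing library to do properly.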

	## Results

	### Augmented Model (root - recommended)

	| Metric | Baseline | Val | Test |
	|--------|----------|-----|------|
	| Accuracy | 77.5% | 99.2% | 100% |
	| F1 | 0.794 | 0.992 | 1.000 |

	Trained for 20 epochs (8.7 hrs) with phone-call audio augmentation on 6K samples.

	### Non-Augmented Model (`non-augmented/`)

	| Metric | Baseline | Val | Test |
	|--------|----------|-----|------|
	| Accuracy | 97.5% | 100% | 100% |
	| F1 | 0.976 | 1.000 | 1.000 |

	Trained on 2K samples; early-stopped at epoch 14 of 20, with the best epoch being epoch 2.

	> Note: These near-perfect scores reflect an intentional overfit. The goal is to learn the signatures of new deepfake models while maintaining the base model's generalizability.
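
For readers who want to reproduce the two metrics in the tables above from their own predictions, here is a small sketch of how accuracy and F1 are computed from confusion-matrix counts. Treating "fake" as the positive class is an assumption, and the counts in the usage lines are hypothetical, not taken from this model's evaluation.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for illustration (positive class assumed to be "fake").
acc = accuracy(tp=90, tn=80, fp=20, fn=10)
f1 = f1_score(tp=90, fp=20, fn=10)

print(round(acc, 3))  # 0.85
print(round(f1, 3))   # 0.857
```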

	## Usage

	```python
	from peft import PeftModel
	from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

	BASE_ID = "Speech-Arena-2025/DF_Arena_1B_V_1"

	# Load the base classifier, then attach the LoRA adapter weights on top of it.
	base_model = AutoModelForAudioClassification.from_pretrained(BASE_ID)
	model = PeftModel.from_pretrained(base_model, "gereon/voxguard-lora")
	model.eval()  # inference mode: disables dropout

	# The feature extractor comes from the base model repository.
	feature_extractor = AutoFeatureExtractor.from_pretrained(BASE_ID)

	# For the non-augmented variant:
	# model = PeftModel.from_pretrained(base_model, "gereon/voxguard-lora", subfolder="non-augmented")
	```
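
Once the model produces logits for a clip, they still need to be turned into a label and a confidence. The helper below is a minimal, dependency-free sketch of that step. The `id2label` mapping shown is hypothetical; in practice the mapping should be read from `model.config.id2label` rather than assumed, and the function names are made up for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, id2label):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical label mapping and logits, for illustration only.
example_id2label = {0: "bonafide", 1: "spoof"}
label, prob = decide([0.0, 2.0], example_id2label)
```

Any decision threshold other than "highest probability wins" is a deployment choice that this model card does not specify.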

	## Related

	- Dataset: [gereon/voxguard-synthetic-speech](https://huggingface.co/datasets/gereon/voxguard-synthetic-speech) - 10K+ synthetic speech samples
	- Code: [gereonelvers/voxguard](https://github.com/gereonelvers/voxguard)