---
language: en
license: mit
tags:
  - audio
  - deepfake-detection
  - lora
  - speech
base_model: Speech-Arena-2025/DF_Arena_1B_V_1
library_name: peft
---

# VoxGuard LoRA - Deepfake Speech Detection

LoRA adapter for detecting AI-generated (deepfake) speech, fine-tuned on top of [Speech-Arena-2025/DF_Arena_1B_V_1](https://huggingface.co/Speech-Arena-2025/DF_Arena_1B_V_1) (1.15B parameters).

## Model Details

| | |
|---|---|
| **Base model** | Speech-Arena-2025/DF_Arena_1B_V_1 |
| **Method** | LoRA (Low-Rank Adaptation) |
| **LoRA config** | r=8, alpha=16, dropout=0.1, target_modules="all-linear" |
| **Trainable params** | ~10M / 1.15B (0.86%) |
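
To make the `r=8, alpha=16` settings concrete, the sketch below works out how many parameters a LoRA pair adds to a single linear layer and the scaling applied to its update. The 1024x1024 layer is a hypothetical example, not a dimension taken from the base model, and the helper names are illustrative.

```python
# In peft, the settings in the table correspond to
# LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1, target_modules="all-linear").

def lora_extra_params(d_in: int, d_out: int, r: int = 8) -> int:
    """Parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(alpha: float = 16, r: int = 8) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


# Hypothetical 1024 -> 1024 projection (assumed size, for illustration only).
full_params = 1024 * 1024
extra_params = lora_extra_params(1024, 1024)
print(f"adapter params: {extra_params}")
print(f"share of the frozen layer: {extra_params / full_params:.1%}")
print(f"update scaling: {lora_scaling()}")
```

Per-layer counts like this, summed over the targeted linear layers, are what keep the trainable share below 1% of the 1.15B total.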

## Training Data

- **Real speech:** LibriSpeech samples (280+ unique speakers across the `clean` and `other` subsets)
- **Fake speech:** 10,000+ samples generated with [Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS) voice cloning via the Replicate API
- **Augmentation:** Phone-call audio degradation (codec, noise, band-pass, clipping, reverb, packet loss)
- **Dataset:** [gereon/voxguard-synthetic-speech](https://huggingface.co/datasets/gereon/voxguard-synthetic-speech)
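
Three of the degradations listed above (noise, clipping, packet loss) can be sketched in a few lines. This is an illustrative, dependency-free approximation and not the pipeline used to build the dataset. The function names, the 20 dB SNR, the 0.5 clip limit, the 20 ms frame length, and the 5% loss rate are all assumptions.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    power = sum(s * s for s in samples) / len(samples)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def hard_clip(samples, limit):
    """Clamp every sample to [-limit, limit], mimicking an overdriven input."""
    return [max(-limit, min(limit, s)) for s in samples]


def drop_packets(samples, frame_len, loss_prob, rng):
    """Zero out whole frames with probability loss_prob to mimic packet loss."""
    out = list(samples)
    for start in range(0, len(out), frame_len):
        if rng.random() < loss_prob:
            end = min(start + frame_len, len(out))
            out[start:end] = [0.0] * (end - start)
    return out


def degrade(samples, rng, snr_db=20.0, clip_limit=0.5, frame_len=320, loss_prob=0.05):
    """Chain the three steps: noise, then clipping, then packet loss."""
    noisy = add_noise(samples, snr_db, rng)
    clipped = hard_clip(noisy, clip_limit)
    return drop_packets(clipped, frame_len, loss_prob, rng)


# One second of a 440 Hz tone at 16 kHz stands in for a real utterance.
sample_rate = 16000
tone = [0.8 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
degraded = degrade(tone, random.Random(0))
print(len(degraded), max(abs(s) for s in degraded))
```

A seeded `random.Random` keeps the degradation reproducible, which helps when comparing augmented and non-augmented runs on the same clips.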

## Results

### Augmented Model (root - recommended)

| Metric | Baseline | Val | Test |
|--------|----------|-----|------|
| Accuracy | 77.5% | 99.2% | 100% |
| F1 | 0.794 | 0.992 | 1.000 |

Trained for 20 epochs (8.7 hours) with phone-call audio augmentation on 6K samples.

### Non-Augmented Model (`non-augmented/`)

| Metric | Baseline | Val | Test |
|--------|----------|-----|------|
| Accuracy | 97.5% | 100% | 100% |
| F1 | 0.976 | 1.000 | 1.000 |

Training was early-stopped at epoch 14 of 20 (best checkpoint at epoch 2) on 2K samples.

> Note: The overfit is intentional. The goal is to learn the signatures of new deepfake models while preserving the base model's generalizability.
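
For reference, accuracy and F1 of the kind reported in the tables above can be computed from binary predictions as follows. The sketch assumes fake speech is the positive class (label 1), which this card does not state. The function name and the toy labels are illustrative.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with F1 on the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels: 1 = fake (assumed positive class), 0 = real.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```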

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

BASE_ID = "Speech-Arena-2025/DF_Arena_1B_V_1"

# Load the base classifier and its matching feature extractor.
base_model = AutoModelForAudioClassification.from_pretrained(BASE_ID)
feature_extractor = AutoFeatureExtractor.from_pretrained(BASE_ID)

# Attach the augmented (recommended) adapter from the repository root.
model = PeftModel.from_pretrained(base_model, "gereon/voxguard-lora")
model.eval()  # disable dropout for inference

# For the non-augmented variant:
# model = PeftModel.from_pretrained(base_model, "gereon/voxguard-lora", subfolder="non-augmented")
```
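
An audio classification model loaded this way typically returns one logit per class, which still has to be turned into a label and a confidence. Below is a minimal sketch of that step in plain Python, operating on logits that have already been converted to a list. The `id2label` mapping and its label names are hypothetical; in practice the mapping would be read from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, id2label):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical label order; check model.config.id2label before relying on it.
example_id2label = {0: "bonafide", 1: "spoof"}
label, confidence = decide([2.0, 0.0], example_id2label)
print(label, round(confidence, 3))
```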

## Related

- **Dataset:** [gereon/voxguard-synthetic-speech](https://huggingface.co/datasets/gereon/voxguard-synthetic-speech) - 10K+ synthetic speech samples
- **Code:** [gereonelvers/voxguard](https://github.com/gereonelvers/voxguard)