# AI Music Detector

Detects music generated by Suno (up to v5) and Udio (up to v1.5) using spectral fakeprint analysis.
## Model Description

This model analyzes the frequency spectrum of audio to detect characteristic artifacts left by neural vocoders in AI music generators. These "fakeprints" are regularly spaced peaks in the spectrum introduced by transposed-convolution (deconvolution) layers.
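The mechanism can be seen in a toy experiment (a minimal sketch, not the paper's pipeline): zero-insertion upsampling, the core operation inside a transposed convolution, copies a single tone's spectrum to several regularly spaced image frequencies.

```python
import numpy as np

# Toy experiment: zero-insertion upsampling, the core of a transposed
# convolution, images a single tone at regularly spaced frequencies.
n = np.arange(1024)
x = np.sin(2 * np.pi * 128 / 1024 * n)  # pure tone at FFT bin 128

y = np.zeros(4 * len(x))
y[::4] = x                              # upsample by 4 via zero-insertion

spectrum = np.abs(np.fft.rfft(y))
peaks = np.sort(np.argsort(spectrum)[-4:])  # bins 128, 896, 1152, 1920
```

The single tone now appears at four evenly structured spectral positions; a real vocoder processes broadband audio, so these images blend into the periodic peak pattern the detector looks for.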
## Architecture

- **Type:** Logistic regression on spectral features
- **Input:** Fakeprint vector (3585 features)
- **Output:** Probability that the audio is AI-generated (0.0 = real, 1.0 = AI)
## Preprocessing

Audio must be preprocessed to extract fakeprints:
| Parameter | Value |
|---|---|
| Sample Rate | 16000 Hz |
| FFT Size | 8192 |
| Frequency Range | 1000-8000 Hz |
| Hull Area | 10 bins |
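One plausible reading of these parameters (a sketch: `extract_fakeprint` and the exact hull computation below are assumptions, not the authors' published code): average the log-magnitude spectrum over 8192-point frames, keep the 1000–8000 Hz band (bins 512–4096 at 16000/8192 ≈ 1.95 Hz resolution, i.e. 3585 values, matching the model's input size), and subtract a local hull over ±10 bins so only narrow peaks survive.

```python
import numpy as np

SR, N_FFT = 16000, 8192
F_LO, F_HI, HULL = 1000, 8000, 10

def extract_fakeprint(audio):
    """audio: 1-D float array, already 16 kHz mono. Sketch implementation."""
    # Average log-magnitude spectrum over non-overlapping windowed frames
    n_frames = len(audio) // N_FFT
    frames = audio[: n_frames * N_FFT].reshape(n_frames, N_FFT)
    mag = np.abs(np.fft.rfft(frames * np.hanning(N_FFT), axis=1))
    spectrum = np.log1p(mag).mean(axis=0)

    # Keep 1000-8000 Hz: bins 512..4096 inclusive -> 3585 values
    lo = int(F_LO * N_FFT / SR)
    hi = int(F_HI * N_FFT / SR)
    band = spectrum[lo : hi + 1]

    # Subtract a local hull (moving average over +/-10 bins) so that only
    # narrow, regularly spaced peaks remain
    kernel = np.ones(2 * HULL + 1) / (2 * HULL + 1)
    hull = np.convolve(band, kernel, mode="same")
    return np.maximum(band - hull, 0.0)

fp = extract_fakeprint(np.random.default_rng(0).standard_normal(SR * 10))
print(fp.shape)  # (3585,)
```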
## Performance

Evaluated on a held-out test set of 17,866 samples (5,741 real, 12,125 AI-generated).
| Metric | Value |
|---|---|
| Accuracy | 99.88% |
| Precision | 0.9985 |
| Recall | 0.9998 |
| F1 Score | 0.9991 |
| False Positive Rate | 0.31% |
| False Negative Rate | 0.02% |
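These figures are mutually consistent: from the class counts one can reconstruct an approximate confusion matrix (roughly 2 false negatives and 18 false positives; these per-cell counts are inferred from the table, not published) and recover the reported metrics to within rounding.

```python
# Sanity check: approximate confusion-matrix counts inferred from the
# class sizes and error rates above (not published by the authors).
tp, fn = 12123, 2   # of 12,125 AI samples, recall ~ 0.9998
tn, fp = 5723, 18   # of  5,741 real samples, FPR ~ 0.31%

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.4f} p={precision:.4f} r={recall:.4f} f1={f1:.4f}")
```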
## Training Data

- **Real music:** FMA Medium (25k), proprietary (5k)
- **AI-generated:** SONICS dataset (49k), proprietary (10k)
## Usage

### Python (ONNX)

```python
import numpy as np
import onnxruntime as ort

# Load the model
session = ort.InferenceSession("ai_music_detector.onnx")

# fakeprint = extract_fakeprint(audio_file)  # your preprocessing; shape (3585,)
inputs = {"fakeprint": fakeprint.reshape(1, -1).astype(np.float32)}
output = session.run(None, inputs)

ai_probability = output[0][0, 0]
print(f"AI Probability: {ai_probability:.1%}")
```
### Python (Safetensors)

```python
from safetensors.numpy import load_file
import numpy as np

weights = load_file("model.safetensors")
w = weights["weights"]  # shape (1, 3585)
b = weights["bias"]     # shape (1,)

# fakeprint = extract_fakeprint(audio_file)  # your preprocessing; shape (3585,)
logit = np.dot(fakeprint, w.T) + b          # shape (1,)
probability = 1 / (1 + np.exp(-logit[0]))   # sigmoid -> AI probability
```
## Limitations

- **Sample rate dependent:** Audio must be resampled to 16,000 Hz before feature extraction
- **Minimum duration:** Works best with 10+ seconds of audio
- **Evolving generators:** Needs retraining on new generations of AI music generators
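For the resampling requirement, a minimal linear-interpolation sketch (a dedicated resampler such as `soxr` or `librosa.resample` is preferable in practice, since plain interpolation does not anti-alias when downsampling):

```python
import numpy as np

def resample_to_16k(audio, sr):
    """Linear-interpolation resample of a 1-D signal to 16 kHz (sketch)."""
    if sr == 16000:
        return audio
    n_out = int(round(len(audio) / sr * 16000))
    t_out = np.arange(n_out) / 16000        # output sample times (s)
    t_in = np.arange(len(audio)) / sr       # input sample times (s)
    return np.interp(t_out, t_in, audio)

x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # 1 s at 44.1 kHz
y = resample_to_16k(x, 44100)
print(len(y))  # 16000
```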
## Acknowledgements
This implementation is based on the fakeprint detection method proposed by Afchar et al. [1], which demonstrates that neural vocoders in generative music models produce characteristic frequency-domain artifacts due to their deconvolution architecture.
## References
[1] D. Afchar, G. Meseguer-Brocal, K. Akesbi, and R. Hennequin, "A Fourier Explanation of AI-music Artifacts," in Proc. International Society for Music Information Retrieval Conference (ISMIR), 2025. Available: https://arxiv.org/abs/2506.19108
## License
MIT License