AI Music Detector

Detects music generated by Suno (versions up to 5) and Udio (versions up to 1.5) using spectral fakeprint analysis.

Model Description

This model analyzes the frequency spectrum of audio to detect characteristic artifacts left by the neural vocoders in AI music generators. These "fakeprints" are regularly spaced peaks in the spectrum caused by transposed-convolution (deconvolution) upsampling layers.
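The mechanism behind these peaks can be illustrated in a few lines: a transposed convolution with stride L first zero-stuffs its input, which replicates (images) the baseband spectrum L times, and a short learned kernel rarely suppresses those images completely. This is a self-contained sketch of the effect, not code from this repository:

import numpy as np

# Zero-insertion upsampling, as happens inside a stride-2 transposed
# convolution before the kernel is applied.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)

L = 2
y = np.zeros(L * len(x))
y[::L] = x

X = np.fft.fft(x)
Y = np.fft.fft(y)

# The upsampled spectrum is the original spectrum repeated L times:
# Y[k] = X[k mod len(x)], so the two halves are identical.
print(np.allclose(Y[: len(x)], Y[len(x):]))  # True

Any residue of these spectral images that the learned kernel fails to remove shows up as regularly spaced peaks, which is what the detector's features capture.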

Architecture

  • Type: Logistic Regression on spectral features
  • Input: Fakeprint vector (3585 features)
  • Output: Probability of AI-generated content (0.0 = Real, 1.0 = AI)

Preprocessing

Audio must be preprocessed to extract fakeprints:

Parameter        Value
Sample Rate      16000 Hz
FFT Size         8192
Frequency Range  1000-8000 Hz
Hull Area        10 bins
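These parameters imply the 3585-feature input size: at 16 kHz with an 8192-point FFT the bin spacing is 16000/8192 ≈ 1.95 Hz, so the 1000-8000 Hz band covers bins 512 through 4096, i.e. 3585 bins. Below is a minimal numpy-only sketch of such a pipeline; the hull computation (a moving average over neighboring bins) and the frame averaging are assumptions for illustration, and the repository's actual `extract_fakeprint` may differ in detail:

import numpy as np

SR = 16000
N_FFT = 8192
F_LO, F_HI = 1000, 8000
HULL_BINS = 10

def extract_fakeprint(audio: np.ndarray) -> np.ndarray:
    """Hypothetical sketch; assumes `audio` is mono at 16 kHz."""
    # Average the magnitude spectrum over half-overlapping frames.
    hop = N_FFT // 2
    n_frames = max(1, (len(audio) - N_FFT) // hop + 1)
    window = np.hanning(N_FFT)
    spec = np.zeros(N_FFT // 2 + 1)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + N_FFT]
        if len(frame) < N_FFT:
            frame = np.pad(frame, (0, N_FFT - len(frame)))
        spec += np.abs(np.fft.rfft(window * frame))
    spec = np.log1p(spec / n_frames)

    # Subtract a local "hull" (moving average over +/-HULL_BINS bins)
    # so only narrow, regularly spaced peaks survive -- assumed detail.
    kernel = np.ones(2 * HULL_BINS + 1) / (2 * HULL_BINS + 1)
    residual = spec - np.convolve(spec, kernel, mode="same")

    # Keep the 1000-8000 Hz band: bins 512..4096 inclusive -> 3585 values.
    lo = round(F_LO * N_FFT / SR)  # 512
    hi = round(F_HI * N_FFT / SR)  # 4096
    return residual[lo : hi + 1]

fp = extract_fakeprint(np.random.default_rng(0).standard_normal(SR * 10))
print(fp.shape)  # (3585,)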

Performance

Evaluated on a held-out test set of 17,866 samples (5,741 real, 12,125 AI-generated).

Metric               Value
Accuracy             99.88%
Precision            0.9985
Recall               0.9998
F1 Score             0.9991
False Positive Rate  0.31%
False Negative Rate  0.02%
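The reported rates are mutually consistent with the test-set composition: a 0.02% FNR on 12,125 AI samples is about 2 misses and a 0.31% FPR on 5,741 real samples is about 18 false alarms, which reproduces the accuracy and precision above. A quick check (counts rounded to whole samples, so values match to roughly three decimals):

# Cross-check the reported metrics against the test-set composition.
n_real, n_ai = 5741, 12125

fn = round(0.0002 * n_ai)    # false negatives implied by the 0.02% FNR
fp = round(0.0031 * n_real)  # false positives implied by the 0.31% FPR
tp = n_ai - fn
tn = n_real - fp

accuracy = (tp + tn) / (n_real + n_ai)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"{accuracy:.4f} {precision:.4f} {recall:.4f}")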

Training Data

  • Real Music: FMA Medium (25k), Proprietary (5k)
  • AI-Generated: SONICS dataset (49k), Proprietary (10k)

Usage

Python (ONNX)

import numpy as np
import onnxruntime as ort

# Load the ONNX model
session = ort.InferenceSession("ai_music_detector.onnx")

# fakeprint = extract_fakeprint(audio_file)  # Your preprocessing (3585 features)
x = fakeprint.reshape(1, -1).astype(np.float32)  # ONNX Runtime expects float32
output = session.run(None, {"fakeprint": x})
ai_probability = output[0][0, 0]

print(f"AI Probability: {ai_probability:.1%}")

Python (Safetensors)

from safetensors.numpy import load_file
import numpy as np

weights = load_file("model.safetensors")
w = weights["weights"]  # Shape: (1, 3585)
b = weights["bias"]     # Shape: (1,)

# fakeprint = extract_fakeprint(audio_file)  # Your preprocessing (3585 features)
logit = np.dot(fakeprint, w.T) + b      # Linear layer
probability = 1 / (1 + np.exp(-logit))  # Sigmoid -> AI probability

print(f"AI Probability: {probability.item():.1%}")

Limitations

  • Sample Rate Dependent: Audio must be resampled to 16000 Hz
  • Minimum Duration: Works best with 10+ seconds of audio
  • Evolving Generators: Needs retraining on new generations of AI music generators
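For the sample-rate requirement, a proper anti-aliased resampler such as `librosa.load(path, sr=16000)` or `scipy.signal.resample_poly` is recommended. For illustration only, here is a crude numpy-only linear-interpolation resampler (a sketch: it applies no anti-alias filtering, so it is not suitable for production use):

import numpy as np

def resample_linear(audio: np.ndarray, sr_in: int, sr_out: int = 16000) -> np.ndarray:
    """Crude linear-interpolation resampler (no anti-alias filtering)."""
    n_out = int(round(len(audio) * sr_out / sr_in))
    t_in = np.arange(len(audio)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, audio)

x = np.random.default_rng(0).standard_normal(44100)  # 1 s at 44.1 kHz
y = resample_linear(x, 44100)
print(len(y))  # 16000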

Acknowledgements

This implementation is based on the fakeprint detection method proposed by Afchar et al. [1], which demonstrates that neural vocoders in generative music models produce characteristic frequency-domain artifacts due to their deconvolution architecture.

References

[1] D. Afchar, G. Meseguer-Brocal, K. Akesbi, and R. Hennequin, "A Fourier Explanation of AI-music Artifacts," in Proc. International Society for Music Information Retrieval Conference (ISMIR), 2025. Available: https://arxiv.org/abs/2506.19108

License

MIT License
