AI Music Detector

Detects music generated by Suno (versions up to 5) and Udio (versions up to 1.5) using spectral fakeprint analysis.

Model Description

This model analyzes the frequency spectrum of audio to detect characteristic artifacts left by the neural vocoders in AI music generators. These "fakeprints" are regularly spaced peaks in the spectrum caused by transposed-convolution (deconvolution) upsampling layers.
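The mechanism behind these peaks can be illustrated in a few lines: a transposed convolution with stride L first zero-stuffs its input, which replicates (images) the baseband spectrum L times, and a short learned kernel rarely suppresses those images completely. This is a self-contained sketch of the effect, not code from this repository:

import numpy as np

# Zero-insertion upsampling, as happens inside a stride-2 transposed
# convolution before the kernel is applied.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)

L = 2
y = np.zeros(L * len(x))
y[::L] = x

X = np.fft.fft(x)
Y = np.fft.fft(y)

# The upsampled spectrum is the original spectrum repeated L times:
# Y[k] = X[k mod len(x)], so the two halves are identical.
print(np.allclose(Y[: len(x)], Y[len(x):]))  # True

Any residue of these spectral images that the learned kernel fails to remove shows up as regularly spaced peaks, which is what the detector's features capture.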

Architecture

  • Type: Logistic Regression on spectral features
  • Input: Fakeprint vector (3585 features)
  • Output: Probability of AI-generated content (0.0 = Real, 1.0 = AI)

Preprocessing

Audio must be preprocessed to extract fakeprints:

Parameter        Value
Sample Rate      16000 Hz
FFT Size         8192
Frequency Range  1000-8000 Hz
Hull Area        10 bins
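These parameters imply the 3585-feature input size: at 16 kHz with an 8192-point FFT the bin spacing is 16000/8192 ≈ 1.95 Hz, so the 1000-8000 Hz band covers bins 512 through 4096, i.e. 3585 bins. Below is a minimal numpy-only sketch of such a pipeline; the hull computation (a moving average over neighboring bins) and the frame averaging are assumptions for illustration, and the repository's actual `extract_fakeprint` may differ in detail:

import numpy as np

SR = 16000
N_FFT = 8192
F_LO, F_HI = 1000, 8000
HULL_BINS = 10

def extract_fakeprint(audio: np.ndarray) -> np.ndarray:
    """Hypothetical sketch; assumes `audio` is mono at 16 kHz."""
    # Average the magnitude spectrum over half-overlapping frames.
    hop = N_FFT // 2
    n_frames = max(1, (len(audio) - N_FFT) // hop + 1)
    window = np.hanning(N_FFT)
    spec = np.zeros(N_FFT // 2 + 1)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + N_FFT]
        if len(frame) < N_FFT:
            frame = np.pad(frame, (0, N_FFT - len(frame)))
        spec += np.abs(np.fft.rfft(window * frame))
    spec = np.log1p(spec / n_frames)

    # Subtract a local "hull" (moving average over +/-HULL_BINS bins)
    # so only narrow, regularly spaced peaks survive -- assumed detail.
    kernel = np.ones(2 * HULL_BINS + 1) / (2 * HULL_BINS + 1)
    residual = spec - np.convolve(spec, kernel, mode="same")

    # Keep the 1000-8000 Hz band: bins 512..4096 inclusive -> 3585 values.
    lo = round(F_LO * N_FFT / SR)  # 512
    hi = round(F_HI * N_FFT / SR)  # 4096
    return residual[lo : hi + 1]

fp = extract_fakeprint(np.random.default_rng(0).standard_normal(SR * 10))
print(fp.shape)  # (3585,)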

Performance

Evaluated on a held-out test set of 17,866 samples (5,741 real, 12,125 AI-generated).

Metric               Value
Accuracy             99.88%
Precision            0.9985
Recall               0.9998
F1 Score             0.9991
False Positive Rate  0.31%
False Negative Rate  0.02%
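The reported rates are mutually consistent with the test-set composition: a 0.02% FNR on 12,125 AI samples is about 2 misses and a 0.31% FPR on 5,741 real samples is about 18 false alarms, which reproduces the accuracy and precision above. A quick check (counts rounded to whole samples, so values match to roughly three decimals):

# Cross-check the reported metrics against the test-set composition.
n_real, n_ai = 5741, 12125

fn = round(0.0002 * n_ai)    # false negatives implied by the 0.02% FNR
fp = round(0.0031 * n_real)  # false positives implied by the 0.31% FPR
tp = n_ai - fn
tn = n_real - fp

accuracy = (tp + tn) / (n_real + n_ai)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"{accuracy:.4f} {precision:.4f} {recall:.4f}")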

Training Data

  • Real Music: FMA Medium (25k), Proprietary (5k)
  • AI-Generated: SONICS dataset (49k), Proprietary (10k)

Usage

Python (ONNX)

import numpy as np
import onnxruntime as ort

# Load the ONNX model
session = ort.InferenceSession("ai_music_detector.onnx")

# fakeprint = extract_fakeprint(audio_file)  # Your preprocessing (3585 features)
x = fakeprint.reshape(1, -1).astype(np.float32)  # ONNX Runtime expects float32
output = session.run(None, {"fakeprint": x})
ai_probability = output[0][0, 0]

print(f"AI Probability: {ai_probability:.1%}")

Python (Safetensors)

from safetensors.numpy import load_file
import numpy as np

weights = load_file("model.safetensors")
w = weights["weights"]  # Shape: (1, 3585)
b = weights["bias"]     # Shape: (1,)

# fakeprint = extract_fakeprint(audio_file)  # Your preprocessing (3585 features)
logit = np.dot(fakeprint, w.T) + b      # Linear layer
probability = 1 / (1 + np.exp(-logit))  # Sigmoid -> AI probability

print(f"AI Probability: {probability.item():.1%}")

Limitations

  • Sample Rate Dependent: Audio must be resampled to 16000 Hz
  • Minimum Duration: Works best with 10+ seconds of audio
  • Evolving Generators: Needs retraining on new generations of AI music generators
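For the sample-rate requirement, a proper anti-aliased resampler such as `librosa.load(path, sr=16000)` or `scipy.signal.resample_poly` is recommended. For illustration only, here is a crude numpy-only linear-interpolation resampler (a sketch: it applies no anti-alias filtering, so it is not suitable for production use):

import numpy as np

def resample_linear(audio: np.ndarray, sr_in: int, sr_out: int = 16000) -> np.ndarray:
    """Crude linear-interpolation resampler (no anti-alias filtering)."""
    n_out = int(round(len(audio) * sr_out / sr_in))
    t_in = np.arange(len(audio)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, audio)

x = np.random.default_rng(0).standard_normal(44100)  # 1 s at 44.1 kHz
y = resample_linear(x, 44100)
print(len(y))  # 16000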

Acknowledgements

This implementation is based on the fakeprint detection method proposed by Afchar et al. [1], which demonstrates that neural vocoders in generative music models produce characteristic frequency-domain artifacts due to their deconvolution architecture.

References

[1] D. Afchar, G. Meseguer-Brocal, K. Akesbi, and R. Hennequin, "A Fourier Explanation of AI-music Artifacts," in Proc. International Society for Music Information Retrieval Conference (ISMIR), 2025. Available: https://arxiv.org/abs/2506.19108

License

MIT License
