Initial upload: popularity prediction MLP head + evaluation report

Files changed (3)

README.md +127 -0
evaluation_report.html +0 -0
popularity_head.pt +3 -0

README.md ADDED

	@@ -0,0 +1,127 @@

+---
+license: cc-by-4.0
+tags:
+  - audio
+  - music
+  - whisper
+  - popularity-prediction
+  - laion
+  - laion-tunes
+library_name: transformers
+pipeline_tag: audio-classification
+---
+# Music Popularity Predictor
+Predicts **play count** and **upvote/like count** of AI-generated music tracks from audio alone.
+## Architecture
+| Component | Details |
+|-----------|---------|
+| **Encoder** | [laion/music-whisper](https://huggingface.co/laion/music-whisper) (Whisper Small fine-tuned for music captioning, frozen) |
+| **Pooling** | Encoder output (1500 × 768) → 10 segments of 150 frames → mean, max, and min over each segment, concatenated → 10 × 3 × 768 = 23,040-dim |
+| **MLP Head** | 23040 → 1024 (ReLU, Dropout 0.3) → 256 (ReLU, LayerNorm) → two prediction heads of 256 → 64 → 1 each (play count + upvote count) |
+| **Output** | log1p-scaled: `log(1 + count)`; apply `math.expm1()` to convert back to a count |
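
The pooling row above can be sketched in NumPy as follows. This is a minimal illustration on random data standing in for the encoder output, not the training code; the helper name `segment_pool` is hypothetical. It follows the same mean/max/min concatenation order as the usage snippet in this README.

```python
import numpy as np

def segment_pool(enc_out: np.ndarray, n_segments: int = 10) -> np.ndarray:
    """Pool a (frames, hidden) encoder output into one flat feature vector.

    Hypothetical helper: splits the time axis into equal segments, then
    concatenates the per-segment mean, max and min over the frames.
    """
    frames, hidden = enc_out.shape
    if frames % n_segments != 0:
        raise ValueError("frames must be divisible by n_segments")
    segments = enc_out.reshape(n_segments, frames // n_segments, hidden)
    pooled = np.concatenate(
        [segments.mean(axis=1), segments.max(axis=1), segments.min(axis=1)],
        axis=1,
    )  # shape: (n_segments, 3 * hidden)
    return pooled.reshape(-1)

# Random stand-in for a Whisper Small encoder output (1500 frames x 768 dims).
rng = np.random.default_rng(0)
features = segment_pool(rng.standard_normal((1500, 768)).astype(np.float32))
print(features.shape)  # (23040,)
```

Each of the 10 segments contributes 3 × 768 = 2,304 values, which gives the 23,040-dimensional input expected by the first linear layer of the head.
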
+## Training
+- **Data**: ~39,000 stratified samples from the [LAION-Tunes](https://huggingface.co/datasets/ai-music/ai-music-deduplicated) dataset (Suno, Udio, Mureka, Riffusion, Sonauto)
+- **Loss**: Huber Loss
+- **Optimizer**: AdamW (lr=5e-4, weight_decay=1e-4, cosine schedule, 3 epochs)
+- **Best validation loss**: 4.004 (epoch 2)
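
To make the loss and the target scaling concrete, here is a small pure-Python sketch of a Huber loss applied to log1p-scaled counts. The threshold `delta=1.0` is an assumption (this README does not state the value used in training), and `huber` is an illustrative helper, not the training code.

```python
import math

def huber(pred: float, target: float, delta: float = 1.0) -> float:
    """Huber loss for one prediction: quadratic near zero, linear beyond delta."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)

# Targets are stored as log(1 + count); expm1 inverts the transform.
play_count = 1200
target = math.log1p(play_count)
assert round(math.expm1(target)) == play_count

small = huber(target + 0.5, target)  # error inside the quadratic region
large = huber(target + 3.0, target)  # error inside the linear region
print(f"{small:.3f} {large:.3f}")  # 0.125 2.500
```

Because the loss grows only linearly for large errors, a handful of tracks with extreme counts pull on the fit less than they would under a squared-error loss.
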
+### Evaluation (200 validation samples)
+| Metric | Play Count | Upvote Count |
+|--------|-----------|--------------|
+| Pearson r | 0.145 | 0.102 |
+| Log-Pearson r | 0.414 | 0.413 |
+| Log MAE | 2.981 | 1.923 |
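
The metrics in the table can be computed along the following lines. This sketch assumes that "Pearson r" is taken on raw counts, that "Log-Pearson r" is taken on log1p-scaled values, and that "Log MAE" is the mean absolute error in log1p space; the README does not spell these definitions out. The toy numbers are made up for illustration and are not taken from the evaluation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def log_mae(pred_logs, true_counts):
    """Mean absolute error between log1p-scale predictions and log1p(true counts)."""
    errors = [abs(p - math.log1p(t)) for p, t in zip(pred_logs, true_counts)]
    return sum(errors) / len(errors)

# Toy data (illustrative only): model outputs in log1p space and true counts.
pred_logs = [2.0, 4.5, 3.0, 6.0]
true_counts = [10, 50, 5, 2000]

raw_r = pearson([math.expm1(p) for p in pred_logs], true_counts)
log_r = pearson(pred_logs, [math.log1p(t) for t in true_counts])
mae = log_mae(pred_logs, true_counts)
print(f"Pearson r: {raw_r:.3f}  Log-Pearson r: {log_r:.3f}  Log MAE: {mae:.3f}")
```

Under these assumed definitions, a few very popular tracks can dominate the raw-count correlation, which may be one reason the log-scale correlation in the table is higher than the raw one.
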
+## Usage
+```python
+import torch
+import torch.nn as nn
+import numpy as np
+import librosa
+import math
+from transformers import WhisperProcessor, WhisperForConditionalGeneration
+from huggingface_hub import hf_hub_download
+# --- Define the MLP head ---
+class PopularityMLP(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.bottleneck = nn.Sequential(
+            nn.Linear(23040, 1024), nn.ReLU(), nn.Dropout(0.3),
+            nn.Linear(1024, 256), nn.ReLU(), nn.LayerNorm(256),
+        )
+        self.play_head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
+        self.upvote_head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
+    def forward(self, x):
+        feat = self.bottleneck(x)
+        return self.play_head(feat).squeeze(-1), self.upvote_head(feat).squeeze(-1)
+# --- Load models ---
+# Whisper encoder from laion/music-whisper
+processor = WhisperProcessor.from_pretrained("laion/music-whisper")
+whisper = WhisperForConditionalGeneration.from_pretrained(
+    "laion/music-whisper", torch_dtype=torch.float16
+).cuda().eval()
+encoder = whisper.get_encoder()
+# Popularity head from this repo
+head_path = hf_hub_download("laion/music-popularity", "popularity_head.pt")
+mlp = PopularityMLP().cuda()
+mlp.load_state_dict(torch.load(head_path, map_location="cuda")["mlp_state_dict"])
+mlp.eval()
+# --- Run inference ---
+audio, sr = librosa.load("song.mp3", sr=16000, mono=True)
+audio = audio[:30 * 16000]  # first 30 seconds
+if len(audio) < 30 * 16000:
+    audio = np.pad(audio, (0, 30 * 16000 - len(audio)))
+inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
+with torch.no_grad():
+    enc_out = encoder(inputs.input_features.cuda().half()).last_hidden_state  # (1, 1500, 768)
+    # Segment pooling: 10 segments, mean/max/min
+    segments = enc_out.view(1, 10, 150, 768)
+    pooled = torch.cat([segments.mean(2), segments.max(2).values, segments.min(2).values], dim=2)
+    pooled = pooled.view(1, -1).float()  # (1, 23040)
+    pred_play, pred_upvote = mlp(pooled)
+# Clamp at zero: expm1 of a negative prediction would give a negative count
+print(f"Estimated plays:   {max(0.0, math.expm1(pred_play.item())):,.0f}")
+print(f"Estimated upvotes: {max(0.0, math.expm1(pred_upvote.item())):,.0f}")
+```
+## Files
+| File | Description |
+|------|-------------|
+| `popularity_head.pt` | MLP head weights (95.6 MB, about 91 MiB) |
+| `evaluation_report.html` | Detailed evaluation with plots |
+The Whisper encoder is loaded separately from [laion/music-whisper](https://huggingface.co/laion/music-whisper).
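
As a sanity check on the file size, the parameter count of the head can be worked out from the layer sizes in the usage snippet. The sketch below assumes the weights are stored as 32-bit floats; under that assumption, the small remainder compared with the Git LFS size of 95,565,585 bytes would be serialization overhead and any extra checkpoint entries.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Bottleneck: two linear layers plus LayerNorm(256) scale and shift vectors.
bottleneck = linear_params(23040, 1024) + linear_params(1024, 256) + 2 * 256
# Each prediction head: 256 -> 64 -> 1.
one_head = linear_params(256, 64) + linear_params(64, 1)
total = bottleneck + 2 * one_head

print(total)      # 23889922
print(total * 4)  # 95559688 bytes at 4 bytes per parameter (float32 assumed)
```

Nearly all of the parameters sit in the first 23040 → 1024 layer, so the width of the pooled feature vector is what drives the size of the head.
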
+## License
+CC BY 4.0 — Christoph Schuhmann / LAION
+## Acknowledgments
+- Encoder: [laion/music-whisper](https://huggingface.co/laion/music-whisper) (OpenAI Whisper Small, fine-tuned for music captioning)
+- Dataset: [LAION-Tunes](https://huggingface.co/datasets/ai-music/ai-music-deduplicated) (AI-generated music from Suno, Udio, Mureka, Riffusion, Sonauto)
+- Developed by Christoph Schuhmann and the LAION community

evaluation_report.html ADDED

The diff for this file is too large to render.

popularity_head.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1b7d751eae3f7625257708b7163534cce28c48e6db9453a62fab467b4b6729af
+size 95565585
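
The three lines above are a Git LFS pointer, not the weights themselves. A downloaded copy of `popularity_head.pt` can be checked against the pointer's `oid` and `size` using only the standard library. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the demonstration uses a small in-memory stand-in rather than the real file.

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the data has the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]

# Stand-in payload; for the real file, pass the bytes read from disk.
payload = b"example weights"
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
pointer = parse_lfs_pointer(pointer_text)
print(matches_pointer(payload, pointer))         # True
print(matches_pointer(payload + b"!", pointer))  # False
```
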