
---
title: Messy Mashup Genre Classifier
emoji: 🎡
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false
---

Messy Mashup Genre Classifier

This Space runs inference with a fine-tuned Audio Spectrogram Transformer (AST) model for music-genre classification.

Your training code uses:

  • ASTForAudioClassification
  • ASTFeatureExtractor
  • 16 kHz audio
  • 10-second segments
  • 10 labels: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
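These numbers fix the input size. As a rough arithmetic check, assuming the AST feature extractor's Kaldi-style framing (25 ms window, 10 ms hop) and its default `max_length` of 1024 frames, a 10-second segment fits inside a single model input:

```python
# Sketch: how one 10-second, 16 kHz segment maps onto AST input frames.
# ASSUMPTION: 25 ms window, 10 ms hop (Kaldi-style fbank framing) and a
# 1024-frame max_length; these are assumed defaults, not read from the model.
SAMPLE_RATE = 16_000      # Hz, matches the training setup
SEGMENT_SECONDS = 10      # seconds per segment
WINDOW_MS = 25            # assumed analysis window
HOP_MS = 10               # assumed frame shift
MAX_FRAMES = 1024         # assumed feature-extractor max_length

segment_samples = SAMPLE_RATE * SEGMENT_SECONDS
window_samples = SAMPLE_RATE * WINDOW_MS // 1000   # 400 samples
hop_samples = SAMPLE_RATE * HOP_MS // 1000         # 160 samples

# Frames produced when framing starts at the first sample with no edge padding.
num_frames = 1 + (segment_samples - window_samples) // hop_samples
padding_frames = MAX_FRAMES - num_frames

print(segment_samples, num_frames, padding_frames)
```

Under those assumptions each segment yields 998 frames and the extractor pads the remaining 26; the actual `sampling_rate` and `max_length` are recorded in the model's preprocessor_config.json.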

Files in this Space

  • app.py: Streamlit app for inference
  • requirements.txt: Python dependencies for Hugging Face Spaces
  • README.md: Space metadata and setup instructions

Important: convert your trained checkpoint before deployment

Your notebook currently saves only:

torch.save(model.state_dict(), "best_ast_model.pt")

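A bare state_dict holds only the weight tensors. It does not carry the model config, the label mapping, or the feature-extractor settings, so the Space cannot rebuild the model from it alone. Save the model in Hugging Face format instead, so it can be loaded with from_pretrained.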

Run this once after training:

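import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

# Base checkpoint used for fine-tuning; must be the same one as in training.
MODEL_NAME = "MIT/ast-finetuned-audioset-10-10-0.4593"
NUM_LABELS = 10

# Label order must match the integer labels used during training.
id2label = {
    0: "blues",
    1: "classical",
    2: "country",
    3: "disco",
    4: "hiphop",
    5: "jazz",
    6: "metal",
    7: "pop",
    8: "reggae",
    9: "rock",
}
label2id = {v: k for k, v in id2label.items()}

# Rebuild the architecture with a 10-class head. ignore_mismatched_sizes=True
# lets the new head replace the checkpoint's 527-class AudioSet classifier.
model = ASTForAudioClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

# Load the fine-tuned weights saved by the notebook.
state_dict = torch.load("best_ast_model.pt", map_location="cpu")
model.load_state_dict(state_dict)

# Same preprocessing settings (sampling rate, mel bins, normalization) as training.
feature_extractor = ASTFeatureExtractor.from_pretrained(MODEL_NAME)

# Write config, weights, and preprocessor config into one folder.
save_dir = "ast-messy-mashup-model"
model.save_pretrained(save_dir)
feature_extractor.save_pretrained(save_dir)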

This will create files such as:

  • config.json
  • model.safetensors or pytorch_model.bin
  • preprocessor_config.json
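The expected files can be checked before uploading. A minimal standard-library sketch: `check_saved_model` is a hypothetical helper, and it assumes the label order used in the conversion script.

```python
import json
from pathlib import Path

# Hypothetical helper: sanity-check a save_pretrained() output directory.
REQUIRED_FILES = ("config.json", "preprocessor_config.json")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")  # either one is fine

# Assumed label order, copied from the conversion script's id2label.
EXPECTED_LABELS = [
    "blues", "classical", "country", "disco", "hiphop",
    "jazz", "metal", "pop", "reggae", "rock",
]


def check_saved_model(save_dir: str) -> list[str]:
    """Return a list of problems found in save_dir (empty if it looks complete)."""
    root = Path(save_dir)
    problems = [f"missing {name}" for name in REQUIRED_FILES
                if not (root / name).is_file()]
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        problems.append("missing model weights")

    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text())
        # JSON object keys are strings, so convert "0", "1", ... back to ints.
        id2label = {int(k): v for k, v in config.get("id2label", {}).items()}
        if id2label != dict(enumerate(EXPECTED_LABELS)):
            problems.append(f"unexpected id2label: {id2label}")
    return problems


if __name__ == "__main__":
    print(check_saved_model("ast-messy-mashup-model") or "looks complete")
```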

Upload the model to the Hugging Face Hub

Option 1: from Python

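from huggingface_hub import login
from transformers import ASTFeatureExtractor, ASTForAudioClassification

login()  # prompts for a Hugging Face access token with write permission

# Reload the converted model from disk so this snippet works on its own.
save_dir = "ast-messy-mashup-model"
model = ASTForAudioClassification.from_pretrained(save_dir)
feature_extractor = ASTFeatureExtractor.from_pretrained(save_dir)

model.push_to_hub("your-username/your-model-repo")
feature_extractor.push_to_hub("your-username/your-model-repo")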

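Option 2: using git

Create a new model repository on Hugging Face, clone it with git (Git LFS is needed for the large weight file), copy the files from ast-messy-mashup-model into the clone, then commit and push.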

Create the Streamlit Space

  1. Go to Hugging Face and create a new Space.
  2. Choose Streamlit as the SDK.
  3. Upload these three files:
    • app.py
    • requirements.txt
    • README.md
  4. In the Space settings, add an environment variable:
     MODEL_REPO=your-username/your-model-repo

You can also hardcode the repo name directly inside app.py.
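Either way, the app needs one place that decides which repo to load. A minimal sketch, assuming the app reads `MODEL_REPO` and falls back to a hardcoded value; `resolve_model_repo` and `DEFAULT_MODEL_REPO` are hypothetical names, not taken from app.py.

```python
import os

# Hypothetical fallback used when MODEL_REPO is unset (replace with your repo).
DEFAULT_MODEL_REPO = "your-username/your-model-repo"


def resolve_model_repo() -> str:
    """Return the repo id from the MODEL_REPO variable, else the hardcoded default."""
    repo = os.environ.get("MODEL_REPO", "").strip()
    return repo or DEFAULT_MODEL_REPO


if __name__ == "__main__":
    print(resolve_model_repo())
```

The returned id would then be passed to `ASTForAudioClassification.from_pretrained` and `ASTFeatureExtractor.from_pretrained`; wrapping that loading step in Streamlit's `st.cache_resource` keeps the model from being reloaded on every rerun.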

How the app works

  • Loads your model from MODEL_REPO
  • Reads uploaded audio
  • Resamples to 16 kHz mono
  • Uses 10-second windows
  • For long audio, takes up to 3 evenly spaced segments
  • Averages probabilities across segments
  • Displays the predicted genre and top class scores
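These steps can be sketched with NumPy. This is a hypothetical outline, not the code in app.py: it assumes the audio is already decoded and resampled to 16 kHz, downmixes by averaging channels, zero-pads clips shorter than 10 seconds, and otherwise takes `min(3, full_windows)` evenly spaced windows. `logits_fn` is a stand-in for the model call.

```python
import numpy as np

SAMPLE_RATE = 16_000
SEGMENT_SAMPLES = 10 * SAMPLE_RATE   # 10-second windows
MAX_SEGMENTS = 3                     # up to 3 evenly spaced windows
LABELS = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average channels; ASSUMES shape (samples,) or (samples, channels)."""
    return audio if audio.ndim == 1 else audio.mean(axis=1)


def select_segments(wave: np.ndarray) -> list[np.ndarray]:
    """Zero-pad short clips; otherwise take evenly spaced 10-second windows."""
    if len(wave) <= SEGMENT_SAMPLES:
        return [np.pad(wave, (0, SEGMENT_SAMPLES - len(wave)))]
    n = min(MAX_SEGMENTS, len(wave) // SEGMENT_SAMPLES)
    # First window starts at 0, last one ends exactly at the end of the clip.
    starts = np.linspace(0, len(wave) - SEGMENT_SAMPLES, n).astype(int)
    return [wave[s:s + SEGMENT_SAMPLES] for s in starts]


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = np.exp(logits - logits.max())  # subtract max for numerical stability
    return shifted / shifted.sum()


def predict(audio, logits_fn, top_k=3):
    """Average per-segment probabilities and return the top-k (label, prob) pairs."""
    segments = select_segments(to_mono(audio))
    probs = np.mean([softmax(logits_fn(seg)) for seg in segments], axis=0)
    order = np.argsort(probs)[::-1][:top_k]
    return [(LABELS[i], float(probs[i])) for i in order]
```

In a real app, `logits_fn` would run `feature_extractor(seg, sampling_rate=16000, return_tensors="pt")` and return `model(**inputs).logits[0]` as a NumPy array, under `torch.no_grad()`. Averaging probabilities rather than logits keeps each segment's vote on the same 0 to 1 scale.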

Recommended repository structure

Space repo

.
├── app.py
├── requirements.txt
└── README.md

Model repo

.
├── config.json
├── preprocessor_config.json
├── model.safetensors
└── README.md

Notes

  • This app is set up for inference only, not training.
  • wandb, Kaggle paths, dataset loaders, and augmentation code are intentionally left out.
  • If you want, you can later add:
    • waveform display
    • spectrogram preview
    • batch prediction
    • confidence chart
    • example audio samples

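Local run

Set MODEL_REPO first (macOS/Linux shell shown), unless the repo name is hardcoded in app.py:

pip install -r requirements.txt
export MODEL_REPO=your-username/your-model-repo
streamlit run app.py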