metadata
title: Messy Mashup Genre Classifier
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false
Messy Mashup Genre Classifier
This Space runs inference for a fine-tuned Audio Spectrogram Transformer (AST) model on your music-genre classification task.
Your training code uses:
- ASTForAudioClassification
- ASTFeatureExtractor
- 16 kHz audio
- 10-second segments
- 10 labels: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
Files in this Space
- app.py: Streamlit app for inference
- requirements.txt: Python dependencies for Hugging Face Spaces
- README.md: Space metadata and setup instructions
Important: convert your trained checkpoint before deployment
Your notebook currently saves only:
torch.save(model.state_dict(), "best_ast_model.pt")
For a Hugging Face Space, it is better to upload the model in Hugging Face format, so the app can load the weights, config, and preprocessor settings with from_pretrained.
Run this once after training:
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification
MODEL_NAME = "MIT/ast-finetuned-audioset-10-10-0.4593"
NUM_LABELS = 10
id2label = {
    0: "blues",
    1: "classical",
    2: "country",
    3: "disco",
    4: "hiphop",
    5: "jazz",
    6: "metal",
    7: "pop",
    8: "reggae",
    9: "rock",
}
label2id = {v: k for k, v in id2label.items()}
model = ASTForAudioClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)
state_dict = torch.load("best_ast_model.pt", map_location="cpu")
model.load_state_dict(state_dict)
feature_extractor = ASTFeatureExtractor.from_pretrained(MODEL_NAME)
save_dir = "ast-messy-mashup-model"
model.save_pretrained(save_dir)
feature_extractor.save_pretrained(save_dir)
This will create files such as:
- config.json
- model.safetensors or pytorch_model.bin
- preprocessor_config.json
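Before uploading, it can help to confirm that the export produced the files listed above. The following is a minimal sketch using only the standard library. The helper name `missing_model_files` is hypothetical, and the file names simply mirror the list above, with either weights file accepted.

```python
from pathlib import Path

# File names taken from the list above; either weights format is acceptable.
REQUIRED_FILES = ("config.json", "preprocessor_config.json")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def missing_model_files(save_dir):
    """Return the names of expected files that are absent from save_dir."""
    root = Path(save_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Only one of the two weight files needs to exist.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

Calling `missing_model_files("ast-messy-mashup-model")` should return an empty list when the export is complete.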
Upload the model to the Hugging Face Hub
Option 1: from Python
from huggingface_hub import login

# `model` and `feature_extractor` are the objects created in the conversion script above.
login()
model.push_to_hub("your-username/your-model-repo")
feature_extractor.push_to_hub("your-username/your-model-repo")
Option 2: using git
Create a new model repository on Hugging Face and upload the saved model files into it.
Create the Streamlit Space
- Go to Hugging Face and create a new Space.
- Choose Streamlit as the SDK.
- Upload these three files:
  - app.py
  - requirements.txt
  - README.md
- In the Space settings, add an environment variable:
MODEL_REPO=your-username/your-model-repo
You can also hardcode the repo name directly inside app.py.
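One way to support both options is to read the environment variable and fall back to a hardcoded name. This is a sketch only: the helper `resolve_model_repo` and the constant `DEFAULT_MODEL_REPO` are hypothetical names, and the actual app.py may resolve the repo differently.

```python
import os

# Hypothetical fallback; replace with your own model repository.
DEFAULT_MODEL_REPO = "your-username/your-model-repo"


def resolve_model_repo(env=None):
    """Return MODEL_REPO from the environment, or the hardcoded default."""
    env = os.environ if env is None else env
    value = env.get("MODEL_REPO", "").strip()
    # An unset or blank variable falls back to the default.
    return value or DEFAULT_MODEL_REPO
```

Passing a plain dict as `env` makes the function easy to check without touching the real environment.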
How the app works
- Loads your model from MODEL_REPO
- Reads uploaded audio
- Resamples to 16 kHz mono
- Uses 10-second windows
- For long audio, takes up to 3 evenly spaced segments
- Averages probabilities across segments
- Displays the predicted genre and top class scores
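The segment selection and averaging steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in app.py: the function names are hypothetical, short audio is assumed to yield a single window starting at 0 (to be padded by the caller), and fewer than 3 windows are used when the audio cannot hold 3 without overlap.

```python
SAMPLE_RATE = 16_000   # Hz, as stated above
SEGMENT_SECONDS = 10   # window length, as stated above
MAX_SEGMENTS = 3       # upper bound on windows per file, as stated above
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS


def segment_starts(num_samples, segment_samples=SEGMENT_SAMPLES,
                   max_segments=MAX_SEGMENTS):
    """Return start offsets (in samples) of evenly spaced windows."""
    # Assumption: audio no longer than one window gives one window at 0.
    if num_samples <= segment_samples:
        return [0]
    last_start = num_samples - segment_samples
    # Use fewer windows when the audio cannot hold max_segments of them.
    count = min(max_segments, num_samples // segment_samples)
    if count == 1:
        return [last_start // 2]  # single centered window
    step = last_start / (count - 1)
    return [round(i * step) for i in range(count)]


def average_probabilities(per_segment_probs):
    """Average class probabilities element-wise across segments."""
    n = len(per_segment_probs)
    return [sum(column) / n for column in zip(*per_segment_probs)]


def top_k(probs, labels, k=3):
    """Return the k highest-scoring (label, probability) pairs."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For a 30-second file at 16 kHz (480,000 samples), `segment_starts` gives three back-to-back windows; for a 60-second file, the three windows sit at the start, middle, and end.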
Recommended repository structure
Space repo
.
├── app.py
├── requirements.txt
└── README.md
Model repo
.
├── config.json
├── preprocessor_config.json
├── model.safetensors
└── README.md
Notes
- This app is set up for inference only, not training.
- wandb, Kaggle paths, dataset loaders, and augmentation code are intentionally left out.
- If you want, you can later add:
- waveform display
- spectrogram preview
- batch prediction
- confidence chart
- example audio samples
Local run
pip install -r requirements.txt
streamlit run app.py