metadata

title: Messy Mashup Genre Classifier
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false

Messy Mashup Genre Classifier

This Space runs inference for a fine-tuned Audio Spectrogram Transformer (AST) model on your music-genre classification task.

Your training code uses:

ASTForAudioClassification
ASTFeatureExtractor
16 kHz audio
10-second segments
10 labels: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock

Files in this Space

app.py — Streamlit app for inference
requirements.txt — Python dependencies for Hugging Face Spaces
README.md — Space metadata and setup instructions

Important: convert your trained checkpoint before deployment

Your notebook currently saves only:

torch.save(model.state_dict(), "best_ast_model.pt")

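For a Hugging Face Space, it is better to upload the model in Hugging Face format (the layout written by save_pretrained), so the app can load the weights, config, and label names with a single from_pretrained call.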

Run this once after training:

import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

MODEL_NAME = "MIT/ast-finetuned-audioset-10-10-0.4593"
NUM_LABELS = 10

id2label = {
    0: "blues",
    1: "classical",
    2: "country",
    3: "disco",
    4: "hiphop",
    5: "jazz",
    6: "metal",
    7: "pop",
    8: "reggae",
    9: "rock",
}
label2id = {v: k for k, v in id2label.items()}

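# The AudioSet checkpoint ships with a 527-class head; ignore_mismatched_sizes
# lets it be swapped for a freshly initialised 10-class head.
model = ASTForAudioClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

# Overwrite every weight, including the new head, with the fine-tuned values.
state_dict = torch.load("best_ast_model.pt", map_location="cpu")
model.load_state_dict(state_dict)

# Assumes training used the default preprocessing config of MODEL_NAME.
feature_extractor = ASTFeatureExtractor.from_pretrained(MODEL_NAME)

# Write the model and preprocessor side by side in Hugging Face format.
save_dir = "ast-messy-mashup-model"
model.save_pretrained(save_dir)
feature_extractor.save_pretrained(save_dir)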

This will create files such as:

config.json
model.safetensors or pytorch_model.bin
preprocessor_config.json
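
Before uploading, it can help to confirm that the export directory actually contains these files. The following is a minimal sketch using only the standard library. The helper name missing_export_files is hypothetical (it is not part of transformers or huggingface_hub), and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the list above; the weights may be in either format.
REQUIRED_FILES = ["config.json", "preprocessor_config.json"]
WEIGHT_FILES = ["model.safetensors", "pytorch_model.bin"]


def missing_export_files(save_dir):
    """Return the names of expected files that are absent from save_dir.

    Hypothetical helper for illustration only.
    """
    root = Path(save_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Only one of the two weight formats needs to be present.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

Calling missing_export_files("ast-messy-mashup-model") returns an empty list when the export is complete.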

Upload the model to the Hugging Face Hub

Option 1: from Python

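from huggingface_hub import login
from transformers import ASTFeatureExtractor, ASTForAudioClassification

login()  # prompts for a Hugging Face access token with write permission

# Reload the exported model so this snippet also works in a fresh session
save_dir = "ast-messy-mashup-model"
model = ASTForAudioClassification.from_pretrained(save_dir)
feature_extractor = ASTFeatureExtractor.from_pretrained(save_dir)

model.push_to_hub("your-username/your-model-repo")
feature_extractor.push_to_hub("your-username/your-model-repo")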

Option 2: using git

Create a new model repository on Hugging Face, clone it, copy the files from ast-messy-mashup-model into the clone, then commit and push.

Create the Streamlit Space

Go to Hugging Face and create a new Space.
Choose Streamlit as the SDK.
Upload these three files:
- app.py
- requirements.txt
- README.md
In the Space settings, add an environment variable:

MODEL_REPO=your-username/your-model-repo

You can also hardcode the repo name directly inside app.py.
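
One way app.py could pick up this setting is to read the environment variable and fall back to a hardcoded default. This is a sketch under assumptions, not the actual contents of app.py: the helper name resolve_model_repo and the placeholder default are both hypothetical.

```python
import os

# Placeholder fallback; replace with your own model repository ID.
DEFAULT_MODEL_REPO = "your-username/your-model-repo"


def resolve_model_repo(env=os.environ):
    """Return the repo ID from MODEL_REPO, or the hardcoded default.

    Hypothetical helper for illustration only.
    """
    value = env.get("MODEL_REPO", "").strip()
    # An unset or blank variable falls through to the default.
    return value or DEFAULT_MODEL_REPO
```

Keeping the default in one constant means the Space still starts if the variable is missing, while the setting in the Space takes priority when present.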

How the app works

Loads your model from MODEL_REPO
Reads uploaded audio
Resamples to 16 kHz mono
Uses 10-second windows
For long audio, takes up to 3 evenly spaced segments
Averages probabilities across segments
Displays the predicted genre and top class scores
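
The segment selection, averaging, and ranking steps above can be sketched in plain Python. The exact spacing rule used by app.py is not stated here, so the version below is an assumption: it spreads the window starts from the beginning of the clip to the last full window. The function names segment_starts, average_probs, and top_classes are hypothetical.

```python
SAMPLE_RATE = 16_000      # 16 kHz, as stated above
WINDOW_SECONDS = 10       # 10-second windows
MAX_SEGMENTS = 3          # up to 3 segments for long audio
WINDOW = SAMPLE_RATE * WINDOW_SECONDS


def segment_starts(num_samples, window=WINDOW, max_segments=MAX_SEGMENTS):
    """Start offsets (in samples) of up to max_segments evenly spaced windows."""
    n = min(max_segments, num_samples // window)
    if n <= 1:
        # Short clips get a single window starting at 0; padding or
        # truncation to the window length is left to the caller.
        return [0]
    last_start = num_samples - window
    # Spread the starts from the beginning to the last full window.
    return [i * last_start // (n - 1) for i in range(n)]


def average_probs(per_segment_probs):
    """Element-wise mean of per-segment probability lists."""
    count = len(per_segment_probs)
    return [sum(column) / count for column in zip(*per_segment_probs)]


def top_classes(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For a 60-second clip, segment_starts(60 * SAMPLE_RATE) gives [0, 400000, 800000], which corresponds to windows starting at 0 s, 25 s, and 50 s. Averaging probabilities rather than taking a majority vote keeps the per-class confidence available for the score display.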

Recommended repository structure

Space repo

.
├── app.py
├── requirements.txt
└── README.md

Model repo

.
├── config.json
├── preprocessor_config.json
├── model.safetensors
└── README.md

Notes

This app is set up for inference only, not training.
wandb, Kaggle paths, dataset loaders, and augmentation code are intentionally left out.
If you want, you can later add:
- waveform display
- spectrogram preview
- batch prediction
- confidence chart
- example audio samples

Local run

pip install -r requirements.txt
streamlit run app.py