---
title: Messy Mashup Genre Classifier
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false
---

# Messy Mashup Genre Classifier

This Space runs inference with an **Audio Spectrogram Transformer (AST)** model fine-tuned for music-genre classification.

Your training code uses:
- **ASTForAudioClassification**
- **ASTFeatureExtractor**
- **16 kHz audio**
- **10-second segments**
- **10 labels**: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock

## Files in this Space

- `app.py` — Streamlit app for inference
- `requirements.txt` — Python dependencies for Hugging Face Spaces
- `README.md` — Space metadata and setup instructions

## Important: convert your trained checkpoint before deployment

Your notebook currently saves only:

```python
torch.save(model.state_dict(), "best_ast_model.pt")
```

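A bare `state_dict` carries no config or label mapping, so the Space would have to rebuild the architecture by hand before loading it. Saving the model in **Hugging Face format** with `save_pretrained` lets the app load both the model and the feature extractor with a single `from_pretrained` call.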

Run this once after training:

```python
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

MODEL_NAME = "MIT/ast-finetuned-audioset-10-10-0.4593"
NUM_LABELS = 10

id2label = {
    0: "blues",
    1: "classical",
    2: "country",
    3: "disco",
    4: "hiphop",
    5: "jazz",
    6: "metal",
    7: "pop",
    8: "reggae",
    9: "rock",
}
label2id = {v: k for k, v in id2label.items()}

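# Rebuild the architecture with a 10-class head. ignore_mismatched_sizes=True
# lets the 527-class AudioSet head of the base checkpoint be swapped out.
model = ASTForAudioClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

# Load the fine-tuned weights on CPU. load_state_dict is strict by default,
# so it raises if the checkpoint keys do not match this architecture.
state_dict = torch.load("best_ast_model.pt", map_location="cpu")
model.load_state_dict(state_dict)

# Assumes training used the base checkpoint's default feature-extractor settings.
feature_extractor = ASTFeatureExtractor.from_pretrained(MODEL_NAME)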

save_dir = "ast-messy-mashup-model"
model.save_pretrained(save_dir)
feature_extractor.save_pretrained(save_dir)
```

This will create files such as:
- `config.json`
- `model.safetensors` or `pytorch_model.bin`
- `preprocessor_config.json`
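
Before uploading, it can help to confirm the export is complete. The sketch below is one way to do that, not part of the Space itself: `missing_export_files` is a hypothetical helper name, and the file list simply mirrors the one above.

```python
from pathlib import Path

# Files listed above; the weights may be in either of two formats.
REQUIRED = ("config.json", "preprocessor_config.json")
WEIGHTS = ("model.safetensors", "pytorch_model.bin")


def missing_export_files(save_dir):
    """Return the names of expected export files that are absent from save_dir."""
    root = Path(save_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    # Only one of the two weight formats needs to be present.
    if not any((root / name).is_file() for name in WEIGHTS):
        missing.append(" or ".join(WEIGHTS))
    return missing
```

Calling `missing_export_files("ast-messy-mashup-model")` should return an empty list once the conversion script has run successfully.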

## Upload the model to the Hugging Face Hub

### Option 1: from Python

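```python
from huggingface_hub import login
from transformers import ASTFeatureExtractor, ASTForAudioClassification

login()  # prompts for a Hugging Face access token with write permission

# Reload from the directory saved above so this also works in a fresh session.
save_dir = "ast-messy-mashup-model"
model = ASTForAudioClassification.from_pretrained(save_dir)
feature_extractor = ASTFeatureExtractor.from_pretrained(save_dir)

model.push_to_hub("your-username/your-model-repo")
feature_extractor.push_to_hub("your-username/your-model-repo")
```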

### Option 2: using git

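Create a new **model repository** on Hugging Face, clone it with git, copy the files from `ast-messy-mashup-model/` into the clone, then commit and push. Install Git LFS first, since the weight file is stored through it.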

## Create the Streamlit Space

1. Go to Hugging Face and create a new **Space**.
2. Choose **Streamlit** as the SDK.
3. Upload these three files:
   - `app.py`
   - `requirements.txt`
   - `README.md`
4. In the Space settings, add an environment variable:

```text
MODEL_REPO=your-username/your-model-repo
```

You can also hardcode the repo name directly inside `app.py`.
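
Both options can be combined: read the environment variable and fall back to a hardcoded name. This is a minimal sketch, assuming `app.py` resolves the repo at startup; `resolve_model_repo` and `DEFAULT_MODEL_REPO` are hypothetical names and may not match the actual app code.

```python
import os

# Placeholder fallback; replace with your own model repo.
DEFAULT_MODEL_REPO = "your-username/your-model-repo"


def resolve_model_repo(env=os.environ):
    """Prefer the MODEL_REPO variable; fall back to the hardcoded default."""
    # An unset or blank variable falls through to the default.
    return env.get("MODEL_REPO", "").strip() or DEFAULT_MODEL_REPO
```

Passing `env` as a parameter keeps the function easy to check without touching the real environment.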

## How the app works

- Loads your model from `MODEL_REPO`
- Reads uploaded audio
- Resamples to **16 kHz mono**
- Uses **10-second windows**
- For long audio, takes up to **3 evenly spaced segments**
- Averages probabilities across segments
- Displays the predicted genre and top class scores
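
The segment selection and averaging steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `app.py`: the function names are hypothetical, the exact spacing rule is a guess at "evenly spaced", and plain NumPy arrays stand in for the waveform and the model's logits.

```python
import numpy as np

SAMPLE_RATE = 16_000        # 16 kHz, as in training
WINDOW = 10 * SAMPLE_RATE   # 10-second window, in samples
LABELS = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def segment_starts(num_samples, window=WINDOW, max_segments=3):
    """Start offsets for up to max_segments evenly spaced windows."""
    if num_samples <= window:
        return [0]  # short clip: one window, zero-padded later
    last_start = num_samples - window
    # Use as many full windows as fit, capped at max_segments.
    n = min(max_segments, num_samples // window)
    return np.linspace(0, last_start, n).astype(int).tolist()


def extract_segments(waveform, window=WINDOW, max_segments=3):
    """Cut a mono waveform into fixed-length windows, padding with zeros."""
    segments = []
    for start in segment_starts(len(waveform), window, max_segments):
        seg = waveform[start:start + window]
        if len(seg) < window:
            seg = np.pad(seg, (0, window - len(seg)))
        segments.append(seg)
    return np.stack(segments)


def average_probabilities(logits):
    """Softmax each segment's logits, then average across segments."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)


def top_k(probs, k=3):
    """Return the k highest-scoring (label, probability) pairs."""
    order = np.argsort(probs)[::-1][:k]
    return [(LABELS[i], float(probs[i])) for i in order]
```

Averaging probabilities rather than picking the most confident segment keeps one noisy window from deciding the prediction for a long track.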

## Recommended repository structure

### Space repo

```text
.
├── app.py
├── requirements.txt
└── README.md
```

### Model repo

```text
.
├── config.json
├── preprocessor_config.json
├── model.safetensors
└── README.md
```

## Notes

- This app is set up for inference only, not training.
- `wandb`, Kaggle paths, dataset loaders, and augmentation code are intentionally left out.
- Possible later additions:
  - waveform display
  - spectrogram preview
  - batch prediction
  - confidence chart
  - example audio samples

## Local run

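```bash
pip install -r requirements.txt
# Set MODEL_REPO unless the repo name is hardcoded in app.py
export MODEL_REPO=your-username/your-model-repo
streamlit run app.py
```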