---
title: Messy Mashup Genre Classifier
emoji: 🎡
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false
---
# Messy Mashup Genre Classifier
This Space runs inference for a fine-tuned **Audio Spectrogram Transformer (AST)** model on your music-genre classification task.
Your training code uses:
- **ASTForAudioClassification**
- **ASTFeatureExtractor**
- **16 kHz audio**
- **10-second segments**
- **10 labels**: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
## Files in this Space
- `app.py`: Streamlit app for inference
- `requirements.txt`: Python dependencies for Hugging Face Spaces
- `README.md`: Space metadata and setup instructions
## Important: convert your trained checkpoint before deployment
Your notebook currently saves only:
```python
torch.save(model.state_dict(), "best_ast_model.pt")
```
A bare `state_dict` carries no config or label mapping, so for a Hugging Face Space it is better to upload the model in **Hugging Face format**, which `from_pretrained` can load directly.
Run this once after training:
```python
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

MODEL_NAME = "MIT/ast-finetuned-audioset-10-10-0.4593"
NUM_LABELS = 10

id2label = {
    0: "blues",
    1: "classical",
    2: "country",
    3: "disco",
    4: "hiphop",
    5: "jazz",
    6: "metal",
    7: "pop",
    8: "reggae",
    9: "rock",
}
label2id = {v: k for k, v in id2label.items()}

# Rebuild the architecture with a 10-class head; the AudioSet
# classification head is re-initialized because its size differs.
model = ASTForAudioClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

# Load the fine-tuned weights saved by the notebook.
state_dict = torch.load("best_ast_model.pt", map_location="cpu")
model.load_state_dict(state_dict)

# Reuse the preprocessing config from the base checkpoint.
feature_extractor = ASTFeatureExtractor.from_pretrained(MODEL_NAME)

# Write the model and the preprocessor config in Hugging Face format.
save_dir = "ast-messy-mashup-model"
model.save_pretrained(save_dir)
feature_extractor.save_pretrained(save_dir)
```
This will create files such as:
- `config.json`
- `model.safetensors` or `pytorch_model.bin`
- `preprocessor_config.json`
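
Before uploading, it can help to confirm that the export actually produced these files. The helper below is a minimal sketch using only the standard library; the function name `missing_model_files` is hypothetical, and the file list is taken from the bullets above rather than from any official specification.

```python
from pathlib import Path

# Files listed above; either weights file is accepted.
REQUIRED_FILES = ("config.json", "preprocessor_config.json")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def missing_model_files(save_dir):
    """Return descriptions of expected files that are absent from save_dir."""
    root = Path(save_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

Calling `missing_model_files("ast-messy-mashup-model")` after the conversion step should return an empty list if the export succeeded.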
## Upload the model to the Hugging Face Hub
### Option 1: from Python
```python
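from huggingface_hub import login
from transformers import ASTFeatureExtractor, ASTForAudioClassification

login()  # prompts for a Hugging Face access token

# Reload the converted model so the snippet also runs in a fresh session.
model = ASTForAudioClassification.from_pretrained("ast-messy-mashup-model")
feature_extractor = ASTFeatureExtractor.from_pretrained("ast-messy-mashup-model")
model.push_to_hub("your-username/your-model-repo")
feature_extractor.push_to_hub("your-username/your-model-repo")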
```
### Option 2: using git
Create a new **model repository** on Hugging Face and upload the saved model files into it.
## Create the Streamlit Space
1. Go to Hugging Face and create a new **Space**.
2. Choose **Streamlit** as the SDK.
3. Upload these three files:
   - `app.py`
   - `requirements.txt`
   - `README.md`
4. In the Space settings, add an environment variable:
```text
MODEL_REPO=your-username/your-model-repo
```
You can also hardcode the repo name directly inside `app.py`.
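
One way to support both options is to read the environment variable and fall back to a hardcoded default. This is only a sketch of that pattern, not the code in `app.py`; the helper name `resolve_model_repo` and the placeholder default are assumptions.

```python
import os

# Placeholder fallback; replace with your own model repository.
DEFAULT_MODEL_REPO = "your-username/your-model-repo"


def resolve_model_repo(env=None):
    """Return MODEL_REPO from the environment, or the hardcoded default."""
    env = os.environ if env is None else env
    repo = env.get("MODEL_REPO", "").strip()
    return repo or DEFAULT_MODEL_REPO
```

With this pattern the Space setting takes precedence whenever it is present and non-empty, while a local run without the variable still works.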
## How the app works
- Loads your model from `MODEL_REPO`
- Reads uploaded audio
- Resamples to **16 kHz mono**
- Uses **10-second windows**
- For long audio, takes up to **3 evenly spaced segments**
- Averages probabilities across segments
- Displays the predicted genre and top class scores
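
The segment selection, averaging, and ranking steps above could look roughly like the following. This is a sketch under stated assumptions, not the code in `app.py`: the function names are hypothetical, and the rules used here (one window for short clips, windows spread evenly from the first to the last possible start position) are one plausible reading of "evenly spaced".

```python
import math

SAMPLE_RATE = 16_000   # 16 kHz mono, as stated above
WINDOW_SECONDS = 10    # 10-second windows
MAX_SEGMENTS = 3       # at most 3 segments for long audio


def segment_starts(num_samples, max_segments=MAX_SEGMENTS):
    """Return start indices of up to max_segments evenly spaced windows."""
    window = SAMPLE_RATE * WINDOW_SECONDS
    if num_samples <= window:
        return [0]  # short clip: a single window (assumed to be padded later)
    count = min(max_segments, math.ceil(num_samples / window))
    if count == 1:
        return [0]
    last_start = num_samples - window
    return [round(i * last_start / (count - 1)) for i in range(count)]


def average_probabilities(per_segment_probs):
    """Average class probabilities element-wise across segments."""
    n = len(per_segment_probs)
    return [sum(column) / n for column in zip(*per_segment_probs)]


def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For a 30-second clip, `segment_starts(30 * SAMPLE_RATE)` yields three non-overlapping windows; each window would be passed through the feature extractor and model, and the resulting softmax outputs fed to `average_probabilities`.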
## Recommended repository structure
### Space repo
```text
.
├── app.py
├── requirements.txt
└── README.md
```
### Model repo
```text
.
├── config.json
├── preprocessor_config.json
├── model.safetensors
└── README.md
```
## Notes
- This app is set up for inference only, not training.
- `wandb`, Kaggle paths, dataset loaders, and augmentation code are intentionally left out.
- Possible later additions:
  - waveform display
  - spectrogram preview
  - batch prediction
  - confidence chart
  - example audio samples
## Local run
```bash
pip install -r requirements.txt
streamlit run app.py
```