| --- |
| title: Messy Mashup Genre Classifier |
| emoji: 🎵 |
| colorFrom: blue |
| colorTo: purple |
| sdk: streamlit |
| sdk_version: 1.25.0 |
| app_file: app.py |
| pinned: false |
| --- |
| |
| # Messy Mashup Genre Classifier |
|
|
| This Space runs inference for a fine-tuned **Audio Spectrogram Transformer (AST)** model on your music-genre classification task. |
|
|
| Your training code uses: |
| - **ASTForAudioClassification** |
| - **ASTFeatureExtractor** |
| - **16 kHz audio** |
| - **10-second segments** |
| - **10 labels**: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock |
|
|
| ## Files in this Space |
|
|
| - `app.py`: Streamlit app for inference |
| - `requirements.txt`: Python dependencies for Hugging Face Spaces |
| - `README.md`: Space metadata and setup instructions |
|
|
| ## Important: convert your trained checkpoint before deployment |
|
|
| Your notebook currently saves only: |
|
|
| ```python |
| torch.save(model.state_dict(), "best_ast_model.pt") |
| ``` |
|
|
| For a Hugging Face Space, upload the model in **Hugging Face format** instead, so the app can load the weights, config, and label mapping with `from_pretrained`. |
|
|
| Run this once after training: |
|
|
| ```python |
| import torch |
| from transformers import ASTFeatureExtractor, ASTForAudioClassification |
| |
| MODEL_NAME = "MIT/ast-finetuned-audioset-10-10-0.4593" |
| NUM_LABELS = 10 |
| |
| id2label = { |
|     0: "blues", |
|     1: "classical", |
|     2: "country", |
|     3: "disco", |
|     4: "hiphop", |
|     5: "jazz", |
|     6: "metal", |
|     7: "pop", |
|     8: "reggae", |
|     9: "rock", |
| } |
| label2id = {v: k for k, v in id2label.items()} |
| |
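| # Rebuild the architecture with a fresh 10-class head (the AudioSet head is discarded) |
| model = ASTForAudioClassification.from_pretrained( |
|     MODEL_NAME, |
|     num_labels=NUM_LABELS, |
|     id2label=id2label, |
|     label2id=label2id, |
|     ignore_mismatched_sizes=True, |
| ) |
| |
| # Load the fine-tuned weights saved by the training notebook |
| state_dict = torch.load("best_ast_model.pt", map_location="cpu") |
| model.load_state_dict(state_dict) |
| |
| # The feature extractor has no trained weights; reuse the base checkpoint's settings |
| feature_extractor = ASTFeatureExtractor.from_pretrained(MODEL_NAME) |
| |
| # Write the model and preprocessing config in Hugging Face format |
| save_dir = "ast-messy-mashup-model" |
| model.save_pretrained(save_dir) |
| feature_extractor.save_pretrained(save_dir) |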
| ``` |
|
|
| This will create files such as: |
| - `config.json` |
| - `model.safetensors` or `pytorch_model.bin` |
| - `preprocessor_config.json` |
|
|
| ## Upload the model to the Hugging Face Hub |
|
|
| ### Option 1: from Python |
|
|
| ```python |
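| from huggingface_hub import login |
| from transformers import ASTFeatureExtractor, ASTForAudioClassification |
| |
| login() |
| |
| # Reload the converted checkpoint saved in the previous step |
| save_dir = "ast-messy-mashup-model" |
| model = ASTForAudioClassification.from_pretrained(save_dir) |
| feature_extractor = ASTFeatureExtractor.from_pretrained(save_dir) |
| |
| model.push_to_hub("your-username/your-model-repo") |
| feature_extractor.push_to_hub("your-username/your-model-repo") |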
| ``` |
|
|
| ### Option 2: using git |
|
|
| Create a new **model repository** on Hugging Face and upload the saved model files into it. |
|
|
| ## Create the Streamlit Space |
|
|
| 1. Go to Hugging Face and create a new **Space**. |
| 2. Choose **Streamlit** as the SDK. |
| 3. Upload these three files: |
|    - `app.py` |
|    - `requirements.txt` |
|    - `README.md` |
| 4. In the Space settings, add an environment variable: |
|
|
| ```text |
| MODEL_REPO=your-username/your-model-repo |
| ``` |
|
|
| You can also hardcode the repo name directly inside `app.py`. |
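
One way to support both options is to read the environment variable and fall back to a hardcoded value. This is a sketch only: `resolve_model_repo` and `DEFAULT_MODEL_REPO` are illustrative names, and the actual lookup in `app.py` may be written differently.

```python
import os

# Hypothetical fallback; replace with your actual model repository id.
DEFAULT_MODEL_REPO = "your-username/your-model-repo"


def resolve_model_repo(env=os.environ):
    """Return the repo id from MODEL_REPO, or the hardcoded default if unset or blank."""
    value = env.get("MODEL_REPO", "").strip()
    return value or DEFAULT_MODEL_REPO
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.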
|
|
| ## How the app works |
|
|
| - Loads your model from `MODEL_REPO` |
| - Reads uploaded audio |
| - Resamples to **16 kHz mono** |
| - Uses **10-second windows** |
| - For long audio, takes up to **3 evenly spaced segments** |
| - Averages probabilities across segments |
| - Displays the predicted genre and top class scores |
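
The windowing and averaging steps above can be sketched in plain Python. The function names below are hypothetical and the exact spacing rule in `app.py` may differ; this version assumes windows are spread evenly between the start of the clip and the last position where a full 10-second window still fits.

```python
SAMPLE_RATE = 16_000   # 16 kHz, as stated above
SEGMENT_SECONDS = 10   # window length, as stated above
MAX_SEGMENTS = 3       # upper bound on windows per file


def segment_starts(num_samples,
                   segment_len=SAMPLE_RATE * SEGMENT_SECONDS,
                   max_segments=MAX_SEGMENTS):
    """Return start offsets (in samples) of evenly spaced windows."""
    # Number of full windows that fit, capped at max_segments.
    count = min(max_segments, num_samples // segment_len)
    if count <= 1:
        # Short clip: one window from the start (padding is left to the caller).
        return [0]
    last_start = num_samples - segment_len
    # Spread the starts evenly from 0 to the last valid start.
    return [i * last_start // (count - 1) for i in range(count)]


def average_probabilities(per_segment_probs):
    """Average class probabilities element-wise across segments."""
    n = len(per_segment_probs)
    return [sum(column) / n for column in zip(*per_segment_probs)]


def top_classes(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Under these assumptions, a 60-second clip at 16 kHz yields windows starting at 0 s, 25 s, and 50 s, while a clip shorter than 20 seconds yields a single window.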
|
|
| ## Recommended repository structure |
|
|
| ### Space repo |
|
|
| ```text |
| . |
| ├── app.py |
| ├── requirements.txt |
| └── README.md |
| ``` |
|
|
| ### Model repo |
|
|
| ```text |
| . |
| ├── config.json |
| ├── preprocessor_config.json |
| ├── model.safetensors |
| └── README.md |
| ``` |
|
|
| ## Notes |
|
|
| - This app is set up for inference only, not training. |
| - `wandb`, Kaggle paths, dataset loaders, and augmentation code are intentionally left out. |
| - If you want, you can later add: |
|   - waveform display |
|   - spectrogram preview |
|   - batch prediction |
|   - confidence chart |
|   - example audio samples |
|
|
| ## Local run |
|
|
| ```bash |
| pip install -r requirements.txt |
| streamlit run app.py |
| ``` |
|
|