---
title: Messy Mashup Genre Classifier
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: app.py
pinned: false
---

# Messy Mashup Genre Classifier

This Space runs inference with a fine-tuned **Audio Spectrogram Transformer (AST)** model for your music-genre classification task.

Your training code uses:

- **ASTForAudioClassification**
- **ASTFeatureExtractor**
- **16 kHz audio**
- **10-second segments**
- **10 labels**: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock

## Files in this Space

- `app.py`: Streamlit app for inference
- `requirements.txt`: Python dependencies for Hugging Face Spaces
- `README.md`: Space metadata and setup instructions

## Important: convert your trained checkpoint before deployment

Your notebook currently saves only the weights:

```python
torch.save(model.state_dict(), "best_ast_model.pt")
```

A bare `state_dict` contains neither the model config (architecture, `id2label` mapping) nor the feature-extractor settings, so it cannot be loaded with `from_pretrained`. For a Hugging Face Space, upload the model in **Hugging Face format** instead. Run this once after training:

```python
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

MODEL_NAME = "MIT/ast-finetuned-audioset-10-10-0.4593"
NUM_LABELS = 10

id2label = {
    0: "blues",
    1: "classical",
    2: "country",
    3: "disco",
    4: "hiphop",
    5: "jazz",
    6: "metal",
    7: "pop",
    8: "reggae",
    9: "rock",
}
label2id = {v: k for k, v in id2label.items()}

# Rebuild the architecture used in training: a 10-class head replaces the
# 527-class AudioSet head, hence ignore_mismatched_sizes=True.
model = ASTForAudioClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

# Load the fine-tuned weights saved by the notebook.
state_dict = torch.load("best_ast_model.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)

feature_extractor = ASTFeatureExtractor.from_pretrained(MODEL_NAME)

# Save config, weights, and preprocessor settings together.
save_dir = "ast-messy-mashup-model"
model.save_pretrained(save_dir)
feature_extractor.save_pretrained(save_dir)
```

This creates files such as:

- `config.json`
- `model.safetensors` or `pytorch_model.bin`
- `preprocessor_config.json`

## Upload the model to the Hugging Face Hub

### Option 1: from Python

```python
from huggingface_hub import login
from transformers import ASTFeatureExtractor, ASTForAudioClassification

login()  # prompts for an access token with write permission

# Reload the converted model so this also works in a fresh session.
save_dir = "ast-messy-mashup-model"
model = ASTForAudioClassification.from_pretrained(save_dir)
feature_extractor = ASTFeatureExtractor.from_pretrained(save_dir)

model.push_to_hub("your-username/your-model-repo")
feature_extractor.push_to_hub("your-username/your-model-repo")
```

### Option 2: using git

Create a new **model repository** on Hugging Face and clone it with `git clone https://huggingface.co/your-username/your-model-repo`. Copy the files from `ast-messy-mashup-model/` into the clone, then commit and push. The weights file is large, so run `git lfs install` before committing.

## Create the Streamlit Space

1. Go to Hugging Face and create a new **Space**.
2. Choose **Streamlit** as the SDK.
3. Upload these three files:
   - `app.py`
   - `requirements.txt`
   - `README.md`
4. In the Space settings, add a variable named `MODEL_REPO`. Spaces expose variables to the app as environment variables:

   ```text
   MODEL_REPO=your-username/your-model-repo
   ```

   You can also hardcode the repo name directly inside `app.py`.

## How the app works

- Loads your model from `MODEL_REPO`
- Reads the uploaded audio
- Resamples to **16 kHz mono**
- Uses **10-second windows**
- For long audio, takes up to **3 evenly spaced segments**
- Averages class probabilities across segments
- Displays the predicted genre and the top class scores

## Recommended repository structure

### Space repo

```text
.
├── app.py
├── requirements.txt
└── README.md
```

### Model repo

```text
.
├── config.json
├── preprocessor_config.json
├── model.safetensors
└── README.md
```

## Notes

- This app is set up for inference only, not training.
- `wandb`, Kaggle paths, dataset loaders, and augmentation code are intentionally left out.
- Possible later additions:
  - waveform display
  - spectrogram preview
  - batch prediction
  - confidence chart
  - example audio samples

## Local run

```bash
pip install -r requirements.txt
# The app reads the model repo from this variable (unless hardcoded in app.py).
export MODEL_REPO=your-username/your-model-repo
streamlit run app.py
```
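The segment handling described under **How the app works** can be sketched in NumPy as below. This is a minimal sketch, not the actual `app.py`. Several details are assumptions:

- the helper names (`select_segments`, `average_probabilities`, `top_predictions`) are hypothetical;
- clips shorter than one window are zero-padded;
- a clip gets one window per started 10 seconds, capped at 3.

It expects a 1-D waveform that has already been resampled to 16 kHz mono.

```python
import numpy as np

SAMPLE_RATE = 16_000   # AST input rate
SEGMENT_SECONDS = 10   # window length used in training
MAX_SEGMENTS = 3       # cap on windows per clip

# Label order must match the id2label mapping used when converting the checkpoint.
LABELS = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def select_segments(waveform: np.ndarray,
                    sample_rate: int = SAMPLE_RATE,
                    segment_seconds: int = SEGMENT_SECONDS,
                    max_segments: int = MAX_SEGMENTS) -> list:
    """Return up to `max_segments` evenly spaced, full-length windows.

    Assumption: clips no longer than one window are zero-padded to full
    length; longer clips get one window per started segment length,
    capped at max_segments.
    """
    seg_len = sample_rate * segment_seconds
    if len(waveform) <= seg_len:
        padded = np.zeros(seg_len, dtype=waveform.dtype)
        padded[: len(waveform)] = waveform
        return [padded]
    n = min(max_segments, int(np.ceil(len(waveform) / seg_len)))
    # Spread start offsets from the beginning to the last full window.
    starts = np.linspace(0, len(waveform) - seg_len, num=n).astype(int)
    return [waveform[s : s + seg_len] for s in starts]


def average_probabilities(logits: np.ndarray) -> np.ndarray:
    """Softmax each row of (num_segments, num_labels) logits, then average."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)


def top_predictions(probs: np.ndarray, k: int = 3) -> list:
    """Return the k most likely (label, probability) pairs, highest first."""
    order = np.argsort(probs)[::-1][:k]
    return [(LABELS[i], float(probs[i])) for i in order]
```

To connect this to the model, you can pass the list of segments to the feature extractor in one batch, for example `feature_extractor(segments, sampling_rate=16000, return_tensors="pt")`. Then run `model(**inputs)` inside `torch.no_grad()` and convert the resulting `logits` with `.numpy()` before calling `average_probabilities`. Averaging softmax probabilities rather than raw logits matches the behavior listed above.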