	---
	title: Messy Mashup Genre Classifier
	emoji: 🎵
	colorFrom: blue
	colorTo: purple
	sdk: streamlit
	sdk_version: 1.25.0
	app_file: app.py
	pinned: false
	---

	# Messy Mashup Genre Classifier

	This Space runs inference for a fine-tuned Audio Spectrogram Transformer (AST) model on your music-genre classification task.

	Your training code uses:
	- ASTForAudioClassification
	- ASTFeatureExtractor
	- 16 kHz audio
	- 10-second segments
	- 10 labels: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
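
The sample rate and segment length above imply a fixed input size of 160,000 samples per segment. The sketch below shows that arithmetic and one way to force a clip to that length. It is only an illustration: `pad_or_trim` is a hypothetical helper, and whether the training notebook trims and zero-pads in exactly this way is an assumption.

```python
SAMPLE_RATE = 16_000      # Hz, from the list above
SEGMENT_SECONDS = 10      # seconds, from the list above
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS  # 160,000 samples per segment


def pad_or_trim(samples, target_len=SEGMENT_SAMPLES):
    """Return a list of exactly target_len samples (hypothetical helper).

    Longer input is cut at the end; shorter input is zero-padded at the end.
    """
    samples = list(samples[:target_len])
    return samples + [0.0] * (target_len - len(samples))
```

A 5-second clip would come back padded to 160,000 samples, and a 30-second clip would be cut to its first 10 seconds.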

	## Files in this Space

	- `app.py` — Streamlit app for inference
	- `requirements.txt` — Python dependencies for Hugging Face Spaces
	- `README.md` — Space metadata and setup instructions

	## Important: convert your trained checkpoint before deployment

	Your notebook currently saves only:

	```python
	torch.save(model.state_dict(), "best_ast_model.pt")
	```

	For a Hugging Face Space, it is better to upload the model in Hugging Face format, so the app can load the weights, config, and label mapping with a single `from_pretrained` call.

	Run this once after training:

	```python
	import torch
	from transformers import ASTFeatureExtractor, ASTForAudioClassification

	MODEL_NAME = "MIT/ast-finetuned-audioset-10-10-0.4593"
	NUM_LABELS = 10

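	# Label indices must match the encoding used during training
	id2label = {
	    0: "blues",
	    1: "classical",
	    2: "country",
	    3: "disco",
	    4: "hiphop",
	    5: "jazz",
	    6: "metal",
	    7: "pop",
	    8: "reggae",
	    9: "rock",
	}
	label2id = {v: k for k, v in id2label.items()}

	# ignore_mismatched_sizes=True lets the new 10-class head replace
	# the classification head stored in the pretrained checkpoint
	model = ASTForAudioClassification.from_pretrained(
	    MODEL_NAME,
	    num_labels=NUM_LABELS,
	    id2label=id2label,
	    label2id=label2id,
	    ignore_mismatched_sizes=True,
	)

	# Load the fine-tuned weights saved by the notebook
	state_dict = torch.load("best_ast_model.pt", map_location="cpu")
	model.load_state_dict(state_dict)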

	feature_extractor = ASTFeatureExtractor.from_pretrained(MODEL_NAME)

	save_dir = "ast-messy-mashup-model"
	model.save_pretrained(save_dir)
	feature_extractor.save_pretrained(save_dir)
	```

	This will create files such as:
	- `config.json`
	- `model.safetensors` or `pytorch_model.bin`
	- `preprocessor_config.json`
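
Before uploading, it may help to confirm that the export produced these files. The following is a minimal sketch using only the standard library. The helper name `missing_model_files` is hypothetical, and the file names are taken from the list above.

```python
from pathlib import Path

# Files listed above; weights may be saved under either name.
REQUIRED_FILES = ("config.json", "preprocessor_config.json")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def missing_model_files(save_dir):
    """Return names of expected files not found in save_dir (hypothetical helper)."""
    root = Path(save_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # At least one of the two weight formats must be present.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


if __name__ == "__main__":
    problems = missing_model_files("ast-messy-mashup-model")
    print("export looks complete" if not problems else f"missing: {problems}")
```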

	## Upload the model to the Hugging Face Hub

	### Option 1: from Python

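	```python
	from huggingface_hub import login
	from transformers import ASTFeatureExtractor, ASTForAudioClassification

	login()  # prompts for a Hugging Face access token with write permission

	# Reload from the directory saved in the conversion step
	save_dir = "ast-messy-mashup-model"
	model = ASTForAudioClassification.from_pretrained(save_dir)
	feature_extractor = ASTFeatureExtractor.from_pretrained(save_dir)

	model.push_to_hub("your-username/your-model-repo")
	feature_extractor.push_to_hub("your-username/your-model-repo")
	```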

	### Option 2: using git

	Create a new model repository on Hugging Face and upload the saved model files into it.

	## Create the Streamlit Space

	1. Go to Hugging Face and create a new Space.
	2. Choose Streamlit as the SDK.
	3. Upload these three files:
	   - `app.py`
	   - `requirements.txt`
	   - `README.md`
	4. In the Space settings, add an environment variable:

	```text
	MODEL_REPO=your-username/your-model-repo
	```

	You can also hardcode the repo name directly inside `app.py`.
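
Both options can be combined by reading the variable and falling back to a hardcoded default. This is a sketch of one possible approach, not necessarily what `app.py` does. `resolve_model_repo` and `DEFAULT_MODEL_REPO` are hypothetical names, and the default value is the placeholder used in this README.

```python
import os

# Placeholder from this README; replace with your actual model repository.
DEFAULT_MODEL_REPO = "your-username/your-model-repo"


def resolve_model_repo(environ=os.environ):
    """Return MODEL_REPO from the environment, or the default if unset or blank."""
    value = environ.get("MODEL_REPO", "").strip()
    return value or DEFAULT_MODEL_REPO
```

Passing the mapping as an argument keeps the function easy to check without touching the real environment.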

	## How the app works

	- Loads your model from `MODEL_REPO`
	- Reads uploaded audio
	- Resamples to 16 kHz mono
	- Uses 10-second windows
	- For long audio, takes up to 3 evenly spaced segments
	- Averages probabilities across segments
	- Displays the predicted genre and top class scores
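
The segment selection and averaging steps above can be sketched in plain Python. The exact rule `app.py` uses to place segments is not shown in this README, so the spacing logic here is an assumption: short clips get one window, clips under 20 seconds get one centered window, and longer clips get up to 3 windows spread from start to end. All function names are hypothetical.

```python
SAMPLE_RATE = 16_000
SEGMENT_SECONDS = 10
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS
MAX_SEGMENTS = 3


def segment_starts(num_samples, segment_len=SEGMENT_SAMPLES, max_segments=MAX_SEGMENTS):
    """Return start offsets (in samples) for up to max_segments evenly spaced windows."""
    if num_samples <= segment_len:
        return [0]  # short clip: a single window starting at the beginning
    last_start = num_samples - segment_len
    count = min(max_segments, num_samples // segment_len)
    if count == 1:
        return [last_start // 2]  # one window, centered in the clip
    # Spread windows from the first possible start to the last possible start.
    return [round(i * last_start / (count - 1)) for i in range(count)]


def average_probabilities(per_segment_probs):
    """Average class probabilities across segments (one list of floats per segment)."""
    n = len(per_segment_probs)
    return [sum(column) / n for column in zip(*per_segment_probs)]


def top_label(probs, labels):
    """Return the label with the highest averaged probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]
```

For a 60-second clip, `segment_starts(60 * SAMPLE_RATE)` places windows at 0, 25, and 50 seconds. Averaging probabilities rather than picking the most confident segment tends to make the prediction less sensitive to a single unrepresentative window.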

	## Recommended repository structure

	### Space repo

	```text
	.
	├── app.py
	├── requirements.txt
	└── README.md
	```

	### Model repo

	```text
	.
	├── config.json
	├── preprocessor_config.json
	├── model.safetensors
	└── README.md
	```

	## Notes

	- This app is set up for inference only, not training.
	- `wandb`, Kaggle paths, dataset loaders, and augmentation code are intentionally left out.
	- If you want, you can later add:
	  - waveform display
	  - spectrogram preview
	  - batch prediction
	  - confidence chart
	  - example audio samples

	## Local run

	```bash
	pip install -r requirements.txt
	streamlit run app.py
	```