---
library_name: transformers
tags:
- music
license: mit
language:
- en
base_model:
- MIT/ast-finetuned-audioset-10-10-0.4593
pipeline_tag: audio-classification
---

# AST Audio Classification Model (Messy Mashup)

## Introduction
This model is a fine-tuned **Audio Spectrogram Transformer (AST)** for music genre classification on the Messy Mashup dataset. It starts from pretrained AudioSet representations and adapts them to assign each audio clip to one of the ten genres listed under Training Details.

## Model Description

- **Developed by:** Rudransh Mathur
- **Institution:** Indian Institute of Technology, Madras
- **Model type:** Transformer-based audio classification model
- **Base model:** AST fine-tuned on AudioSet (`MIT/ast-finetuned-audioset-10-10-0.4593`)
- **Framework:** Hugging Face Transformers + PyTorch
- **License:** MIT

This model takes the AST checkpoint fine-tuned on AudioSet and fine-tunes it further on Messy Mashup audio, adapting its general-purpose sound representations to genre classification of noisy, mixed music.

## Model Sources

- **Repository:** https://github.com/rudransmathur/dl-genai-project-26-t1
- **Kaggle Competition:** https://www.kaggle.com/competitions/jan-2026-dl-gen-ai-project

## Intended Use
- Audio classification, in particular music genre classification of mixed or noisy clips
- Music/audio tagging
- Experimental research on audio transformers

## Training Details

- **Dataset:** Messy Mashup Audio Dataset
  - **Genres (10):** blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
  - **Stems (4):** drums, vocals, bass, other
- **Epochs:** 10
- **Optimizer:** AdamW
- **Loss function:** Cross-entropy loss
- **Scheduler:** Cosine learning-rate schedule with warmup steps
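
The card does not report the learning rate or the number of warmup steps. As a minimal sketch, assuming linear warmup followed by cosine decay to zero (the shape produced by `transformers.get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`), the per-step learning-rate multiplier can be written in plain Python. `cosine_with_warmup`, `STEPS_PER_EPOCH`, `WARMUP_STEPS`, and `BASE_LR` are hypothetical names and values, not the settings used to train this model.

```python
import math


def cosine_with_warmup(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier: linear warmup from 0 to 1, then cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp during warmup.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Only EPOCHS comes from the card; the other values are hypothetical.
EPOCHS = 10
STEPS_PER_EPOCH = 500  # hypothetical
WARMUP_STEPS = 200     # hypothetical
BASE_LR = 1e-5         # hypothetical
TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH

for step in (0, 100, 200, 2600, TOTAL_STEPS):
    lr = BASE_LR * cosine_with_warmup(step, WARMUP_STEPS, TOTAL_STEPS)
    print(f"step {step:>5}: lr = {lr:.2e}")
```

In a PyTorch training loop, the same multiplier can be passed to `torch.optim.lr_scheduler.LambdaLR` wrapped around a `torch.optim.AdamW` optimizer, or replaced by `get_cosine_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps)`, with the scheduler stepped once per batch.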

### Preprocessing
- Stem files (drums, vocals, bass, other) were randomly sampled from tracks of the same genre and mixed into a single song, mimicking the mashups in the test set.
- Two or three 5-second segments from a noise dataset were added to each mix at random positions.
- Audio was converted into model inputs with the AST feature extractor.
- The sampling rate was matched to what the feature extractor expects (16 kHz for the base AST checkpoint).
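
The mixing and noise steps above can be sketched with NumPy. This is an illustration under stated assumptions, not the repository's code: stems are assumed to be mono float arrays already at 16 kHz, `stems_by_genre` is a hypothetical mapping from a genre to a list of `{stem_name: array}` tracks, and the peak normalization and noise `gain` are hypothetical choices that the card does not mention.

```python
import numpy as np

SR = 16_000  # assumed sampling rate (the base AST checkpoint expects 16 kHz)
STEMS = ["drums", "vocals", "bass", "other"]


def make_mashup(stems_by_genre, genre, rng, seconds=10.0):
    """Mix one randomly chosen clip per stem type, all drawn from the same genre."""
    n = int(seconds * SR)
    mix = np.zeros(n, dtype=np.float32)
    tracks = stems_by_genre[genre]  # hypothetical: list of {stem_name: 1-D array}
    for stem in STEMS:
        # Each stem may come from a different track of the same genre.
        clip = tracks[rng.integers(len(tracks))][stem][:n]
        mix[: len(clip)] += clip  # shorter clips leave a silent tail
    peak = np.abs(mix).max()
    return mix / peak if peak > 1.0 else mix  # hypothetical: avoid clipping


def add_noise_bursts(audio, noise_pool, rng, burst_seconds=5.0, gain=0.3):
    """Overlay 2 or 3 random noise segments of burst_seconds at random positions."""
    out = audio.copy()
    burst = int(burst_seconds * SR)
    for _ in range(rng.integers(2, 4)):  # upper bound is exclusive: 2 or 3 bursts
        noise = noise_pool[rng.integers(len(noise_pool))]
        start_n = rng.integers(0, len(noise) - burst + 1)  # assumes len(noise) >= burst
        start_a = rng.integers(0, len(out) - burst + 1)    # assumes len(audio) >= burst
        out[start_a : start_a + burst] += gain * noise[start_n : start_n + burst]
    return out
```

Because every stem in a mix comes from the same genre, that genre can serve as the label for the resulting clip.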

### Performance
- **Best Validation Accuracy:** 0.87
- **Best Validation Loss:** 0.40373
- **Best Test Accuracy:** 0.92
- **Best Test Loss:** 0.3458

## 🚀 Usage

```python
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load the fine-tuned AST model and its matching feature extractor from the Hugging Face Hub.
model = AutoModelForAudioClassification.from_pretrained("rudranshmathur/ASTMessyMashup")
feature_extractor = AutoFeatureExtractor.from_pretrained("rudranshmathur/ASTMessyMashup")
```
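
To go from raw audio to a genre prediction, the loaded `model` and `feature_extractor` can be combined as in the sketch below. It assumes a mono waveform already sampled at `feature_extractor.sampling_rate` (16 kHz for the base AST checkpoint) and that the repository's config stores genre names in `id2label`; `predict_genre` is a hypothetical helper, not part of the model repository.

```python
import numpy as np
import torch


def predict_genre(waveform: np.ndarray, sampling_rate: int, model, feature_extractor, top_k: int = 3):
    """Hypothetical helper: return the top_k (label, probability) pairs for one mono waveform."""
    # The AST feature extractor turns raw audio into fixed-length log-Mel
    # filterbank frames and returns PyTorch tensors when return_tensors="pt".
    inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    model.eval()  # disable dropout for inference (this mutates the passed-in model)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    probs = torch.softmax(logits, dim=-1)[0]
    top = torch.topk(probs, k=min(top_k, probs.numel()))
    return [
        (model.config.id2label[idx.item()], prob.item())
        for prob, idx in zip(top.values, top.indices)
    ]
```

Audio files can be decoded and resampled beforehand, for example with `waveform, sr = librosa.load(path, sr=16000, mono=True)`, and then passed as `predict_genre(waveform, sr, model, feature_extractor)`.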