---
library_name: transformers
tags:
- music
license: mit
language:
- en
base_model:
- MIT/ast-finetuned-audioset-10-10-0.4593
pipeline_tag: audio-classification
---
# AST Audio Classification Model (Messy Mashup)
## Introduction
This model is a fine-tuned **Audio Spectrogram Transformer (AST)** designed for audio classification tasks on the Messy Mashup dataset. It leverages pretrained audio representations and adapts them to classify audio inputs into predefined categories.
## Model Description
- **Developed by:** Rudransh Mathur
- **Institution:** Indian Institute of Technology, Madras
- **Model type:** Transformer-based Audio Classification Model
- **Base model:** AST (`MIT/ast-finetuned-audioset-10-10-0.4593`, fine-tuned on AudioSet)
- **Framework:** Transformers (Hugging Face) + PyTorch
- **License:** MIT
This model builds upon the pretrained AST architecture and is fine-tuned for improved performance on domain-specific audio data.
## Model Sources
- **Repository:** https://github.com/rudransmathur/dl-genai-project-26-t1
- **Kaggle Competition:** https://www.kaggle.com/competitions/jan-2026-dl-gen-ai-project
## Intended Use
- Audio classification tasks
- Music/audio tagging
- Experimental research in audio transformers
## Training Details
- **Dataset:** Messy Mashup Audio Dataset
  - **Genres:** blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
  - **Stems:** drums, vocals, bass, other
- **Epochs:** 10
- **Optimizer:** AdamW
- **Loss Function:** Cross-Entropy Loss
- **Scheduler:** Cosine scheduler with warmup steps
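The cosine schedule with warmup listed above can be sketched as a learning-rate multiplier: it ramps linearly from 0 to 1 during warmup, then follows a half-cosine decay back to 0. This is a minimal illustration rather than the training code used for this model. The function name `cosine_with_warmup` and the step counts are assumptions, since the card does not state the warmup length, total steps, or base learning rate.

```python
import math


def cosine_with_warmup(step, warmup_steps, total_steps):
    """Return the learning-rate multiplier in [0, 1] for a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 over the warmup phase
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, clamped to at most 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half-cosine decay from 1 down to 0
    return 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical values: 100 warmup steps out of 1000 total steps
multipliers = [cosine_with_warmup(s, 100, 1000) for s in (0, 50, 100, 550, 1000)]
```

The multiplier is applied to the base learning rate of the optimizer at each step. In a Transformers training loop, `transformers.get_cosine_schedule_with_warmup` produces a schedule of the same shape.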
### Preprocessing
- Stem files randomly sampled from within the same genre were mixed to create full-song audio similar to the test dataset
- Five seconds of noise from the noise dataset was added 2-3 times at random positions in each audio file
- Audio inputs converted using AST feature extractor
- Sampling rate aligned with model requirements
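The mixing and noise steps above might look roughly like the sketch below. It is not the project's actual preprocessing code. The helper names `mix_stems` and `add_noise_bursts`, the 16 kHz sample rate, the clip durations, peak normalization, and the choice to overlay noise (rather than splice it in) are all assumptions made for illustration. Synthetic arrays stand in for real stem and noise files, and the per-genre stem sampling is not shown.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed; not stated in the card


def mix_stems(stems):
    """Sum equal-length stem waveforms and peak-normalize the mixture."""
    mix = np.sum(np.stack(stems), axis=0)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 0 else mix


def add_noise_bursts(audio, noise, rng, burst_seconds=5, sample_rate=SAMPLE_RATE):
    """Overlay 2 or 3 bursts of `burst_seconds` of noise at random offsets."""
    out = audio.copy()
    burst_len = min(burst_seconds * sample_rate, len(noise), len(audio))
    n_bursts = int(rng.integers(2, 4))  # upper bound is exclusive: 2 or 3
    for _ in range(n_bursts):
        src = int(rng.integers(0, len(noise) - burst_len + 1))
        dst = int(rng.integers(0, len(audio) - burst_len + 1))
        out[dst:dst + burst_len] += noise[src:src + burst_len]
    return out, n_bursts


rng = np.random.default_rng(0)
# Synthetic stand-ins for drums, vocals, bass, other (30 s each) and a 10 s noise clip
stems = [rng.uniform(-1, 1, 30 * SAMPLE_RATE).astype(np.float32) for _ in range(4)]
noise = 0.1 * rng.standard_normal(10 * SAMPLE_RATE).astype(np.float32)

mixture = mix_stems(stems)
noisy, n_bursts = add_noise_bursts(mixture, noise, rng)
```

The resulting `noisy` waveform would then be passed to the AST feature extractor at the sampling rate the model expects.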
### Performance
- **Best Validation Accuracy:** 0.87
- **Best Validation Loss:** 0.40373
- **Best Test Accuracy:** 0.92
- **Best Test Loss:** 0.3458
## 🚀 Usage
```python
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load the fine-tuned classifier and its matching feature extractor from the Hub
model = AutoModelForAudioClassification.from_pretrained("rudranshmathur/ASTMessyMashup")
feature_extractor = AutoFeatureExtractor.from_pretrained("rudranshmathur/ASTMessyMashup")
```
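Once the model has produced logits for a clip, they can be turned into ranked genre predictions along the lines of the sketch below. The helper `top_k_genres` and the example logits are hypothetical. The label order is assumed to follow the genre list in Training Details, which may not match the checkpoint, so the authoritative mapping is `model.config.id2label`.

```python
import math

# Assumed label order (copied from the genre list in Training Details);
# verify against model.config.id2label before relying on it
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def top_k_genres(logits, labels=GENRES, k=3):
    """Convert raw logits to softmax probabilities and return the k most likely labels."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for one clip, one value per genre
example_logits = [0.1, -1.2, 0.3, 0.0, 2.5, 0.4, 1.9, -0.5, 0.2, 0.8]
predictions = top_k_genres(example_logits)
```

In practice the logits would come from the loaded model, for example `model(**inputs).logits[0].tolist()`, where `inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")` and `waveform` is a mono clip. The 16 kHz rate is an assumption based on the AST feature extractor's default.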