---
library_name: transformers
tags:
- music
license: mit
language:
- en
base_model:
- MIT/ast-finetuned-audioset-10-10-0.4593
pipeline_tag: audio-classification
---
# AST Audio Classification Model (Messy Mashup)
## Introduction
This model is a fine-tuned **Audio Spectrogram Transformer (AST)** designed for audio classification tasks on the Messy Mashup dataset. It leverages pretrained audio representations and adapts them to classify audio inputs into predefined categories.
## Model Description
- **Developed by:** Rudransh Mathur
- **Institution:** Indian Institute of Technology, Madras
- **Model type:** Transformer-based Audio Classification Model
- **Base model:** AST fine-tuned on AudioSet (`MIT/ast-finetuned-audioset-10-10-0.4593`)
- **Framework:** Transformers (Hugging Face) + PyTorch
- **License:** MIT
This model builds upon the pretrained AST architecture and is fine-tuned for improved performance on domain-specific audio data.
## Model Sources
- **Repository:** https://github.com/rudransmathur/dl-genai-project-26-t1
- **Kaggle Competition:** https://www.kaggle.com/competitions/jan-2026-dl-gen-ai-project
## Intended Use
- Audio classification tasks
- Music/audio tagging
- Experimental research in audio transformers
## Training Details
- **Dataset:** Messy Mashup Audio Dataset
  - **Genres:** blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
  - **Stems:** drums, vocals, bass, other
- **Epochs:** 10
- **Optimizer:** AdamW
- **Loss Function:** Cross-Entropy Loss
- **Scheduler:** Cosine scheduler with warmup steps
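To make the scheduler entry concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, written in plain Python. The base learning rate, warmup length, and total step count are hypothetical, since this card does not report them, and `cosine_warmup_lr` is an illustrative helper rather than the training code. Its shape follows the usual formulation (as in `get_cosine_schedule_with_warmup` from Transformers), though the card does not say which implementation was used.

```python
import math

def cosine_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, clamped to at most 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half-cosine decay from base_lr down to 0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical values for illustration only; the card does not report them
BASE_LR = 1e-4
WARMUP_STEPS = 100
TOTAL_STEPS = 1000

schedule = [
    cosine_warmup_lr(s, BASE_LR, WARMUP_STEPS, TOTAL_STEPS)
    for s in range(TOTAL_STEPS + 1)
]
```

The learning rate peaks at the end of warmup and falls to half its peak midway through the decay phase, which keeps early updates small while the new classification head is still poorly calibrated.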
### Preprocessing
- Randomly sampled stem files from the same genre and mixed them into a single track resembling the test data
- Added 5-second clips from the noise dataset at random positions, 2-3 times per audio file
- Converted audio inputs with the AST feature extractor
- Aligned the sampling rate with the model's requirements
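The mixing and noise steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the author's pipeline: the stems and noise are synthetic arrays, the 16 kHz sample rate is assumed, noise is overlaid additively (the card does not say whether it was overlaid or spliced in), and peak normalisation is an added safeguard. The helper names `mix_stems` and `add_noise_bursts` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate for this sketch

def mix_stems(stems):
    """Sum equal-length stem waveforms and peak-normalise the result."""
    mix = np.sum(np.stack(stems), axis=0)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 0 else mix

def add_noise_bursts(audio, noise, rng, burst_seconds=5,
                     min_bursts=2, max_bursts=3, sample_rate=SAMPLE_RATE):
    """Overlay 2-3 noise bursts of `burst_seconds` each at random offsets."""
    out = audio.copy()
    burst_len = burst_seconds * sample_rate
    n_bursts = rng.integers(min_bursts, max_bursts + 1)
    for _ in range(n_bursts):
        # Random start positions in the noise clip and in the target audio
        n_start = rng.integers(0, len(noise) - burst_len + 1)
        a_start = rng.integers(0, len(out) - burst_len + 1)
        out[a_start:a_start + burst_len] += noise[n_start:n_start + burst_len]
    return out

rng = np.random.default_rng(0)
# Synthetic stand-ins for four same-genre stems (drums, vocals, bass, other)
stems = [rng.standard_normal(30 * SAMPLE_RATE).astype(np.float32) * 0.1
         for _ in range(4)]
# Synthetic stand-in for a clip from the noise dataset
noise = rng.standard_normal(10 * SAMPLE_RATE).astype(np.float32) * 0.05

mixed = mix_stems(stems)
augmented = add_noise_bursts(mixed, noise, rng)
```

In a real pipeline the synthetic arrays would be replaced by stem and noise files loaded and resampled to a common rate before mixing.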
### Performance
- **Best Validation Accuracy:** 0.87
- **Best Validation Loss:** 0.40373
- **Best Test Accuracy:** 0.92
- **Best Test Loss:** 0.3458
## 🚀 Usage
```python
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load the fine-tuned classifier and its matching feature extractor from the Hugging Face Hub
model = AutoModelForAudioClassification.from_pretrained("rudranshmathur/ASTMessyMashup")
feature_extractor = AutoFeatureExtractor.from_pretrained("rudranshmathur/ASTMessyMashup")
```
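Once the model and feature extractor are loaded, a forward pass yields one logit per class. The sketch below shows how such logits could be turned into ranked genre predictions. It is dependency-free and uses made-up logits; in practice they would come from something like `model(**inputs).logits[0].tolist()`, with `inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")` for a 16 kHz mono waveform. The label order here is an assumption copied from the genre list in Training Details; the authoritative mapping is `model.config.id2label`.

```python
import math

# Assumed label order, taken from the genre list in Training Details;
# check model.config.id2label for the real mapping
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Made-up logits standing in for real model output
example_logits = [0.2, -1.0, 0.5, 0.1, 1.3, 3.1, -0.4, 0.9, 0.0, 1.1]
predictions = top_k(example_logits, GENRES)

for label, prob in predictions:
    print(f"{label}: {prob:.3f}")
```

Reporting the top few classes rather than only the argmax can be useful for mashup-style inputs, where neighbouring genres often receive non-trivial probability.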