AST Audio Classification Model (Messy Mashup)

Introduction

This model is a fine-tuned Audio Spectrogram Transformer (AST) designed for audio classification tasks on the Messy Mashup dataset. It leverages pretrained audio representations and adapts them to classify audio inputs into predefined categories.

Model Description

Developed by: Rudransh Mathur
Institution: Indian Institute of Technology, Madras
Model type: Transformer-based Audio Classification Model
Base model: AST (fine-tuned on AudioSet)
Framework: Transformers (Hugging Face) + PyTorch
License: MIT

This model builds upon the pretrained AST architecture and is fine-tuned for improved performance on domain-specific audio data.

Model Sources

Repository: https://github.com/rudransmathur/dl-genai-project-26-t1
Kaggle Competition: https://www.kaggle.com/competitions/jan-2026-dl-gen-ai-project

Intended Use

Audio classification tasks
Music/audio tagging
Experimental research in audio transformers

Training Details

Dataset: Messy Mashup Audio Dataset
- GENRES: [blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock]
- STEMS: [drums, vocals, bass, other]
Epochs: 10
Optimizer: AdamW
Loss Function: Cross-Entropy Loss
Scheduler: Cosine scheduler with warmup steps
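
The learning-rate shape implied by a cosine scheduler with warmup can be sketched in plain Python. This is only an illustration of the general technique: the base learning rate, warmup length, and total step count below are hypothetical, since the card does not state them, and the function name `cosine_warmup_factor` is not taken from the repository.

```python
import math

def cosine_warmup_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier in [0, 1] for a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed so far, capped at 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half-cosine decay from 1 down to 0.
    return 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical values for illustration only; the card does not report them.
BASE_LR = 1e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 1000

def lr_at(step: int) -> float:
    """Effective learning rate at a given step under the assumed settings."""
    return BASE_LR * cosine_warmup_factor(step, WARMUP_STEPS, TOTAL_STEPS)
```

In a PyTorch training loop, a multiplier like this could be passed to `torch.optim.lr_scheduler.LambdaLR` on top of an `AdamW` optimizer. Transformers also provides `get_cosine_schedule_with_warmup`, which follows the same shape; whether the original training used it is not stated here.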

Preprocessing

Stem files were randomly sampled within a single genre and mixed into one track, so that training examples resemble the test dataset.
Five-second noise clips from the noise dataset were added 2-3 times at random positions in each mixed audio file.
Audio inputs were converted to model features using the AST feature extractor.
The sampling rate was aligned with the model's requirements.
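
The mixing and noise steps above can be sketched with NumPy. This is a minimal illustration, not the repository's code: it assumes stems and noise are already loaded as equal-rate mono float arrays, a 16 kHz sampling rate, peak rescaling only when the sum would clip, and a hypothetical noise `gain` of 0.5. None of these details are stated in the card.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio
NOISE_SECONDS = 5     # burst length given in the card

def mix_stems(stems):
    """Sum equal-length mono stems; rescale only if the sum would clip."""
    mix = np.sum(np.stack(stems), axis=0).astype(np.float32)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix

def add_noise_bursts(mix, noise, rng, gain=0.5):
    """Overlay 2 or 3 five-second noise excerpts at random positions.

    Both `mix` and `noise` must be at least NOISE_SECONDS long.
    """
    out = mix.copy()
    burst_len = NOISE_SECONDS * SAMPLE_RATE
    n_bursts = int(rng.integers(2, 4))  # upper bound is exclusive: 2 or 3
    for _ in range(n_bursts):
        src = int(rng.integers(0, len(noise) - burst_len + 1))
        dst = int(rng.integers(0, len(out) - burst_len + 1))
        out[dst:dst + burst_len] += gain * noise[src:src + burst_len]
    return out

# Synthetic stand-ins for the four stems and a noise recording.
rng = np.random.default_rng(0)
stems = [rng.uniform(-0.3, 0.3, 10 * SAMPLE_RATE).astype(np.float32) for _ in range(4)]
noise = rng.normal(0.0, 0.1, 8 * SAMPLE_RATE).astype(np.float32)
augmented = add_noise_bursts(mix_stems(stems), noise, rng)
```

The resulting array would then go through the AST feature extractor like any other waveform.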

Performance

Best Validation Accuracy: 0.87
Best Validation Loss: 0.40373
Best Test Accuracy: 0.92
Best Test Loss: 0.3458

🚀 Usage

from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

model = AutoModelForAudioClassification.from_pretrained("rudranshmathur/ASTMessyMashup")
feature_extractor = AutoFeatureExtractor.from_pretrained("rudranshmathur/ASTMessyMashup")
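
Once the model and feature extractor are loaded, inference on a single file might look like the sketch below. It is an assumption-laden example rather than the author's pipeline: `librosa` is used here only as one convenient way to load and resample audio, the helper names `top_k` and `classify_file` are hypothetical, and the labels returned depend on whatever `id2label` mapping is stored in the model's config.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    order = np.argsort(probs)[::-1][:k]
    return [(id2label[int(i)], float(probs[i])) for i in order]

def classify_file(path, model_id="rudranshmathur/ASTMessyMashup", k=3):
    """Load an audio file, run the model, and return the top-k predictions."""
    # Heavy dependencies are imported lazily so the helpers above stay lightweight.
    import librosa  # assumption: any loader that returns a mono float waveform works
    import torch
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = AutoModelForAudioClassification.from_pretrained(model_id)
    model.eval()

    # Resample to the rate the feature extractor expects.
    sr = feature_extractor.sampling_rate
    waveform, _ = librosa.load(path, sr=sr, mono=True)

    inputs = feature_extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].numpy()
    return top_k(logits, model.config.id2label, k)
```

Calling `classify_file("clip.wav")` with a local file (the filename is a placeholder) would return a list of label and probability pairs.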

Model Details

Model size: 86.2M params
Tensor type: F32 (Safetensors)
Base model: MIT/ast-finetuned-audioset-10-10-0.4593