AST Audio Classification Model (Messy Mashup)

Introduction

This model is a fine-tuned Audio Spectrogram Transformer (AST) designed for audio classification tasks on the Messy Mashup dataset. It leverages pretrained audio representations and adapts them to classify audio inputs into predefined categories.

Model Description

  • Developed by: Rudransh Mathur
  • Institution: Indian Institute of Technology, Madras
  • Model type: Transformer-based Audio Classification Model
  • Base model: AST (fine-tuned on AudioSet)
  • Framework: Transformers (Hugging Face) + PyTorch
  • License: MIT

This model builds upon the pretrained AST architecture and is fine-tuned for improved performance on domain-specific audio data.

Model Sources

Intended Use

  • Audio classification tasks
  • Music/audio tagging
  • Experimental research in audio transformers

Training Details

  • Dataset: Messy Mashup Audio Dataset
    • Genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
    • Stems: drums, vocals, bass, other
  • Epochs: 10
  • Optimizer: AdamW
  • Loss Function: Cross-Entropy Loss
  • Scheduler: Cosine scheduler with warmup steps
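
As a rough illustration of the scheduler listed above, the sketch below computes the learning-rate multiplier for a cosine schedule with linear warmup. The base learning rate, warmup length, and total step count are hypothetical values chosen for the example; the card does not state them.

```python
import math


def cosine_warmup_factor(step, warmup_steps, total_steps):
    """Learning-rate multiplier in [0, 1] for a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the base learning rate.
        return step / max(1, warmup_steps)
    # Cosine decay from 1 down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical settings (not taken from the card).
base_lr = 1e-5
warmup_steps, total_steps = 100, 1000

for step in (0, 50, 100, 550, 1000):
    lr = base_lr * cosine_warmup_factor(step, warmup_steps, total_steps)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

In a PyTorch training loop, a function like this could be passed to `torch.optim.lr_scheduler.LambdaLR`, or replaced by `get_cosine_schedule_with_warmup` from Transformers, which follows the same shape.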

Preprocessing

  • Randomly sampled stem files from the same genre and mixed them into a single track resembling the test data
  • Added 5-second segments of noise from the noise dataset to each audio file, 2-3 times at random
  • Converted audio inputs to model features with the AST feature extractor
  • Sampling rate aligned with model requirements
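
The mixing and noise steps above could look roughly like the following NumPy sketch. It is not the author's pipeline: the function names, the peak normalisation, the noise gain, and the choice to overlay noise rather than splice it in are all assumptions, and random arrays stand in for real stem and noise recordings. The 16 kHz sampling rate is the usual default for AST feature extractors.

```python
import numpy as np


def mix_stems(stems):
    """Sum equal-length mono stems and peak-normalise to avoid clipping."""
    mix = np.sum(np.stack(stems), axis=0)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 0 else mix


def add_noise_bursts(audio, noise, sr, rng, burst_seconds=5, noise_gain=0.3):
    """Overlay 2 or 3 bursts of noise at random offsets (assumed behaviour)."""
    out = audio.copy()
    burst_len = burst_seconds * sr
    n_bursts = int(rng.integers(2, 4))  # upper bound is exclusive: 2 or 3
    for _ in range(n_bursts):
        start = int(rng.integers(0, len(out) - burst_len + 1))
        noise_start = int(rng.integers(0, len(noise) - burst_len + 1))
        segment = noise[noise_start:noise_start + burst_len]
        out[start:start + burst_len] += noise_gain * segment
    return out


# Placeholder data: random signals instead of real drums/vocals/bass/other stems.
sr = 16000
rng = np.random.default_rng(0)
stems = [rng.standard_normal(30 * sr).astype(np.float32) for _ in range(4)]
noise = rng.standard_normal(10 * sr).astype(np.float32)

mixed = mix_stems(stems)
augmented = add_noise_bursts(mixed, noise, sr, rng)
print(mixed.shape, augmented.shape)
```

The augmented waveform would then be passed to the AST feature extractor together with its sampling rate.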

Performance

  • Best Validation Accuracy: 0.87
  • Best Validation Loss: 0.40373
  • Best Test Accuracy: 0.92
  • Best Test Loss: 0.3458

🚀 Usage

from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

model = AutoModelForAudioClassification.from_pretrained("rudranshmathur/ASTMessyMashup")
feature_extractor = AutoFeatureExtractor.from_pretrained("rudranshmathur/ASTMessyMashup")
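
Once the model is loaded, its output logits need to be turned into a label. The sketch below shows that step in plain Python. It assumes the classifier has ten outputs ordered like the genre list in this card; the real mapping is stored in `model.config.id2label`. The example logits are made up; in practice they would come from `model(**inputs).logits`, where `inputs` is the feature extractor's output for a waveform.

```python
import math

# Assumed label order; check model.config.id2label for the real mapping.
ID2LABEL = dict(enumerate([
    "blues", "classical", "country", "disco", "hiphop",
    "jazz", "metal", "pop", "reggae", "rock",
]))


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, id2label):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits for illustration only.
example_logits = [0.1, 2.3, -1.0, 0.4, 0.0, 1.1, -0.5, 0.2, 0.3, 0.9]
label, prob = top_prediction(example_logits, ID2LABEL)
print(label, round(prob, 3))
```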