	---
	library_name: transformers
	tags:
	- music
	license: mit
	language:
	- en
	base_model:
	- MIT/ast-finetuned-audioset-10-10-0.4593
	pipeline_tag: audio-classification
	---

	# AST Audio Classification Model (Messy Mashup)

	## Introduction
	This model is a fine-tuned Audio Spectrogram Transformer (AST) for audio classification on the Messy Mashup dataset. It starts from AST weights pretrained on AudioSet and adapts them to classify mixed music audio into one of ten genres.

	## Model Description

	- Developed by: Rudransh Mathur
	- Institution: Indian Institute of Technology, Madras
	- Model type: Transformer-based Audio Classification Model
	- Base model: AST (fine-tuned on AudioSet)
	- Framework: Transformers (Hugging Face) + PyTorch
	- License: MIT

	This model builds upon the pretrained AST architecture and is fine-tuned for improved performance on domain-specific audio data.

	## Model Sources

	- Repository: https://github.com/rudransmathur/dl-genai-project-26-t1
	- Kaggle Competition: https://www.kaggle.com/competitions/jan-2026-dl-gen-ai-project


	## Intended Use
	- Audio classification tasks
	- Music/audio tagging
	- Experimental research in audio transformers

	## Training Details

	- Dataset: Messy Mashup Audio Dataset
	- Genres: [blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock]
	- Stems: [drums, vocals, bass, other]
	- Epochs: 10
	- Optimizer: AdamW
	- Loss Function: Cross-Entropy Loss
	- Scheduler: Cosine scheduler with warmup steps
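As a rough illustration of the scheduler listed above, the sketch below computes the learning-rate multiplier for a cosine schedule with linear warmup. It follows the same general shape as `get_cosine_schedule_with_warmup` in Transformers, but the card does not say which implementation was used. The base learning rate and step counts are hypothetical placeholders, not values from this training run.

```python
import math


def cosine_warmup_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, warmup_steps)
    # Cosine decay from 1 to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical values: the card does not state the base rate or step counts.
base_lr = 1e-5
warmup_steps, total_steps = 100, 1000
schedule = [
    base_lr * cosine_warmup_factor(s, warmup_steps, total_steps)
    for s in range(total_steps + 1)
]
```

The rate rises linearly during warmup, peaks at `base_lr` when warmup ends, and then decays smoothly toward zero by the final step.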

	### Preprocessing
	- Randomly sampled stem files from within the same genre and mixed them into a single track resembling the songs in the test dataset
	- Added 5 seconds of noise from the noise dataset 2 to 3 times at random positions in each audio file
	- Converted audio inputs to model features with the AST feature extractor
	- Resampled audio to match the sampling rate expected by the AST feature extractor
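The mixing and noise steps above can be sketched as follows. This is a minimal illustration on synthetic waveforms, not the author's pipeline. The function names, the peak normalization, the additive overlay of noise, and the `noise_gain` value are all assumptions. The 16 kHz rate is the default of the AST feature extractor.

```python
import numpy as np


def mix_stems(stems):
    """Sum stem waveforms (same sample rate), trimmed to the shortest stem."""
    n = min(len(s) for s in stems)
    mix = np.sum([s[:n] for s in stems], axis=0)
    peak = np.max(np.abs(mix))
    # Peak-normalize so the mix stays within [-1, 1] (an assumed choice).
    return mix / peak if peak > 0 else mix


def add_noise_bursts(audio, noise, sr, rng, burst_seconds=5, noise_gain=0.3):
    """Overlay 2 or 3 noise bursts of burst_seconds each at random offsets."""
    out = audio.copy()
    burst_len = min(burst_seconds * sr, len(audio), len(noise))
    for _ in range(rng.integers(2, 4)):  # upper bound is exclusive: 2 or 3
        a = rng.integers(0, len(audio) - burst_len + 1)  # offset in the song
        b = rng.integers(0, len(noise) - burst_len + 1)  # offset in the noise
        out[a:a + burst_len] += noise_gain * noise[b:b + burst_len]
    return out


# Synthetic stand-ins for the drums, vocals, bass, and other stems.
rng = np.random.default_rng(0)
sr = 16000
stems = [rng.standard_normal(10 * sr) for _ in range(4)]
noise = rng.standard_normal(8 * sr)

mixed = mix_stems(stems)
noisy = add_noise_bursts(mixed, noise, sr, rng)
```

In a real pipeline the stems and noise would be loaded from audio files and resampled to `sr` before mixing.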

	### Performance
	- Best Validation Accuracy: 0.87
	- Best Validation Loss: 0.40373
	- Best Test Accuracy: 0.92
	- Best Test Loss: 0.3458

	## 🚀 Usage

	```python
	from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

	# Load the fine-tuned classifier and its matching feature extractor from the Hub
	model = AutoModelForAudioClassification.from_pretrained("rudranshmathur/ASTMessyMashup")
	feature_extractor = AutoFeatureExtractor.from_pretrained("rudranshmathur/ASTMessyMashup")
	```
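Once the model has been run on extracted features, its output carries a `logits` field with one score per class. The sketch below shows one way to turn such scores into a genre and a confidence value in plain Python. The logits are made-up numbers, and the index-to-genre order is an assumption taken from the genre list in this card. The mapping in `model.config.id2label` should be treated as authoritative if the checkpoint defines one.

```python
import math

GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, id2label):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# ASSUMPTION: class indices follow the order of GENRES above.
id2label = dict(enumerate(GENRES))

# Made-up logits for illustration only, not real model output.
example_logits = [0.1, 0.2, 0.0, -1.0, 0.3, 2.5, 0.0, 0.4, -0.5, 0.1]
label, confidence = top_prediction(example_logits, id2label)
print(label, round(confidence, 3))
```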