---
library_name: transformers
tags:
- music
license: mit
language:
- en
base_model:
- MIT/ast-finetuned-audioset-10-10-0.4593
pipeline_tag: audio-classification
---

# AST Audio Classification Model (Messy Mashup)

## Introduction

This model is a fine-tuned **Audio Spectrogram Transformer (AST)** for audio classification on the Messy Mashup dataset. It starts from pretrained audio representations and adapts them to classify audio inputs into a fixed set of music genres.

## Model Description

- **Developed by:** Rudransh Mathur
- **Institution:** Indian Institute of Technology, Madras
- **Model type:** Transformer-based Audio Classification Model
- **Base model:** AST (fine-tuned on AudioSet)
- **Framework:** Transformers (Hugging Face) + PyTorch
- **License:** MIT

This model builds on the pretrained AST architecture and is fine-tuned for improved performance on domain-specific audio data.

## Model Sources

- **Repository:** https://github.com/rudransmathur/dl-genai-project-26-t1
- **Kaggle Competition:** https://www.kaggle.com/competitions/jan-2026-dl-gen-ai-project

## Intended Use

- Audio classification tasks
- Music/audio tagging
- Experimental research in audio transformers

## Training Details

- **Dataset:** Messy Mashup Audio Dataset
  - **Genres:** blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
  - **Stems:** drums, vocals, bass, other
- **Epochs:** 10
- **Optimizer:** AdamW
- **Loss Function:** Cross-Entropy Loss
- **Scheduler:** Cosine scheduler with warmup steps

### Preprocessing

- Randomly sampled stem files from the same genre and mixed them into a single song-like clip similar to the test dataset
- Added 5 seconds of noise from the noise dataset to each audio file 2-3 times at random
- Converted audio inputs to model features with the AST feature extractor
- Aligned the sampling rate with the model's requirements

### Performance

- **Best Validation Accuracy:** 0.87
- **Best Validation Loss:** 0.40373
- **Best Test Accuracy:** 0.92
- **Best Test Loss:** 0.3458

## 🚀 Usage

```python
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load the fine-tuned classifier and its matching feature extractor from the Hub
model = AutoModelForAudioClassification.from_pretrained("rudranshmathur/ASTMessyMashup")
feature_extractor = AutoFeatureExtractor.from_pretrained("rudranshmathur/ASTMessyMashup")
```
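The snippet above only loads the model. The following is a minimal sketch of how a single clip might be classified with the loaded objects. The helper name `predict_top_k` and its `top_k` parameter are hypothetical, and the sketch assumes the checkpoint's `config.id2label` holds the genre names; where it does not, the class index is returned as a string instead.

```python
import numpy as np
import torch


def predict_top_k(waveform, sampling_rate, model, feature_extractor, top_k=3):
    """Return up to top_k (label, probability) pairs for one mono waveform.

    `model` and `feature_extractor` are the objects loaded with
    `from_pretrained`; this helper itself is illustrative, not part of the repo.
    """
    # The feature extractor converts the raw waveform into the spectrogram
    # tensor that AST consumes.
    inputs = feature_extractor(
        np.asarray(waveform, dtype=np.float32),
        sampling_rate=sampling_rate,
        return_tensors="pt",
    )

    # Inference only, so gradient tracking is not needed.
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)

    probs = torch.softmax(logits, dim=-1)[0]
    k = min(top_k, probs.shape[0])
    scores, indices = torch.topk(probs, k)

    # Assumption: id2label maps class indices to genre names.
    id2label = model.config.id2label
    return [
        (id2label.get(int(i), str(int(i))), float(s))
        for s, i in zip(scores, indices)
    ]
```

Given a mono waveform at the extractor's sampling rate (AST checkpoints typically expect 16 kHz; `feature_extractor.sampling_rate` shows the exact value), calling `predict_top_k(waveform, feature_extractor.sampling_rate, model, feature_extractor)` returns a list of label and probability pairs sorted from most to least likely.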
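The stem mixing and noise injection listed under Preprocessing could look roughly like the sketch below. It is not the training code from the repository: the function names, the peak normalization, the `noise_gain` value, and the choice of random burst positions are all assumptions made for illustration. Only the 5-second burst length and the 2-3 repetitions come from the description above.

```python
import numpy as np


def mix_stems(stems):
    """Sum mono stems that share one sampling rate into a single clip.

    Stems are trimmed to the shortest one; peak normalization is an
    assumption, not something the model card specifies.
    """
    length = min(len(s) for s in stems)
    mix = np.sum([np.asarray(s[:length], dtype=np.float32) for s in stems], axis=0)
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 0 else mix


def add_noise_bursts(audio, noise, sampling_rate, rng,
                     burst_seconds=5.0, noise_gain=0.5):
    """Overlay 2 or 3 noise bursts of `burst_seconds` onto a copy of `audio`.

    Burst positions are drawn uniformly at random (an assumption), and
    `noise_gain` is a hypothetical mixing level.
    """
    out = np.array(audio, dtype=np.float32, copy=True)
    noise = np.asarray(noise, dtype=np.float32)
    burst_len = min(int(burst_seconds * sampling_rate), len(out), len(noise))

    n_bursts = int(rng.integers(2, 4))  # upper bound is exclusive: 2 or 3
    for _ in range(n_bursts):
        start = int(rng.integers(0, len(out) - burst_len + 1))
        n_start = int(rng.integers(0, len(noise) - burst_len + 1))
        out[start:start + burst_len] += noise_gain * noise[n_start:n_start + burst_len]
    return out
```

A training clip would then be built by picking one file per stem (drums, vocals, bass, other) from a single genre, passing the four waveforms to `mix_stems`, and running the result through `add_noise_bursts` with a seeded `np.random.default_rng` for reproducibility.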