---
license: mit
tags:
- audio
- audio-classification
- speech
- music
- pytorch
- ast
- VAD
library_name: pytorch
pipeline_tag: audio-classification
---

# Audio Classification with AST (Music / Non-Speech / Speech)

This model is a **fine-tuned Audio Spectrogram Transformer (AST)** for **audio classification**.
It classifies audio clips into three categories:

- **Speech**
- **Music**
- **Non-Speech**

The model operates on **log-Mel filterbank features extracted from 16 kHz audio** and uses a **Transformer-based architecture** adapted for audio spectrograms.

Training and source code are available here:

**GitHub repository:**
https://github.com/areffarhadi/audio-classification/tree/main/AST-model

---

# Model Details

## Architecture

The model is based on the **Audio Spectrogram Transformer (AST)**, which applies the Vision Transformer architecture to audio spectrograms.

Key characteristics:

- Transformer encoder architecture
- Patch-based spectrogram representation
- Learned positional embeddings
- Classification token and distillation token
- Final classification head fine-tuned for the target classes

### Input

- **Audio format:** WAV
- **Sampling rate:** 16 kHz
- **Features:** Log-Mel filterbank
- **Mel bins:** 128
- **Target length:** 1024 frames

### Output Classes

| Index | Label |
|-------|-------|
| 0 | Music |
| 1 | Non-Speech |
| 2 | Speech |
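
Given this index-to-label mapping, post-processing the model's logits can be sketched as below. `predict_label` is an illustrative helper, not part of the released inference script.

```python
# Minimal sketch: map output logits to the class labels in the table above.
import torch

ID2LABEL = {0: "Music", 1: "Non-Speech", 2: "Speech"}


def predict_label(logits: torch.Tensor) -> str:
    """Accepts a (3,) or (1, 3) logits tensor and returns the label string."""
    probs = torch.softmax(logits.reshape(-1), dim=0)
    return ID2LABEL[int(torch.argmax(probs))]
```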

---

# Usage

[**Inference code**](https://github.com/areffarhadi/audio-classification/blob/main/AST-model/ast_inference_with_manifest.py)