areffarhadi committed · Commit a12fcb2 · verified · 1 Parent(s): ce786af

Create README.md

Files changed (1)
  1. README.md +67 -0
README.md ADDED
@@ -0,0 +1,67 @@
---
license: mit
tags:
- audio
- audio-classification
- speech
- music
- pytorch
- ast
- VAD
library_name: pytorch
pipeline_tag: audio-classification
---

# Audio Classification with AST (Music / Non-Speech / Speech)

This model is a **fine-tuned Audio Spectrogram Transformer (AST)** for **audio classification**.
It classifies audio clips into three categories:

- **Speech**
- **Music**
- **Non-Speech**

The model operates on **log-Mel filterbank features extracted from 16 kHz audio** and uses a **Transformer-based architecture** adapted for audio spectrograms.

Training and source code are available here:

**GitHub repository:**
https://github.com/areffarhadi/audio-classification/tree/main/AST-model

---

# Model Details

## Architecture

The model is based on **Audio Spectrogram Transformer (AST)**, which applies the Vision Transformer architecture to audio spectrograms.

Key characteristics:

- Transformer encoder architecture
- Patch-based spectrogram representation
- Learned positional embeddings
- Classification token and distillation token
- Final classification head fine-tuned for the target classes
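
To make the patch-based representation concrete, the sketch below counts how many tokens the encoder would see for one spectrogram. The 16×16 patch size and the stride of 10 along both axes are assumptions borrowed from the default configuration of the original AST work; they are not stated in this model card. The helper name `ast_sequence_length` is hypothetical.

```python
def ast_sequence_length(n_mels=128, n_frames=1024, patch=16, fstride=10, tstride=10):
    """Count encoder tokens for one spectrogram (assumed AST-style patching)."""
    # Patch positions per axis, same formula as an unpadded strided convolution.
    f_dim = (n_mels - patch) // fstride + 1
    t_dim = (n_frames - patch) // tstride + 1
    n_patches = f_dim * t_dim
    # Two extra tokens: the classification token and the distillation token.
    return f_dim, t_dim, n_patches + 2


f_dim, t_dim, seq_len = ast_sequence_length()
print(f_dim, t_dim, seq_len)
```

Under these assumed settings the learned positional embeddings would need one entry per token, which is why the input size is fixed rather than variable.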

### Input

- **Audio format:** WAV
- **Sampling rate:** 16 kHz
- **Features:** Log-Mel filterbank
- **Mel bins:** 128
- **Target length:** 1024 frames
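
A fixed target length means every clip has to be brought to exactly 1024 frames before it reaches the model. The sketch below shows one way this could be done on an already computed feature matrix, using plain Python lists. It assumes a 10 ms frame shift and zero padding, neither of which is stated in this model card; the linked repository may use different values. The name `pad_or_truncate` is hypothetical.

```python
TARGET_LENGTH = 1024   # frames, from the Input list above
N_MELS = 128           # Mel bins, from the Input list above
FRAME_SHIFT_S = 0.010  # assumption: 10 ms hop between frames


def pad_or_truncate(frames, target_length=TARGET_LENGTH, n_mels=N_MELS, pad_value=0.0):
    """Return exactly `target_length` frames: cut long clips, pad short ones."""
    frames = [list(f) for f in frames[:target_length]]
    missing = target_length - len(frames)
    # range() of a negative number is empty, so long clips get no padding.
    frames.extend([pad_value] * n_mels for _ in range(missing))
    return frames


# Approximate audio duration covered by one model input under the assumed hop.
window_seconds = TARGET_LENGTH * FRAME_SHIFT_S

short_clip = [[1.0] * N_MELS for _ in range(300)]   # roughly 3 s of features
long_clip = [[1.0] * N_MELS for _ in range(1500)]   # roughly 15 s of features
print(len(pad_or_truncate(short_clip)), len(pad_or_truncate(long_clip)), window_seconds)
```

With the assumed hop, one input window covers a little over ten seconds of audio, so longer recordings would need to be truncated or split into several windows.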

### Output Classes

| Index | Label |
|------|------|
| 0 | Music |
| 1 | Non-Speech |
| 2 | Speech |
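
The index order in the table matters when decoding raw model scores. As a minimal sketch, the snippet below converts three logits into a label and a probability with a softmax. The logits are made up for illustration and are not real model output; `ID2LABEL` and `decode_prediction` are hypothetical names, not part of the linked repository.

```python
import math

# Index-to-label mapping copied from the table above.
ID2LABEL = {0: "Music", 1: "Non-Speech", 2: "Speech"}


def decode_prediction(logits):
    """Turn three raw scores into (label, probability) with a stable softmax."""
    peak = max(logits)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Illustrative logits only, not real model output.
label, prob = decode_prediction([0.2, -1.3, 2.9])
print(label, round(prob, 3))
```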

---

# Usage

[**Inference code**](https://github.com/areffarhadi/audio-classification/blob/main/AST-model/ast_inference_with_manifest.py)