Aynursusuz committed on
Commit 927ea8b · verified · 1 Parent(s): c715b5b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +157 -0
README.md ADDED
@@ -0,0 +1,157 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model: MIT/ast-finetuned-audioset-10-10-0.4593
+ tags:
+ - audio-classification
+ - music
+ - speech
+ - ast
+ - audio-spectrogram-transformer
+ pipeline_tag: audio-classification
+ datasets:
+ - AIGenLab/speech-music-82k
+ metrics:
+ - accuracy
+ library_name: transformers
+ ---
+
+ # 🎵 AST Music vs Speech Classifier (82K)
+
+ Fine-tuned Audio Spectrogram Transformer (AST) for music vs speech classification.
+
+ ## Model Details
+
+ - **Base Model:** MIT/ast-finetuned-audioset-10-10-0.4593
+ - **Task:** Binary Audio Classification (Music vs Speech)
+ - **Training Dataset:** AIGenLab/speech-music-82k (82,000 samples)
+ - **Overall Accuracy:** 86.7% (26 of 30 test files)
+
+ ---
+
+ ## 📊 Performance Results
+
+ | Category | Accuracy | Correct | Total |
+ |----------|----------|---------|-------|
+ | Pure Music | 100.0% | 10 | 10 |
+ | Pure Speech | 60.0% | 6 | 10 |
+ | Speech + Music | 100.0% | 10 | 10 |
+
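The overall accuracy follows directly from the per-category counts in the summary table above. A minimal sketch of that arithmetic, where the dictionary and function names are illustrative and not part of the model's code:

```python
# Per-category (correct, total) counts copied from the summary table.
results = {
    "Pure Music": (10, 10),
    "Pure Speech": (6, 10),
    "Speech + Music": (10, 10),
}


def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage."""
    return 100.0 * correct / total


per_category = {name: accuracy(c, t) for name, (c, t) in results.items()}
total_correct = sum(c for c, _ in results.values())
total_files = sum(t for _, t in results.values())
overall = accuracy(total_correct, total_files)

for name, acc in per_category.items():
    print(f"{name}: {acc:.1f}%")
print(f"Overall: {overall:.1f}% ({total_correct}/{total_files})")
```

With only 10 files per category, each misclassified file moves a category score by 10 percentage points, so these figures are best read as a rough indication.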
+ ### Pure Music
+
+ | File | Music Score | Speech Score | Prediction | Result |
+ |------|-------------|--------------|------------|--------|
+ | music_1.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_10.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_2.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_3.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_4.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_5.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_6.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_7.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_8.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | music_9.wav | 1.000 | 0.000 | MUSIC | ✅ |
+
+ ### Pure Speech
+
+ | File | Music Score | Speech Score | Prediction | Result |
+ |------|-------------|--------------|------------|--------|
+ | speech_1.wav | 0.051 | 0.949 | SPEECH | ✅ |
+ | speech_10.wav | 0.039 | 0.961 | SPEECH | ✅ |
+ | speech_2.wav | 0.000 | 1.000 | SPEECH | ✅ |
+ | speech_3.wav | 0.372 | 0.628 | SPEECH | ✅ |
+ | speech_4.wav | 1.000 | 0.000 | MUSIC | ❌ |
+ | speech_5.wav | 0.997 | 0.003 | MUSIC | ❌ |
+ | speech_6.wav | 1.000 | 0.000 | MUSIC | ❌ |
+ | speech_7.wav | 0.170 | 0.830 | SPEECH | ✅ |
+ | speech_8.wav | 0.870 | 0.130 | MUSIC | ❌ |
+ | speech_9.wav | 0.035 | 0.965 | SPEECH | ✅ |
+
+ ### Speech + Music
+
+ Clips that mix speech and music are counted as correct when the prediction is MUSIC.
+
+ | File | Music Score | Speech Score | Prediction | Result |
+ |------|-------------|--------------|------------|--------|
+ | speech_and_music_1.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | speech_and_music_10.wav | 0.987 | 0.013 | MUSIC | ✅ |
+ | speech_and_music_2.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | speech_and_music_3wav.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | speech_and_music_4.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | speech_and_music_5.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | speech_and_music_6.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | speech_and_music_7.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | speech_and_music_8.wav | 1.000 | 0.000 | MUSIC | ✅ |
+ | speech_and_music_9.wav | 1.000 | 0.000 | MUSIC | ✅ |
+
+
+ ---
+
+ ## 🚀 Quick Start
+ ```python
+ from transformers import pipeline
+
+ # Load the model
+ classifier = pipeline(
+     "audio-classification",
+     model="AIGenLab/AST-speech-and-music-classifier-82K"
+ )
+
+ # Classify audio; the result is a list of {"label", "score"} dicts
+ result = classifier("your_audio.wav")
+ print(result)
+ ```
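The pipeline call above returns a list of dictionaries, each with a `label` and a `score`. As a rough sketch of how that output might be reduced to a single decision, the hypothetical helper below picks the highest-scoring label and falls back to `"uncertain"` under a chosen threshold. The helper name, the threshold, and the label strings in the example are assumptions for illustration; the real label names come from the model's configuration.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def top_prediction(result: List[Prediction], min_score: float = 0.5) -> str:
    """Return the highest-scoring label, or "uncertain" below min_score."""
    best = max(result, key=lambda item: item["score"])
    if best["score"] < min_score:
        return "uncertain"
    return str(best["label"])


# Hypothetical pipeline output, using the scores reported for speech_3.wav.
example = [
    {"label": "speech", "score": 0.628},
    {"label": "music", "score": 0.372},
]
print(top_prediction(example))
print(top_prediction(example, min_score=0.9))
```

A threshold like this can be useful for borderline clips such as `speech_3.wav`, where the two scores are relatively close.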
+
+ ---
+
+ ## 🔧 Advanced Usage
+ ```python
+ from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
+ import torch
+ import torchaudio
+
+ # Load model and feature extractor
+ model = AutoModelForAudioClassification.from_pretrained(
+     "AIGenLab/AST-speech-and-music-classifier-82K"
+ )
+ feature_extractor = AutoFeatureExtractor.from_pretrained(
+     "AIGenLab/AST-speech-and-music-classifier-82K"
+ )
+
+ # Load audio; torchaudio returns a (channels, samples) tensor
+ audio, sr = torchaudio.load("audio.wav")
+ # Mix down to mono so the extractor receives a 1-D waveform
+ audio = audio.mean(dim=0)
+ # Resample to 16 kHz (required)
+ if sr != 16000:
+     audio = torchaudio.functional.resample(audio, sr, 16000)
+
+ # Process
+ inputs = feature_extractor(
+     audio.numpy(),
+     sampling_rate=16000,
+     return_tensors="pt"
+ )
+
+ # Predict
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+
+ # Index 0 = music, index 1 = speech (check model.config.id2label to confirm)
+ music_score = predictions[0][0].item()
+ speech_score = predictions[0][1].item()
+
+ print(f"Music: {music_score:.3f}")
+ print(f"Speech: {speech_score:.3f}")
+ ```
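The snippet above hard-codes index 0 as music and index 1 as speech. A more defensive variant would read the mapping from `model.config.id2label` instead. The sketch below shows the same softmax-and-lookup step in plain Python so that it runs without downloading the model; the `id2label` dictionary and the logits are made-up stand-ins, not values taken from the checkpoint.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores_by_label(logits: List[float], id2label: Dict[int, str]) -> Dict[str, float]:
    """Map each class probability to its label name."""
    probs = softmax(logits)
    return {id2label[i]: p for i, p in enumerate(probs)}


# Assumed mapping for illustration; with the real model use model.config.id2label.
id2label = {0: "music", 1: "speech"}
scores = scores_by_label([2.0, -1.0], id2label)
print({label: round(p, 3) for label, p in scores.items()})
```

Looking labels up by name keeps downstream code correct even if a future checkpoint orders its classes differently.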
+
+ ---
+
+ ## 📊 Training Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Base Model | MIT/ast-finetuned-audioset-10-10-0.4593 |
+ | Dataset | AIGenLab/speech-music-82k (82,000 samples) |
+ | Epochs | 1 |
+ | Batch Size | 64 |
+ | Learning Rate | 3e-5 |
+ | Loss Weight | Music: 2.5x, Speech: 1.0x |
+ | Optimizer | AdamW |
+ | Framework | Transformers + PyTorch |
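
The `Loss Weight` row suggests a class-weighted loss in which errors on music examples count 2.5 times as much as errors on speech examples. The README does not say how the weighting was implemented, so the following is only a sketch of class-weighted cross-entropy in plain Python. It normalizes by the sum of the target weights, which is how `torch.nn.CrossEntropyLoss` behaves when given a `weight` tensor with its default mean reduction. The class order (0 = music, 1 = speech) and the sample logits are assumptions.

```python
import math
from typing import List, Sequence

CLASS_WEIGHTS = [2.5, 1.0]  # assumed order: index 0 = music, index 1 = speech


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one example's logits."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_total for x in logits]


def weighted_cross_entropy(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    weights: Sequence[float] = CLASS_WEIGHTS,
) -> float:
    """Class-weighted cross-entropy, averaged over the sum of target weights."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, target in zip(batch_logits, targets):
        nll = -log_softmax(logits)[target]
        weighted_sum += weights[target] * nll
        weight_total += weights[target]
    return weighted_sum / weight_total


# Two examples that both lean toward class 0: one music target, one speech target.
batch = [[2.0, 0.0], [2.0, 0.0]]
loss = weighted_cross_entropy(batch, targets=[0, 1])
print(f"weighted loss: {loss:.4f}")
```

In this toy batch the correctly handled music example carries more weight than the misclassified speech example, so the weighted loss comes out lower than the unweighted mean.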