Update README.md
README.md
CHANGED
|
@@ -1,8 +1,44 @@
|
|
| 1 |
---
|
| 2 |
license: bsd-3-clause
|
| 3 |
datasets:
|
| 4 |
- ashraq/esc50
|
| 5 |
-
base_model:
|
| 6 |
-
- MIT/ast-finetuned-audioset-10-10-0.4593
|
| 7 |
pipeline_tag: audio-classification
|
| 8 |
-
|
| 1 |
---
|
| 2 |
license: bsd-3-clause
|
| 3 |
+
tags:
|
| 4 |
+
- audio-classification
|
| 5 |
+
- audio
|
| 6 |
+
- environmental-sound
|
| 7 |
datasets:
|
| 8 |
- ashraq/esc50
|
| 9 |
pipeline_tag: audio-classification
|
| 10 |
+
base_model: MIT/ast-finetuned-audioset-10-10-0.4593
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
# AST Fine-tuned on ESC-50
|
| 14 |
+
|
| 15 |
+
An Audio Spectrogram Transformer (AST) model fine-tuned on the ESC-50 dataset for environmental sound classification.
|
| 16 |
+
|
| 17 |
+
## Model Description
|
| 18 |
+
|
| 19 |
+
This model is based on the [Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) (AST) architecture and is fine-tuned to classify 50 categories of environmental sounds. AST applies a purely attention-based model to audio spectrograms, splitting each spectrogram into a sequence of patches in the same way a Vision Transformer (ViT) splits an image.
|
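To make the patch idea concrete, here is a minimal sketch of how a spectrogram could be cut into overlapping patches. The sizes are assumptions taken from the AST paper and from what the `10-10` in the base checkpoint's name appears to denote, not from this README: 128 mel bins, 1024 time frames, 16x16 patches, and a stride of 10 in both frequency and time. The helper names `patch_grid` and `extract_patches` are hypothetical and do not belong to any library.

```python
def patch_grid(n_mels, n_frames, patch=16, f_stride=10, t_stride=10):
    """Return (rows, cols) of the patch grid for a spectrogram of this size."""
    rows = (n_mels - patch) // f_stride + 1
    cols = (n_frames - patch) // t_stride + 1
    return rows, cols


def extract_patches(spec, patch=16, f_stride=10, t_stride=10):
    """Cut a 2D spectrogram (list of rows) into flattened, overlapping patches."""
    n_mels, n_frames = len(spec), len(spec[0])
    rows, cols = patch_grid(n_mels, n_frames, patch, f_stride, t_stride)
    patches = []
    for r in range(rows):
        for c in range(cols):
            f0, t0 = r * f_stride, c * t_stride  # top-left corner of this patch
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch)
                            for t in range(t0, t0 + patch)])
    return patches


# Dummy all-zero spectrogram with the assumed shape (128 mel bins x 1024 frames).
spec = [[0.0] * 1024 for _ in range(128)]
patches = extract_patches(spec)
print(len(patches), len(patches[0]))  # 1212 256
```

Under these assumptions the grid is 12 by 101, which gives 1212 patches of 256 values each. In the actual model each patch is then projected to an embedding before it reaches the Transformer layers; that step is omitted here.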
| 20 |
+
|
| 21 |
+
## Training
|
| 22 |
+
|
| 23 |
+
- **Base Model**: MIT/ast-finetuned-audioset-10-10-0.4593
|
| 24 |
+
- **Dataset**: [ESC-50](https://github.com/karolpiczak/ESC-50) (Environmental Sound Classification)
|
| 25 |
+
|
| 26 |
+
## Labels
|
| 27 |
+
|
| 28 |
+
The model classifies audio into 50 environmental sound categories:
|
| 29 |
+
|
| 30 |
+
**Animals**: cat, chirping_birds, cow, crow, dog, frog, hen, insects, pig, rooster, sheep
|
| 31 |
+
|
| 32 |
+
**Natural Sounds**: crackling_fire, crickets, rain, sea_waves, thunderstorm, water_drops, wind
|
| 33 |
+
|
| 34 |
+
**Human Sounds**: breathing, brushing_teeth, clapping, coughing, crying_baby, drinking_sipping, footsteps, laughing, sneezing, snoring
|
| 35 |
+
|
| 36 |
+
**Domestic Sounds**: clock_alarm, clock_tick, door_wood_creaks, door_wood_knock, glass_breaking, keyboard_typing, mouse_click, toilet_flush, vacuum_cleaner, washing_machine
|
| 37 |
+
|
| 38 |
+
**Urban Sounds**: airplane, car_horn, church_bells, engine, fireworks, helicopter, siren, train
|
| 39 |
+
|
| 40 |
+
**Mechanical/Tools**: can_opening, chainsaw, hand_saw, pouring_water
|
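The grouping above can also be written as a lookup table, which may be handy when aggregating predictions per group. This is only a sketch: the group names and label strings are copied from the list above, while `LABEL_GROUPS`, `LABEL_TO_GROUP`, and `group_of` are hypothetical names, and the order shown is not necessarily the order of the model's `id2label` mapping.

```python
# Groups and labels exactly as listed in this README.
LABEL_GROUPS = {
    "Animals": ["cat", "chirping_birds", "cow", "crow", "dog", "frog",
                "hen", "insects", "pig", "rooster", "sheep"],
    "Natural Sounds": ["crackling_fire", "crickets", "rain", "sea_waves",
                       "thunderstorm", "water_drops", "wind"],
    "Human Sounds": ["breathing", "brushing_teeth", "clapping", "coughing",
                     "crying_baby", "drinking_sipping", "footsteps",
                     "laughing", "sneezing", "snoring"],
    "Domestic Sounds": ["clock_alarm", "clock_tick", "door_wood_creaks",
                        "door_wood_knock", "glass_breaking", "keyboard_typing",
                        "mouse_click", "toilet_flush", "vacuum_cleaner",
                        "washing_machine"],
    "Urban Sounds": ["airplane", "car_horn", "church_bells", "engine",
                     "fireworks", "helicopter", "siren", "train"],
    "Mechanical/Tools": ["can_opening", "chainsaw", "hand_saw", "pouring_water"],
}

# Reverse index: label -> group name.
LABEL_TO_GROUP = {
    label: group
    for group, labels in LABEL_GROUPS.items()
    for label in labels
}


def group_of(label):
    """Return the README group for a predicted label string."""
    return LABEL_TO_GROUP[label]


print(len(LABEL_TO_GROUP))  # 50
print(group_of("siren"))    # Urban Sounds
```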
| 41 |
+
|
| 42 |
+
## License
|
| 43 |
+
|
| 44 |
+
BSD-3-Clause
|