---
license: bsd-3-clause
tags:
- audio-classification
- audio
- environmental-sound
datasets:
- ashraq/esc50
pipeline_tag: audio-classification
base_model: MIT/ast-finetuned-audioset-10-10-0.4593
---

# AST Fine-tuned on ESC-50

An Audio Spectrogram Transformer (AST) model fine-tuned on the ESC-50 dataset for environmental sound classification.

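
A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub and loads through the `transformers` audio-classification pipeline (consistent with the `pipeline_tag` in the metadata). The repository id `your-username/ast-finetuned-esc50` is a placeholder, not a name taken from this card, and `classify_clip` and `format_predictions` are hypothetical helpers.

```python
from typing import Dict, List

# Placeholder repository id (assumption): replace with this model's actual Hub id.
MODEL_ID = "your-username/ast-finetuned-esc50"


def classify_clip(audio_path: str, model_id: str = MODEL_ID, top_k: int = 5) -> List[Dict]:
    """Return the top_k predictions for one audio file as {"label", "score"} dicts."""
    # Imported lazily so format_predictions works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    # Given a file path, the pipeline decodes the audio and resamples it
    # to the sampling rate expected by the model's feature extractor.
    return classifier(audio_path, top_k=top_k)


def format_predictions(predictions: List[Dict]) -> List[str]:
    """Render predictions as 'label: score' strings, highest score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['label']}: {p['score']:.3f}" for p in ranked]
```

With a local recording, something like `format_predictions(classify_clip("dog_bark.wav"))` would list the five most likely labels; the file name here is only an example.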

## Model Description

This model is based on the [Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) architecture and is fine-tuned to classify 50 categories of environmental sounds. AST is a convolution-free, purely attention-based model: it splits an audio spectrogram into a sequence of patches and processes them with a Transformer encoder, much as Vision Transformers (ViT) do with image patches.

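
To make the patch view concrete, the sketch below counts the patches AST would extract from one spectrogram. It assumes the settings described in the AST paper and suggested by the "10-10" in the base checkpoint name: 128 mel bins, 1024 time frames, 16×16 patches, and a stride of 10 along both frequency and time. These values are assumptions about the base configuration, not figures stated in this card, and `patch_grid` is an illustrative helper.

```python
def patch_grid(num_mel_bins=128, num_frames=1024, patch_size=16,
               freq_stride=10, time_stride=10):
    """Return (patches along frequency, patches along time, total patches).

    Uses the usual output-size formula for an unpadded strided window:
    floor((size - patch_size) / stride) + 1.
    """
    freq_patches = (num_mel_bins - patch_size) // freq_stride + 1
    time_patches = (num_frames - patch_size) // time_stride + 1
    return freq_patches, time_patches, freq_patches * time_patches


print(patch_grid())  # (12, 101, 1212)
```

Under these assumptions each spectrogram becomes 1212 overlapping patches, and each patch is projected to one token of the Transformer input sequence.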

## Training

- **Base Model**: MIT/ast-finetuned-audioset-10-10-0.4593
- **Dataset**: [ESC-50](https://github.com/karolpiczak/ESC-50) (Environmental Sound Classification)

## Labels

The model classifies audio into 50 environmental sound categories:

**Animals**: cat, chirping_birds, cow, crow, dog, frog, hen, insects, pig, rooster, sheep

**Natural Sounds**: crackling_fire, crickets, rain, sea_waves, thunderstorm, water_drops, wind

**Human Sounds**: breathing, brushing_teeth, clapping, coughing, crying_baby, drinking_sipping, footsteps, laughing, sneezing, snoring

**Domestic Sounds**: clock_alarm, clock_tick, door_wood_creaks, door_wood_knock, glass_breaking, keyboard_typing, mouse_click, toilet_flush, vacuum_cleaner, washing_machine

**Urban Sounds**: airplane, car_horn, church_bells, engine, fireworks, helicopter, siren, train

**Mechanical/Tools**: can_opening, chainsaw, hand_saw, pouring_water

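
If coarse predictions are more useful than the 50 fine-grained labels, the groups above can be encoded as a lookup table. The sketch below is illustrative only: it assumes predictions arrive as a list of `{"label", "score"}` dictionaries (the shape returned by the `transformers` audio-classification pipeline) and that the model's label strings match the names listed here. `GROUPS`, `LABEL_TO_GROUP`, and `group_scores` are hypothetical names.

```python
from collections import defaultdict

# Thematic groups as listed in this card (50 labels in total).
GROUPS = {
    "Animals": ["cat", "chirping_birds", "cow", "crow", "dog", "frog",
                "hen", "insects", "pig", "rooster", "sheep"],
    "Natural Sounds": ["crackling_fire", "crickets", "rain", "sea_waves",
                       "thunderstorm", "water_drops", "wind"],
    "Human Sounds": ["breathing", "brushing_teeth", "clapping", "coughing",
                     "crying_baby", "drinking_sipping", "footsteps",
                     "laughing", "sneezing", "snoring"],
    "Domestic Sounds": ["clock_alarm", "clock_tick", "door_wood_creaks",
                        "door_wood_knock", "glass_breaking", "keyboard_typing",
                        "mouse_click", "toilet_flush", "vacuum_cleaner",
                        "washing_machine"],
    "Urban Sounds": ["airplane", "car_horn", "church_bells", "engine",
                     "fireworks", "helicopter", "siren", "train"],
    "Mechanical/Tools": ["can_opening", "chainsaw", "hand_saw", "pouring_water"],
}

# Reverse lookup: fine-grained label -> group name.
LABEL_TO_GROUP = {label: group
                  for group, labels in GROUPS.items()
                  for label in labels}


def group_scores(predictions):
    """Sum per-label scores into per-group totals, sorted highest first."""
    totals = defaultdict(float)
    for pred in predictions:
        # Labels not listed in the card fall into an "Unknown" bucket.
        group = LABEL_TO_GROUP.get(pred["label"], "Unknown")
        totals[group] += pred["score"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Group totals are only meaningful when the predictions cover all classes, so it is worth requesting scores for all 50 labels rather than just the top few.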

## License

BSD-3-Clause