bioamla
/

ast-esc50

	---
	license: bsd-3-clause
	tags:
	- audio-classification
	- audio
	- environmental-sound
	datasets:
	- ashraq/esc50
	pipeline_tag: audio-classification
	base_model: MIT/ast-finetuned-audioset-10-10-0.4593
	---

	# AST Fine-tuned on ESC-50

	An Audio Spectrogram Transformer (AST) model fine-tuned on the ESC-50 dataset for environmental sound classification.

	## Model Description

	This model uses the [Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) architecture and is fine-tuned to classify 50 categories of environmental sounds. The AST is a purely attention-based model: it splits an audio spectrogram into a sequence of patches and processes them with a Transformer encoder, much as a Vision Transformer (ViT) does with image patches.
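
To make the patch view concrete, the sketch below counts how many patches a spectrogram yields. The defaults (128 mel bins, 1024 frames, 16x16 patches, stride 10 in both frequency and time) are assumptions taken from the AST paper and the "10-10" in the base model's name, not values stated in this card; `patch_grid` is a hypothetical helper.

```python
def patch_grid(n_mels=128, n_frames=1024, patch=16, f_stride=10, t_stride=10):
    """Return (patches along frequency, patches along time, total patches).

    All default values are assumptions based on the AST paper and the
    base model name; check the checkpoint's config for the real ones.
    """
    # A sliding window of size `patch` moved by `stride` fits this many times.
    f_dim = (n_mels - patch) // f_stride + 1
    t_dim = (n_frames - patch) // t_stride + 1
    return f_dim, t_dim, f_dim * t_dim


f_dim, t_dim, total = patch_grid()
print(f"{f_dim} x {t_dim} = {total} patches")
```

Because the stride (10) is smaller than the patch size (16), neighbouring patches overlap by 6 bins or frames, which is why the sequence is longer than a non-overlapping split would give.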

	## Training

	- Base Model: MIT/ast-finetuned-audioset-10-10-0.4593
	- Dataset: [ESC-50](https://github.com/karolpiczak/ESC-50) (Environmental Sound Classification)
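
A minimal inference sketch, assuming the checkpoint is published under the repository id `bioamla/ast-esc50` (taken from the page header) and loads with the standard `transformers` audio-classification pipeline. The helper names `classify` and `best_label`, and the `min_score` threshold, are hypothetical and only for illustration.

```python
from typing import Dict, List

# Assumed repository id, taken from the page header rather than the card text.
MODEL_ID = "bioamla/ast-esc50"


def classify(path: str, top_k: int = 5) -> List[Dict]:
    """Return the top_k predictions for one audio file as label/score dicts."""
    # Imported lazily so best_label() can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    return classifier(path, top_k=top_k)


def best_label(predictions: List[Dict], min_score: float = 0.0) -> str:
    """Pick the highest-scoring label, or "unknown" if it scores below min_score."""
    if not predictions:
        return "unknown"
    top = max(predictions, key=lambda p: p["score"])
    return top["label"] if top["score"] >= min_score else "unknown"
```

With a local recording (for example a hypothetical `dog_bark.wav`), `best_label(classify("dog_bark.wav"), min_score=0.5)` would return the top class name, or `"unknown"` when the model is not confident. Decoding audio files through the pipeline typically requires `ffmpeg` to be installed.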

	## Labels

	The model classifies audio into 50 environmental sound categories, grouped below by sound type:

	Animals: cat, chirping_birds, cow, crow, dog, frog, hen, insects, pig, rooster, sheep

	Natural Sounds: crackling_fire, crickets, rain, sea_waves, thunderstorm, water_drops, wind

	Human Sounds: breathing, brushing_teeth, clapping, coughing, crying_baby, drinking_sipping, footsteps, laughing, sneezing, snoring

	Domestic Sounds: clock_alarm, clock_tick, door_wood_creaks, door_wood_knock, glass_breaking, keyboard_typing, mouse_click, toilet_flush, vacuum_cleaner, washing_machine

	Urban Sounds: airplane, car_horn, church_bells, engine, fireworks, helicopter, siren, train

	Mechanical/Tools: can_opening, chainsaw, hand_saw, pouring_water
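
The grouping above can be written as a lookup table, which is handy for collapsing the 50 fine-grained predictions into six coarse groups. This is a sketch: the label spellings are copied from the lists above, and the checkpoint's own `id2label` mapping may spell them differently. `GROUPS`, `LABEL_TO_GROUP`, and `group_of` are hypothetical names.

```python
from typing import Optional

# Label names copied from the lists above; the model's config may differ.
GROUPS = {
    "Animals": "cat chirping_birds cow crow dog frog hen insects pig rooster sheep",
    "Natural Sounds": "crackling_fire crickets rain sea_waves thunderstorm water_drops wind",
    "Human Sounds": "breathing brushing_teeth clapping coughing crying_baby "
                    "drinking_sipping footsteps laughing sneezing snoring",
    "Domestic Sounds": "clock_alarm clock_tick door_wood_creaks door_wood_knock "
                       "glass_breaking keyboard_typing mouse_click toilet_flush "
                       "vacuum_cleaner washing_machine",
    "Urban Sounds": "airplane car_horn church_bells engine fireworks helicopter siren train",
    "Mechanical/Tools": "can_opening chainsaw hand_saw pouring_water",
}

# Invert the table: one entry per label, pointing at its group.
LABEL_TO_GROUP = {
    label: group for group, labels in GROUPS.items() for label in labels.split()
}


def group_of(label: str) -> Optional[str]:
    """Return the coarse group for a label, or None if the label is not listed."""
    return LABEL_TO_GROUP.get(label)


# Sanity check: the six groups should cover exactly 50 distinct labels.
assert len(LABEL_TO_GROUP) == 50
```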

	## License

	BSD-3-Clause