# Sound Event Detection: Pretrained Models
Pretrained models for Sound Event Detection (SED) used in MobiSys 2026 #198 "Aurchestra".
## Models
### 1. YAMNet (Pretrained Baseline)

- Source: `google/yamnet` (TensorFlow) / PyTorch reimplementation
- Classes: 521 AudioSet classes
- Usage: loaded directly from HuggingFace; no checkpoint in this repo
### 2. AST (Pretrained Baseline)

- Source: `MIT/ast-finetuned-audioset-10-10-0.4593`
- Architecture: Audio Spectrogram Transformer
- Classes: 527 AudioSet classes
- Usage: loaded directly from HuggingFace; no checkpoint in this repo
### 3. Fine-tuned AST (`sed_ast_snr_ctl_v2_16k`)

- Base model: `MIT/ast-finetuned-audioset-10-10-0.4593`
- Fine-tuned on: on-the-fly synthesized binaural audio mixtures (SNR-controlled, 16 kHz)
- Classes: 20 target sound classes
- Training: AdamW, OneCycleLR with group-wise learning rates (backbone 1e-5, head 1e-3), 80 epochs
- Checkpoint: `sed_ast_snr_ctl_v2_16k/checkpoints/best.pt`
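The SNR-controlled mixing behind the synthesized training data can be sketched as follows. This is a minimal mono sketch in plain Python; the actual pipeline synthesizes binaural 16 kHz mixtures, and the function name and scaling convention here are assumptions, not the repo's implementation:

```python
import math


def mix_at_snr(target, noise, snr_db):
    """Scale `noise` so the target-to-noise power ratio equals `snr_db`,
    then sum the two signals sample-wise (mono sketch)."""
    p_target = sum(x * x for x in target) / len(target)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Desired noise power P_n' satisfies P_t / P_n' = 10^(snr_db / 10),
    # so the amplitude gain on the noise is sqrt(P_t / (P_n * 10^(snr_db/10))).
    gain = math.sqrt(p_target / (p_noise * 10 ** (snr_db / 10)))
    return [t + gain * n for t, n in zip(target, noise)]
```

Scaling the noise rather than the target keeps the target's loudness constant across SNR conditions, which is a common choice when sweeping SNR during training.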
## File Structure

```
.
├── README.md
└── sed_ast_snr_ctl_v2_16k/
    ├── config.json        # Training configuration
    └── checkpoints/
        └── best.pt        # Fine-tuned model weights (~2 GB)
```
## Usage

```python
# Fine-tuned AST
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="ooshyun/sound_event_detection",
    filename="sed_ast_snr_ctl_v2_16k/checkpoints/best.pt",
)
config_path = hf_hub_download(
    repo_id="ooshyun/sound_event_detection",
    filename="sed_ast_snr_ctl_v2_16k/config.json",
)
```
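The downloaded checkpoint can then be loaded into an AST model with the 20-class head. The sketch below is an assumption about the checkpoint layout (a plain state dict, or one nested under a `"model"` key), hence `strict=False`; check `config.json` for the actual training setup:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import ASTForAudioClassification

checkpoint_path = hf_hub_download(
    repo_id="ooshyun/sound_event_detection",
    filename="sed_ast_snr_ctl_v2_16k/checkpoints/best.pt",
)

# Rebuild the fine-tuned architecture: the pretrained AST backbone with a
# 20-class head replacing the original 527-class AudioSet head.
model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=20,
    ignore_mismatched_sizes=True,
)

# The stored key layout is an assumption, hence strict=False.
state = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
model.load_state_dict(state.get("model", state), strict=False)
model.eval()
```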
For training and evaluation code, see ooshyun/sound_event_detection.
## Citation
If you use these models, please cite:
MobiSys 2026 #198 "Aurchestra"