# Sound Event Detection: Pretrained Models
Pretrained models for Sound Event Detection (SED) used in MobiSys 2026 #198 "Aurchestra".
## Models
### 1. YAMNet (Pretrained Baseline)

- Source: `google/yamnet` (TensorFlow) / PyTorch reimplementation
- Classes: 521 AudioSet classes
- Usage: loaded directly from HuggingFace; no checkpoint in this repo
### 2. AST (Pretrained Baseline)

- Source: `MIT/ast-finetuned-audioset-10-10-0.4593`
- Architecture: Audio Spectrogram Transformer
- Classes: 527 AudioSet classes
- Usage: loaded directly from HuggingFace; no checkpoint in this repo
### 3. Fine-tuned AST (`sed_ast_snr_ctl_v2_16k`)

- Base model: `MIT/ast-finetuned-audioset-10-10-0.4593`
- Fine-tuned on: on-the-fly synthesized binaural audio mixtures (SNR-controlled, 16 kHz)
- Classes: 20 target sound classes
- Training: AdamW, OneCycleLR with group-wise learning rates (backbone 1e-5, head 1e-3), 80 epochs
- Checkpoint: `sed_ast_snr_ctl_v2_16k/checkpoints/best.pt`
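The SNR-controlled mixing behind the synthesized training data can be sketched as follows. This is a minimal mono sketch in plain Python; the actual pipeline synthesizes binaural 16 kHz mixtures, and the function name and scaling convention here are assumptions, not the repo's implementation:

```python
import math


def mix_at_snr(target, noise, snr_db):
    """Scale `noise` so the target-to-noise power ratio equals `snr_db`,
    then sum the two signals sample-wise (mono sketch)."""
    p_target = sum(x * x for x in target) / len(target)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Desired noise power P_n' satisfies P_t / P_n' = 10^(snr_db / 10),
    # so the amplitude gain on the noise is sqrt(P_t / (P_n * 10^(snr_db/10))).
    gain = math.sqrt(p_target / (p_noise * 10 ** (snr_db / 10)))
    return [t + gain * n for t, n in zip(target, noise)]
```

Scaling the noise rather than the target keeps the target's loudness constant across SNR conditions, which is a common choice when sweeping SNR during training.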
## File Structure

```
.
├── README.md
└── sed_ast_snr_ctl_v2_16k/
    ├── config.json        # Training configuration
    └── checkpoints/
        └── best.pt        # Fine-tuned model weights (~2 GB)
```
## Usage

```python
# Fine-tuned AST
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="ooshyun/sound_event_detection",
    filename="sed_ast_snr_ctl_v2_16k/checkpoints/best.pt",
)
config_path = hf_hub_download(
    repo_id="ooshyun/sound_event_detection",
    filename="sed_ast_snr_ctl_v2_16k/config.json",
)
```
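The downloaded checkpoint can then be loaded into an AST model with the 20-class head. The sketch below is an assumption about the checkpoint layout (a plain state dict, or one nested under a `"model"` key), hence `strict=False`; check `config.json` for the actual training setup:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import ASTForAudioClassification

checkpoint_path = hf_hub_download(
    repo_id="ooshyun/sound_event_detection",
    filename="sed_ast_snr_ctl_v2_16k/checkpoints/best.pt",
)

# Rebuild the fine-tuned architecture: the pretrained AST backbone with a
# 20-class head replacing the original 527-class AudioSet head.
model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=20,
    ignore_mismatched_sizes=True,
)

# The stored key layout is an assumption, hence strict=False.
state = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
model.load_state_dict(state.get("model", state), strict=False)
model.eval()
```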
For training and evaluation code, see ooshyun/sound_event_detection.
## Citation
If you use these models, please cite:
MobiSys 2026 #198 "Aurchestra"