Sound Event Detection: Pretrained Models

Pretrained models for Sound Event Detection (SED) used in MobiSys 2026 #198 "Aurchestra".

Models

1. YAMNet (Pretrained Baseline)

  • Source: google/yamnet (TensorFlow), used via a PyTorch reimplementation
  • Classes: 521 AudioSet classes
  • Usage: Loaded directly from HuggingFace; no checkpoint in this repo

2. AST (Pretrained Baseline)

  • Source: MIT/ast-finetuned-audioset-10-10-0.4593
  • Architecture: Audio Spectrogram Transformer
  • Classes: 527 AudioSet classes
  • Usage: Loaded directly from HuggingFace; no checkpoint in this repo

3. Fine-tuned AST (sed_ast_snr_ctl_v2_16k)

  • Base model: MIT/ast-finetuned-audioset-10-10-0.4593
  • Fine-tuned on: on-the-fly synthesized binaural audio mixtures (SNR-controlled, 16 kHz)
  • Classes: 20 target sound classes
  • Training: AdamW, OneCycleLR with group-wise learning rates (backbone 1e-5, head 1e-3), 80 epochs
  • Checkpoint: sed_ast_snr_ctl_v2_16k/checkpoints/best.pt
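
The group-wise learning-rate setup above can be sketched in PyTorch (illustrative; the LR values and epoch count match the bullet list, but the modules are placeholders, not the actual AST backbone and head, and `steps_per_epoch` is a hypothetical value):

```python
# Sketch of AdamW with group-wise LRs plus OneCycleLR, as described
# above: the pretrained backbone gets a small LR (1e-5) while the
# freshly initialized classification head gets a larger one (1e-3).
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

backbone = nn.Linear(128, 128)  # stand-in for the AST encoder
head = nn.Linear(128, 20)       # stand-in head for 20 target classes

optimizer = AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

steps_per_epoch = 100  # hypothetical; depends on the dataset/batch size
scheduler = OneCycleLR(
    optimizer,
    max_lr=[1e-5, 1e-3],  # one peak LR per param group
    epochs=80,
    steps_per_epoch=steps_per_epoch,
)
```

In the training loop, `scheduler.step()` is called once per batch so both groups warm up and decay on the same one-cycle schedule while keeping their 100x ratio.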

File Structure

.
├── README.md
└── sed_ast_snr_ctl_v2_16k/
    ├── config.json          # Training configuration
    └── checkpoints/
        └── best.pt          # Fine-tuned model weights (~2GB)

Usage

# Fine-tuned AST: download the checkpoint and its training
# configuration from the HuggingFace Hub.
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="ooshyun/sound_event_detection",
    filename="sed_ast_snr_ctl_v2_16k/checkpoints/best.pt",
)

config_path = hf_hub_download(
    repo_id="ooshyun/sound_event_detection",
    filename="sed_ast_snr_ctl_v2_16k/config.json",
)
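
The downloaded weights can then be restored onto the base AST model. This is a hypothetical sketch: the checkpoint's exact key layout is an assumption, so adapt it to the actual training code in the linked repository.

```python
# Hypothetical loading sketch: rebuild the 20-class AST and load the
# fine-tuned weights from best.pt (key layout is an assumption).
import torch
from transformers import ASTForAudioClassification

model = ASTForAudioClassification.from_pretrained(
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    num_labels=20,                 # 20 target sound classes (see above)
    ignore_mismatched_sizes=True,  # classifier head is re-shaped
)
state = torch.load(checkpoint_path, map_location="cpu")
# Some training scripts nest the weights under "model_state_dict".
state = state.get("model_state_dict", state)
model.load_state_dict(state, strict=False)
model.eval()
```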

For training and evaluation code, see the ooshyun/sound_event_detection repository.

Citation

If you use these models, please cite:

MobiSys 2026 #198 "Aurchestra"