metadata
license: apache-2.0
language:
  - en
tags:
  - audio
  - sound-separation
  - audio-to-audio
  - audiosep
datasets:
  - ShandaAI/Hive

AudioSep-hive

Model Description

AudioSep-hive is a data-efficient, query-based universal sound separation model trained on the Hive dataset. Because Hive is high-quality and semantically consistent, the model reaches separation accuracy and perceptual quality competitive with state-of-the-art models (such as SAM-Audio) while using only a fraction (~0.2%) of the training data volume.

This model was developed by Shanda AI Research Tokyo and is introduced in the paper "A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation".

Model Details

  • Model Type: Query-Based Universal Sound Separation
  • Language(s): English (for text queries)
  • License: Apache 2.0
  • Trained on: ShandaAI/Hive (2,442 hours of raw audio, 19.6M mixtures)
  • Paper: arXiv:2601.22599
  • Code Repository: ShandaAI/Hive (GitHub)

Uses

The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
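This card does not document the model's inference API, so the sketch below does not call AudioSep-hive. It only illustrates the task setup under stated assumptions: a synthetic two-source mixture and a scale-invariant SDR (SI-SDR) score, a metric commonly used to judge separation output. The function name `si_sdr`, the 16 kHz sample rate, and the sine-wave "sources" are all hypothetical choices made for illustration, and the "estimate" is a stand-in for whatever a separation model would return.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the optimally scaled target.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


# Hypothetical one-second signals at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
target = np.sin(2 * np.pi * 440 * t)             # the sound we want to extract
interferer = 0.5 * np.sin(2 * np.pi * 1000 * t)  # a competing sound
mixture = target + interferer

# Baseline: score the unprocessed mixture against the target.
# The power ratio is 0.5 / 0.125 = 4, i.e. about 6 dB.
baseline = si_sdr(mixture, target)

# Stand-in for a model output: the target plus a small amount of noise.
rng = np.random.default_rng(0)
estimate = target + 0.01 * rng.standard_normal(target.shape)
improved = si_sdr(estimate, target)

print(f"SI-SDR of mixture:  {baseline:.2f} dB")
print(f"SI-SDR of estimate: {improved:.2f} dB")
```

The difference between the two scores is the SI-SDR improvement. To evaluate a real separation model, `estimate` would be replaced with the audio it extracts for a given text or audio query.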