---
license: apache-2.0
language:
- en
tags:
- audio
- sound-separation
- audio-to-audio
- audiosep
datasets:
- ShandaAI/Hive
---

# AudioSep-hive

## Model Description

**AudioSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). By leveraging the high-quality, semantically consistent Hive dataset, the model reaches separation accuracy and perceptual quality competitive with state-of-the-art models (such as SAM-Audio) while using only a fraction (~0.2%) of the training data volume those models use.

The model was developed by **Shanda AI Research Tokyo** and is introduced in the paper [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599).

## Model Details

- **Model Type:** Query-Based Universal Sound Separation
- **Language(s):** English (for text queries)
- **License:** Apache 2.0
- **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)
- **Paper:** [arXiv:2601.22599](https://arxiv.org/abs/2601.22599)
- **Code Repository:** [GitHub - ShandaAI/Hive](https://github.com/ShandaAI/Hive)

## Uses

The model is intended for universal sound separation: users extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
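To make the task concrete, the sketch below builds a synthetic two-source mixture and scores a candidate output against the target source. It does not call the model, since this card does not document an inference API. The scoring function is scale-invariant SDR (SI-SDR), a metric commonly used for separation quality; its use here is an assumption for illustration, not a statement about how this model was evaluated. The 16 kHz sample rate, the tone frequencies, and the `estimate` signal are likewise illustrative stand-ins.

```python
import numpy as np


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to find the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    projection = scale * reference
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


# Assumed sample rate and one second of audio (illustrative values).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate

# Two synthetic "sources": the sound to extract and an interfering sound.
target = np.sin(2 * np.pi * 440 * t)
interference = np.sin(2 * np.pi * 1000 * t)

# A mixture is the sum of its sources.
mixture = target + interference

# Stand-in for a separation output: mostly target, with a little leakage.
estimate = 0.5 * target + 0.01 * interference

score_mixture = si_sdr(target, mixture)
score_estimate = si_sdr(target, estimate)
print(f"SI-SDR of unprocessed mixture: {score_mixture:.2f} dB")
print(f"SI-SDR of stand-in estimate:   {score_estimate:.2f} dB")
```

In practice, `estimate` would be whatever waveform the separation model returns for a given query, and the improvement over the unprocessed mixture's score indicates how much interference was removed.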