---
datasets:
- ShandaAI/Hive
language:
- en
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- audio
- sound-separation
- audiosep
---

# AudioSep-hive

**AudioSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). Because Hive is high-quality and semantically consistent, the model achieves separation accuracy and perceptual quality competitive with state-of-the-art models (such as SAM-Audio) while using only a fraction (~0.2%) of the training data volume.

- **Paper:** [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599)
- **Project Page:** https://shandaai.github.io/Hive
- **Code Repository:** https://github.com/ShandaAI/Hive

## Model Details

- **Model Type:** Query-Based Universal Sound Separation
- **Language(s):** English (for text queries)
- **License:** Apache 2.0
- **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)

## Uses

The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).

## Usage

Run inference with the scripts provided in the [official GitHub repository](https://github.com/ShandaAI/Hive).

### 1. Install dependencies

```bash
git clone https://github.com/ShandaAI/Hive
cd Hive
pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub gradio
```
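
As an optional sanity check after installation, the sketch below reports which of the packages from the `pip install` line are not visible to the current Python environment. It uses only the standard library (`importlib.metadata`); the helper name `missing_packages` is hypothetical and not part of the Hive repository.

```python
from importlib import metadata

# Distribution names copied from the pip install command above.
REQUIRED = [
    "torch",
    "torchaudio",
    "librosa",
    "pyyaml",
    "pytorch-lightning",
    "huggingface_hub",
    "gradio",
]


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only confirms that each package is installed; it does not check versions or GPU support.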

### 2. Run inference

The following command will automatically download the configuration and checkpoints from this repository:

```bash
python infer_audiosep.py \
  --audio_file /path/to/mixture.wav \
  --text "acoustic guitar" \
  --output_file /path/to/audiosep_output.wav
```
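
To extract several sources from the same mixture, the command above can be wrapped in a small Python loop. The following is a minimal sketch, assuming it is run from the root of the cloned `Hive` repository and that `infer_audiosep.py` accepts exactly the three flags shown above. The helper names (`output_path_for`, `build_command`, `separate_all`) and the output naming scheme are hypothetical and not part of the repository.

```python
import re
import subprocess
import sys
from pathlib import Path


def output_path_for(query, out_dir):
    """Derive an output file name from a text query (hypothetical naming scheme)."""
    # Lowercase the query and collapse anything that is not a letter or digit into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_") or "output"
    return Path(out_dir) / f"{slug}.wav"


def build_command(audio_file, query, output_file):
    """Build the infer_audiosep.py call using only the flags shown in this README."""
    return [
        sys.executable, "infer_audiosep.py",
        "--audio_file", str(audio_file),
        "--text", query,
        "--output_file", str(output_file),
    ]


def separate_all(audio_file, queries, out_dir):
    """Run one separation per text query; raises CalledProcessError if a run fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    outputs = []
    for query in queries:
        out_path = output_path_for(query, out_dir)
        subprocess.run(build_command(audio_file, query, out_path), check=True)
        outputs.append(out_path)
    return outputs
```

For example, `separate_all("/path/to/mixture.wav", ["acoustic guitar", "dog barking"], "separated")` would write `separated/acoustic_guitar.wav` and `separated/dog_barking.wav`. Each query starts a separate Python process, so the model is loaded again every time; this favors convenience over speed.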

## Citation

```bibtex
@article{li2026semantically,
  title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
  author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
  journal={arXiv preprint arXiv:2601.22599},
  year={2026}
}
```