datasets:
- ShandaAI/Hive
language:
- en
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- audio
- sound-separation
- audiosep
AudioSep-hive
AudioSep-hive is a data-efficient, query-based universal sound separation model trained on the Hive dataset. Trained on this high-quality, semantically consistent data, it achieves separation accuracy and perceptual quality comparable to state-of-the-art models such as SAM-Audio while using only about 0.2% of the training data volume.
- Paper: A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation
- Project Page: https://shandaai.github.io/Hive
- Code Repository: https://github.com/ShandaAI/Hive
Model Details
- Model Type: Query-Based Universal Sound Separation
- Language(s): English (for text queries)
- License: Apache 2.0
- Trained on: ShandaAI/Hive (2,442 hours of raw audio, 19.6M mixtures)
Uses
The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
Usage
To run this model, use the inference scripts provided in the official GitHub repository.
1. Install dependencies
git clone https://github.com/ShandaAI/Hive
cd Hive
pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub gradio
2. Run Inference
The following command automatically downloads the configuration and checkpoints from this repository before running separation:
python infer_audiosep.py \
--audio_file /path/to/mixture.wav \
--text "acoustic guitar" \
--output_file /path/to/audiosep_output.wav
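To extract several sounds from the same mixture, the command above can be wrapped in a small Python driver. The following is a minimal sketch, not part of the official repository: it assumes it is run from the root of the cloned Hive repo, uses only the three flags shown above (`--audio_file`, `--text`, `--output_file`), and the helper names `build_infer_command`, `output_path_for`, and `separate_all`, along with the output naming scheme, are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(audio_file, text, output_file, script="infer_audiosep.py"):
    """Build the argument list for one infer_audiosep.py call (flags as shown above)."""
    return [
        sys.executable, script,
        "--audio_file", str(audio_file),
        "--text", text,
        "--output_file", str(output_file),
    ]


def output_path_for(mixture, text, out_dir):
    """Hypothetical naming scheme: <mixture stem>_<query joined by underscores>.wav."""
    slug = "_".join(text.lower().split())
    return Path(out_dir) / f"{Path(mixture).stem}_{slug}.wav"


def separate_all(mixture, queries, out_dir, dry_run=True):
    """Build one command per text query; run them only when dry_run is False."""
    commands = []
    for text in queries:
        cmd = build_infer_command(mixture, text, output_path_for(mixture, text, out_dir))
        commands.append(cmd)
        if not dry_run:
            Path(out_dir).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling `separate_all("/path/to/mixture.wav", ["acoustic guitar", "dog barking"], "outputs", dry_run=False)` would invoke the script once per query and write one file per extracted sound. With the default `dry_run=True`, the function only returns the commands, so they can be inspected before any model download or inference starts.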
Citation
@article{li2026semantically,
title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
journal={arXiv preprint arXiv:2601.22599},
year={2026}
}