---
datasets:
- ShandaAI/Hive
language:
- en
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- audio
- sound-separation
- flowsep
---

# FlowSep-hive

## Model Description

**FlowSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). By leveraging the high-quality, semantically consistent Hive dataset, this model achieves separation accuracy and perceptual quality competitive with state-of-the-art models (such as SAM-Audio) while using only a fraction (~0.2%) of the training data volume.

This model was introduced in the paper: [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://huggingface.co/papers/2601.22599).

- **Developed by:** Shanda AI Research Tokyo
- **Paper:** [Hugging Face Papers](https://huggingface.co/papers/2601.22599)
- **Code Repository:** [GitHub - ShandaAI/Hive](https://github.com/ShandaAI/Hive)
- **Project Page:** [https://shandaai.github.io/Hive](https://shandaai.github.io/Hive)

## Model Details

- **Model Type:** Query-Based Universal Sound Separation
- **Language(s):** English (for text queries)
- **License:** Apache 2.0
- **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)

## Uses

The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).

## Usage

You can perform inference using the scripts provided in the official GitHub repository.

### 1) Install dependencies

```bash
pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub
```

### 2) FlowSep inference

Clone the [repository](https://github.com/ShandaAI/Hive) and use the `infer_flowsep.py` script, which automatically downloads the configuration and checkpoints:

```bash
python infer_flowsep.py \
    --audio_file /path/to/mixture.wav \
    --text "acoustic guitar" \
    --output_file /path/to/flowsep_output.wav
```

## Citation

If you find this model or the Hive dataset useful, please cite:

```bibtex
@article{li2026semantically,
  title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
  author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
  journal={arXiv preprint arXiv:2601.22599},
  year={2026}
}
```
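
## Batch Inference Sketch

The FlowSep inference command under Usage handles one mixture and one text query per call. If you need to run several queries, one option is to drive the script from Python. The following is a minimal sketch, not part of the official repository: it assumes `infer_flowsep.py` is in the current working directory and uses only the three flags shown above (`--audio_file`, `--text`, `--output_file`). The helper names `build_command` and `separate_many`, and the output file naming scheme, are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_file, text, output_file, script="infer_flowsep.py"):
    """Build the argument list for one infer_flowsep.py call.

    Only the flags documented in the Usage section are used.
    """
    return [
        sys.executable, str(script),
        "--audio_file", str(audio_file),
        "--text", text,
        "--output_file", str(output_file),
    ]


def separate_many(jobs, out_dir, script="infer_flowsep.py", dry_run=False):
    """Run one separation per (audio_file, text_query) pair.

    Returns the list of commands. With dry_run=True nothing is executed,
    which is useful for checking the commands before a long run.
    """
    out_dir = Path(out_dir)
    commands = []
    for audio_file, text in jobs:
        audio_file = Path(audio_file)
        # Hypothetical naming scheme: <mixture stem>__<query_with_underscores>.wav
        out_name = f"{audio_file.stem}__{text.replace(' ', '_')}.wav"
        cmd = build_command(audio_file, text, out_dir / out_name, script)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `separate_many([("mixture.wav", "acoustic guitar"), ("mixture.wav", "dog barking")], "outputs")` would invoke the script twice on the same mixture and write one file per query into `outputs/`. Each call starts a fresh process, so the model is presumably reloaded every time; for large batches, calling the repository's inference code directly may be more efficient.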