datasets:
  - ShandaAI/Hive
language:
  - en
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
  - audio
  - sound-separation
  - flowsep

FlowSep-hive

Model Description

FlowSep-hive is a data-efficient, query-based universal sound separation model trained on the Hive dataset. Because Hive is high-quality and semantically consistent, the model reaches separation accuracy and perceptual quality competitive with state-of-the-art models such as SAM-Audio while using only a small fraction (~0.2%) of the training data volume.

This model was introduced in the paper: A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation.

Developed by: Shanda AI Research Tokyo
Paper: Hugging Face Papers
Code Repository: GitHub - ShandaAI/Hive
Project Page: https://shandaai.github.io/Hive

Model Details

Model Type: Query-Based Universal Sound Separation
Language(s): English (for text queries)
License: Apache 2.0
Trained on: ShandaAI/Hive (2,442 hours of raw audio, 19.6M mixtures)

Uses

The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).

Usage

You can perform inference using the scripts provided in the official GitHub repository.

1) Install dependencies

pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub
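
To confirm that the packages from the install step are importable before running inference, a small check like the one below may help. It is a minimal sketch and not part of the official repository. The mapping from pip names to import names (for example `pyyaml` to `yaml`) reflects the usual module names for these packages, and the helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# pip package name -> top-level module name it normally installs.
REQUIRED_MODULES = {
    "torch": "torch",
    "torchaudio": "torchaudio",
    "librosa": "librosa",
    "pyyaml": "yaml",
    "pytorch-lightning": "pytorch_lightning",
    "huggingface_hub": "huggingface_hub",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names whose top-level module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

The check only looks for the modules on the import path; it does not verify versions or GPU support.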

2) FlowSep inference

Clone the repository and use the infer_flowsep.py script, which automatically downloads the configuration and checkpoints:

python infer_flowsep.py \
  --audio_file /path/to/mixture.wav \
  --text "acoustic guitar" \
  --output_file /path/to/flowsep_output.wav
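
To extract several sources from the same mixture, the command above can be driven from Python. The following is a minimal sketch under stated assumptions: it uses only the three flags shown above (`--audio_file`, `--text`, `--output_file`), assumes it is run from the cloned repository root so that `infer_flowsep.py` is found, and the helper names and output naming scheme are hypothetical rather than part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_file, text, output_file, script="infer_flowsep.py"):
    """Assemble the argument list for one infer_flowsep.py run."""
    return [
        sys.executable, str(script),
        "--audio_file", str(audio_file),
        "--text", text,
        "--output_file", str(output_file),
    ]


def separate_many(audio_file, queries, output_dir, dry_run=False):
    """Run one separation per text query and return the commands in order."""
    output_dir = Path(output_dir)
    commands = []
    for text in queries:
        # Hypothetical naming scheme: spaces in the query become underscores.
        out_path = output_dir / f"{text.replace(' ', '_')}.wav"
        cmd = build_command(audio_file, text, out_path)
        commands.append(cmd)
        if not dry_run:
            output_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `separate_many("/path/to/mixture.wav", ["acoustic guitar", "dog barking"], "outputs")` would invoke the script once per query. Passing `dry_run=True` returns the commands without executing them, which is useful for checking paths first.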

Citation

If you find this model or the Hive dataset useful, please cite:

@article{li2026semantically,
  title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
  author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
  journal={arXiv preprint arXiv:2601.22599},
  year={2026}
}