metadata
datasets:
  - ShandaAI/Hive
language:
  - en
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
  - audio
  - sound-separation
  - audiosep

AudioSep-hive

AudioSep-hive is a data-efficient, query-based universal sound separation model trained on the Hive dataset. By leveraging the high-quality, semantically consistent Hive dataset, it achieves separation accuracy and perceptual quality comparable to state-of-the-art models such as SAM-Audio while using only a fraction (~0.2%) of their training data volume.

Model Details

  • Model Type: Query-Based Universal Sound Separation
  • Language(s): English (for text queries)
  • License: Apache 2.0
  • Trained on: ShandaAI/Hive (2,442 hours of raw audio, 19.6M mixtures)

Uses

The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).

Usage

To run this model, use the inference scripts provided in the official GitHub repository (https://github.com/ShandaAI/Hive).

1. Install dependencies

git clone https://github.com/ShandaAI/Hive
cd Hive
pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub gradio
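After installation, a quick check like the one below can confirm that the packages are importable before running inference. This is a minimal sketch, not part of the official repository: the helper name `missing_modules` is hypothetical, and the list uses the import names that correspond to the pip packages above (`pyyaml` imports as `yaml`, `pytorch-lightning` as `pytorch_lightning`).

```python
import importlib.util

# Import names for the pip packages listed in the install step.
# Note: pyyaml is imported as "yaml", pytorch-lightning as "pytorch_lightning".
REQUIRED_MODULES = [
    "torch",
    "torchaudio",
    "librosa",
    "yaml",
    "pytorch_lightning",
    "huggingface_hub",
    "gradio",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If any module is reported as missing, re-running the `pip install` command above in the same environment is the first thing to try.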

2. Run inference

The following command automatically downloads the configuration and checkpoints from this repository before running separation:

python infer_audiosep.py \
  --audio_file /path/to/mixture.wav \
  --text "acoustic guitar" \
  --output_file /path/to/audiosep_output.wav
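To extract several sounds from the same mixture, the command above can be wrapped in a small Python script. The sketch below assumes it is run from the root of the cloned Hive repository and relies only on the three flags shown above (`--audio_file`, `--text`, `--output_file`). The helper names `build_command` and `separate_queries` and the output naming scheme are hypothetical, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_file, text, output_file, script="infer_audiosep.py"):
    """Build the argument list for one infer_audiosep.py invocation."""
    return [
        sys.executable,
        script,
        "--audio_file", str(audio_file),
        "--text", text,
        "--output_file", str(output_file),
    ]


def separate_queries(audio_file, queries, output_dir):
    """Run the inference script once per text query; return the output paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for text in queries:
        # Hypothetical naming scheme: one WAV file per query, spaces replaced.
        output_file = output_dir / (text.replace(" ", "_") + ".wav")
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(build_command(audio_file, text, output_file), check=True)
        outputs.append(output_file)
    return outputs
```

For example, `separate_queries("/path/to/mixture.wav", ["acoustic guitar", "dog barking"], "outputs")` would write one file per query into `outputs/`; the second query is only an illustrative prompt. Each call starts a new process, so the model is reloaded for every query. That keeps the sketch independent of the repository's internal Python API, at the cost of speed.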

Citation

@article{li2026semantically,
  title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
  author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
  journal={arXiv preprint arXiv:2601.22599},
  year={2026}
}