ShandaAI/AudioSep-hive

@@ -1,33 +1,65 @@
 ---
-license: apache-2.0
 language:
 - en
 tags:
 - audio
 - sound-separation
-- audio-to-audio
 - audiosep
-datasets:
-- ShandaAI/Hive
 ---
 # AudioSep-hive
-## Model Description
 **AudioSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). By leveraging the high-quality, semantically consistent Hive dataset, this model achieves competitive separation accuracy and perceptual quality comparable to state-of-the-art models (such as SAM-Audio) while utilizing only a fraction (~0.2%) of the training data volume.
-This model is developed by **Shanda AI Research Tokyo** and is introduced in the paper: [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599).
 ## Model Details
-- **Model Type:** Query-Based Universal Sound Separation
-- **Language(s):** English (for text queries)
-- **License:** Apache 2.0 (Please update if different)
-- **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)
-- **Paper:** [arXiv:2601.22599](https://arxiv.org/abs/2601.22599)
-- **Code Repository:** [GitHub - ShandaAI/Hive](https://github.com/ShandaAI/Hive)
 ## Uses
-The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).

 ---
+datasets:
+- ShandaAI/Hive
 language:
 - en
+license: apache-2.0
+pipeline_tag: audio-to-audio
 tags:
 - audio
 - sound-separation
 - audiosep
 ---
 # AudioSep-hive
 **AudioSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). By leveraging the high-quality, semantically consistent Hive dataset, this model achieves competitive separation accuracy and perceptual quality comparable to state-of-the-art models (such as SAM-Audio) while utilizing only a fraction (~0.2%) of the training data volume.
+- **Paper:** [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599)
+- **Project Page:** https://shandaai.github.io/Hive
+- **Code Repository:** https://github.com/ShandaAI/Hive
 ## Model Details
+- **Model Type:** Query-Based Universal Sound Separation
+- **Language(s):** English (for text queries)
+- **License:** Apache 2.0
+- **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)
 ## Uses
+The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
+## Usage
+To run this model, use the inference scripts provided in the official GitHub repository.
+### 1. Install dependencies
+```bash
+git clone https://github.com/ShandaAI/Hive
+cd Hive
+pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub gradio
+```
+### 2. Run Inference
+The following command automatically downloads the configuration and checkpoints from this repository:
+```bash
+python infer_audiosep.py \
+  --audio_file /path/to/mixture.wav \
+  --text "acoustic guitar" \
+  --output_file /path/to/audiosep_output.wav
+```
+## Citation
+```bibtex
+@article{li2026semantically,
+  title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
+  author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
+  journal={arXiv preprint arXiv:2601.22599},
+  year={2026}
+}
+```
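
The "Run Inference" step added in this change processes a single file per call. As a minimal sketch of how it might be applied to a whole folder, the Python helper below builds one `infer_audiosep.py` command per `.wav` file, using only the three flags shown in that step (`--audio_file`, `--text`, `--output_file`). The helper names `build_infer_command` and `separate_folder`, the `_sep.wav` output suffix, and the `dry_run` switch are illustrative assumptions and are not part of the Hive repository. It also assumes the commands are run from inside the cloned `Hive` directory.

```python
import shlex
import subprocess
from pathlib import Path


def build_infer_command(audio_file, text, output_file, script="infer_audiosep.py"):
    """Return the argument list for one call to the inference script.

    Only the flags documented in the model card are used.
    """
    return [
        "python", script,
        "--audio_file", str(audio_file),
        "--text", text,
        "--output_file", str(output_file),
    ]


def separate_folder(input_dir, output_dir, text, dry_run=True):
    """Build one command per .wav file in input_dir (hypothetical helper).

    With dry_run=True the commands are only printed; with dry_run=False
    they are executed one after another.
    """
    out = Path(output_dir)
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        # Assumed naming scheme: <input stem>_sep.wav in the output folder.
        cmd = build_infer_command(wav, text, out / f"{wav.stem}_sep.wav")
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))
        else:
            out.mkdir(parents=True, exist_ok=True)
            # Expected to run from the cloned Hive directory.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `separate_folder("mixtures", "separated", "acoustic guitar")` prints the commands without running them, which is a cheap way to check paths before starting a long batch; passing `dry_run=False` executes them.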