	---
	datasets:
	- ShandaAI/Hive
	language:
	- en
	license: apache-2.0
	pipeline_tag: audio-to-audio
	tags:
	- audio
	- sound-separation
	- audiosep
	---

	# AudioSep-hive

	AudioSep-hive is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). Because Hive is high-quality and semantically consistent, the model reaches separation accuracy and perceptual quality competitive with state-of-the-art models (such as SAM-Audio) while using roughly 0.2% of the training data volume those models use.

	- Paper: [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599)
	- Project Page: https://shandaai.github.io/Hive
	- Code Repository: https://github.com/ShandaAI/Hive

	## Model Details

	- Model Type: Query-Based Universal Sound Separation
	- Language(s): English (for text queries)
	- License: Apache 2.0
	- Trained on: [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)

	## Uses

	The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).

	## Usage

	Run this model with the inference scripts provided in the official GitHub repository.

	### 1. Install dependencies

	```bash
	git clone https://github.com/ShandaAI/Hive
	cd Hive
	pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub gradio
	```
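Before moving on, it can help to confirm that the packages above import cleanly. The following is a minimal sketch and not part of the Hive repository: `check_deps` is a hypothetical helper name, and it assumes that `python` on your `PATH` is the interpreter you installed into. Note that `pyyaml` is imported as `yaml` and `pytorch-lightning` as `pytorch_lightning`.

```shell
# Hypothetical helper (not provided by the Hive repository).
# Tries to import each package from the install step and reports failures.
check_deps() {
  local missing=0
  local module
  # Import names differ from pip names for pyyaml and pytorch-lightning.
  for module in torch torchaudio librosa yaml pytorch_lightning huggingface_hub gradio; do
    if ! python -c "import ${module}" 2>/dev/null; then
      echo "missing: ${module}"
      missing=1
    fi
  done
  # Exit status is 0 when everything imported, 1 otherwise.
  return "${missing}"
}
```

Calling `check_deps` prints one `missing: <module>` line per package that fails to import and prints nothing when all of them load.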

	### 2. Run inference

	The following command automatically downloads the configuration and checkpoints from this repository, then separates the sound described by `--text` from the input mixture:

	```bash
	python infer_audiosep.py \
	--audio_file /path/to/mixture.wav \
	--text "acoustic guitar" \
	--output_file /path/to/audiosep_output.wav
	```
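To apply one text query to several mixtures, the command above can be wrapped in a loop. The sketch below assumes it is run from the root of the cloned Hive repository and that the inputs are `.wav` files. The function name `separate_dir` and the `_separated.wav` suffix are illustrative choices, not part of the repository, and only the three flags shown above are used.

```shell
# Hypothetical wrapper (not provided by the Hive repository).
# Usage: separate_dir <input_dir> <text_query> <output_dir>
separate_dir() {
  local in_dir="$1" query="$2" out_dir="$3"
  local mixture name
  mkdir -p "${out_dir}"
  for mixture in "${in_dir}"/*.wav; do
    # Skip the literal pattern when the directory holds no .wav files.
    [ -e "${mixture}" ] || continue
    name="$(basename "${mixture}" .wav)"
    # Same flags as the single-file command above.
    python infer_audiosep.py \
      --audio_file "${mixture}" \
      --text "${query}" \
      --output_file "${out_dir}/${name}_separated.wav"
  done
}
```

For example, `separate_dir /path/to/mixtures "acoustic guitar" /path/to/outputs` invokes the inference script once per input mixture and names each output after its input file.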

	## Citation

	```bibtex
	@article{li2026semantically,
	title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
	author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
	journal={arXiv preprint arXiv:2601.22599},
	year={2026}
	}
	```