Improve model card with links and usage instructions

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +46 -14
README.md CHANGED
@@ -1,33 +1,65 @@
  ---
- license: apache-2.0
+ datasets:
+ - ShandaAI/Hive
  language:
  - en
+ license: apache-2.0
+ pipeline_tag: audio-to-audio
  tags:
  - audio
  - sound-separation
- - audio-to-audio
  - audiosep
- datasets:
- - ShandaAI/Hive
  ---

  # AudioSep-hive

- ## Model Description
-
  **AudioSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). By leveraging the high-quality, semantically consistent Hive dataset, this model achieves competitive separation accuracy and perceptual quality comparable to state-of-the-art models (such as SAM-Audio) while utilizing only a fraction (~0.2%) of the training data volume.

- This model is developed by **Shanda AI Research Tokyo** and is introduced in the paper: [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599).
+ - **Paper:** [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599)
+ - **Project Page:** https://shandaai.github.io/Hive
+ - **Code Repository:** https://github.com/ShandaAI/Hive

  ## Model Details

- - **Model Type:** Query-Based Universal Sound Separation
- - **Language(s):** English (for text queries)
- - **License:** Apache 2.0 (Please update if different)
- - **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)
- - **Paper:** [arXiv:2601.22599](https://arxiv.org/abs/2601.22599)
- - **Code Repository:** [GitHub - ShandaAI/Hive](https://github.com/ShandaAI/Hive)
+ - **Model Type:** Query-Based Universal Sound Separation
+ - **Language(s):** English (for text queries)
+ - **License:** Apache 2.0
+ - **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)

  ## Uses

- The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
+ The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
+
+ ## Usage
+
+ To use this model, you can use the inference scripts provided in the official GitHub repository.
+
+ ### 1. Install dependencies
+
+ ```bash
+ git clone https://github.com/ShandaAI/Hive
+ cd Hive
+ pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub gradio
+ ```
+
+ ### 2. Run Inference
+
+ The following command will automatically download the configuration and checkpoints from this repository:
+
+ ```bash
+ python infer_audiosep.py \
+ --audio_file /path/to/mixture.wav \
+ --text "acoustic guitar" \
+ --output_file /path/to/audiosep_output.wav
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{li2026semantically,
+ title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
+ author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
+ journal={arXiv preprint arXiv:2601.22599},
+ year={2026}
+ }
+ ```
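
The inference command added in the Usage section can also be driven from Python, for example to extract several sounds from one mixture. The following is a minimal sketch, not part of the PR or the repository. It assumes it is run against a cloned `Hive` checkout and that `infer_audiosep.py` accepts the three flags shown in the README (`--audio_file`, `--text`, `--output_file`). The helper names `build_command` and `separate`, and the output file naming scheme, are hypothetical.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_command(audio_file, text, output_file, script="infer_audiosep.py"):
    """Build the argument list for the inference command shown in the README.

    Only the flags documented in the README are used; nothing else is assumed
    about the script's interface.
    """
    return [
        sys.executable, script,
        "--audio_file", str(audio_file),
        "--text", text,
        "--output_file", str(output_file),
    ]


def separate(audio_file, queries, out_dir, repo_dir="."):
    """Run one separation per text query (hypothetical convenience wrapper).

    repo_dir is the cloned Hive directory containing infer_audiosep.py.
    Paths are made absolute because the script runs with repo_dir as its
    working directory.
    """
    audio_file = Path(audio_file).resolve()
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for query in queries:
        # Illustrative naming scheme: "acoustic guitar" -> acoustic_guitar.wav
        out_path = out_dir / (query.replace(" ", "_") + ".wav")
        subprocess.run(
            build_command(audio_file, query, out_path),
            cwd=repo_dir,
            check=True,  # raise CalledProcessError if inference fails
        )
        outputs.append(out_path)
    return outputs


# Print the equivalent shell command without running inference.
print(shlex.join(build_command(
    "/path/to/mixture.wav", "acoustic guitar", "/path/to/audiosep_output.wav"
)))
```

Calling `separate("/path/to/mixture.wav", ["acoustic guitar", "dog barking"], "outputs", repo_dir="Hive")` would then invoke the script once per query. Passing arguments as a list rather than a shell string keeps queries containing spaces or quotes intact.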