
Improve model card: add pipeline tag, links, and usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +49 -12
README.md CHANGED
@@ -1,14 +1,14 @@
  ---
- license: apache-2.0
  language:
  - en
  tags:
  - audio
  - sound-separation
- - audio-to-audio
  - flowsep
- datasets:
- - ShandaAI/Hive
  ---

  # FlowSep-hive
@@ -17,17 +17,54 @@ datasets:

  **FlowSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). By leveraging the high-quality, semantically consistent Hive dataset, this model achieves competitive separation accuracy and perceptual quality comparable to state-of-the-art models (such as SAM-Audio) while utilizing only a fraction (~0.2%) of the training data volume.

- This model is developed by **Shanda AI Research Tokyo** and is introduced in the paper: [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599).

  ## Model Details

- - **Model Type:** Query-Based Universal Sound Separation
- - **Language(s):** English (for text queries)
- - **License:** Apache 2.0 (Please update if different)
- - **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)
- - **Paper:** [arXiv:2601.22599](https://arxiv.org/abs/2601.22599)
- - **Code Repository:** [GitHub - ShandaAI/Hive](https://github.com/ShandaAI/Hive)

  ## Uses

- The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
  ---
+ datasets:
+ - ShandaAI/Hive
  language:
  - en
+ license: apache-2.0
+ pipeline_tag: audio-to-audio
  tags:
  - audio
  - sound-separation
  - flowsep
  ---

  # FlowSep-hive
 

  **FlowSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). By leveraging the high-quality, semantically consistent Hive dataset, this model achieves competitive separation accuracy and perceptual quality comparable to state-of-the-art models (such as SAM-Audio) while utilizing only a fraction (~0.2%) of the training data volume.

+ This model was introduced in the paper: [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://huggingface.co/papers/2601.22599).
+
+ - **Developed by:** Shanda AI Research Tokyo
+ - **Paper:** [Hugging Face Papers](https://huggingface.co/papers/2601.22599)
+ - **Code Repository:** [GitHub - ShandaAI/Hive](https://github.com/ShandaAI/Hive)
+ - **Project Page:** [https://shandaai.github.io/Hive](https://shandaai.github.io/Hive)

  ## Model Details

+ - **Model Type:** Query-Based Universal Sound Separation
+ - **Language(s):** English (for text queries)
+ - **License:** Apache 2.0
+ - **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)

  ## Uses

+ The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
+
+ ## Usage
+
+ You can perform inference using the scripts provided in the official GitHub repository.
+
+ ### 1) Install dependencies
+
+ ```bash
+ pip install torch torchaudio librosa pyyaml pytorch-lightning huggingface_hub
+ ```
+
+ ### 2) FlowSep inference
+
+ Clone the [repository](https://github.com/ShandaAI/Hive) and use the `infer_flowsep.py` script, which automatically downloads the configuration and checkpoints:
+
+ ```bash
+ python infer_flowsep.py \
+ --audio_file /path/to/mixture.wav \
+ --text "acoustic guitar" \
+ --output_file /path/to/flowsep_output.wav
+ ```
+
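A possible follow-up to the usage section added above, not part of the proposed README: a minimal sketch of running the same command over a directory of mixtures. It uses only the three flags shown in the diff (`--audio_file`, `--text`, `--output_file`) and assumes it is run from the directory that contains `infer_flowsep.py`. The function name `separate_dir`, the `DRY_RUN` switch, and the `_flowsep.wav` output suffix are hypothetical and chosen for illustration only.

```shell
# Hypothetical helper (not part of the Hive repository): run FlowSep on every
# .wav file in a directory with the same text query.
# Usage: separate_dir INPUT_DIR OUTPUT_DIR "text query"
separate_dir() {
  local in_dir="$1" out_dir="$2" query="$3"
  local f name
  mkdir -p "$out_dir"                  # create the output directory if needed
  for f in "$in_dir"/*.wav; do
    [ -e "$f" ] || continue            # skip when the glob matches nothing
    name="$(basename "$f" .wav)"
    # With DRY_RUN set to a non-empty value the command is printed, not run.
    ${DRY_RUN:+echo} python infer_flowsep.py \
      --audio_file "$f" \
      --text "$query" \
      --output_file "$out_dir/${name}_flowsep.wav"
  done
}
```

Calling `DRY_RUN=1 separate_dir ./mixtures ./separated "acoustic guitar"` prints the commands that would be run. Each file starts a separate Python process, so this sketch favors simplicity over speed.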
+ ## Citation
+
+ If you find this model or the Hive dataset useful, please cite:
+
+ ```bibtex
+ @article{li2026semantically,
+ title={A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation},
+ author={Li, Kai and Cheng, Jintao and Zeng, Chang and Yan, Zijun and Wang, Helin and Su, Zixiong and Zheng, Bo and Hu, Xiaolin},
+ journal={arXiv preprint arXiv:2601.22599},
+ year={2026}
+ }
+ ```