JusperLee committed e5d9728 (0 parents)

Duplicate from ShandaAI/AudioSep-hive

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
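These rules route every matching file through Git LFS, so the repository stores a small pointer in place of the binary. As a rough way to see which files in this commit are affected, the sketch below approximates gitattributes matching with Python's `fnmatch`. It is a simplification, not git's algorithm: patterns without a slash are compared against the basename, patterns with a slash against the whole path, and `*` may cross directory boundaries. `is_lfs_tracked` and `LFS_PATTERNS` are hypothetical names; `git check-attr filter -- <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes above (all use filter=lfs).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*",
    "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip",
    "*.zst", "*tfevents*",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check of whether a repo-relative POSIX path hits an LFS rule.

    Simplified semantics (an assumption, not full gitattributes matching):
    - a pattern without "/" is matched against the file's basename;
    - a pattern with "/" is matched against the full path, and a "**/"
      segment is also tried as empty, since git lets it match zero dirs.
    """
    p = PurePosixPath(path)
    for pat in patterns:
        if "/" in pat:
            variants = (pat, pat.replace("**/", ""))
            if any(fnmatchcase(str(p), v) for v in variants):
                return True
        elif fnmatchcase(p.name, pat):
            return True
    return False
```

Under this approximation, both checkpoints in this commit (`*.ckpt`, `*.pt`) are LFS-tracked, while `README.md` and `config.yaml` are stored as ordinary text.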
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - audio
+ - sound-separation
+ - audio-to-audio
+ - audiosep
+ datasets:
+ - ShandaAI/Hive
+ ---
+
+ # AudioSep-hive
+
+ ## Model Description
+
+ **AudioSep-hive** is a data-efficient, query-based universal sound separation model trained on the [Hive dataset](https://huggingface.co/datasets/ShandaAI/Hive). By leveraging the high-quality, semantically consistent Hive dataset, this model achieves separation accuracy and perceptual quality comparable to state-of-the-art models (such as SAM-Audio) while using only a fraction (~0.2%) of their training data volume.
+
+ This model was developed by **Shanda AI Research Tokyo** and is introduced in the paper [A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation](https://arxiv.org/abs/2601.22599).
+
+ ## Model Details
+
+ - **Model Type:** Query-Based Universal Sound Separation
+ - **Language(s):** English (for text queries)
+ - **License:** Apache 2.0 (Please update if different)
+ - **Trained on:** [ShandaAI/Hive](https://huggingface.co/datasets/ShandaAI/Hive) (2,442 hours of raw audio, 19.6M mixtures)
+ - **Paper:** [arXiv:2601.22599](https://arxiv.org/abs/2601.22599)
+ - **Code Repository:** [GitHub - ShandaAI/Hive](https://github.com/ShandaAI/Hive)
+
+ ## Uses
+
+ The model is intended for universal sound separation tasks, allowing users to extract specific sounds from complex audio mixtures using multimodal prompts (e.g., text descriptions or audio queries).
audiosep_hive.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a13fff5fa4ece1a8bc13e42e1c7b8d90e21603075302ca89e4339c9471973300
+ size 1264846755
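The checkpoint is committed as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object ID, and the object size in bytes (1,264,846,755 bytes, roughly 1.26 GB, for `audiosep_hive.ckpt`). A minimal sketch for checking a downloaded copy against its pointer follows. `parse_lfs_pointer` and `verify_download` are hypothetical helper names, and only the `sha256` oid form that appears here is handled.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into {'version', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_download(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    # Cheap size check first, so a truncated download fails without hashing.
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so multi-GB checkpoints are not read into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


# Pointer text copied from audiosep_hive.ckpt above.
AUDIOSEP_HIVE_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a13fff5fa4ece1a8bc13e42e1c7b8d90e21603075302ca89e4339c9471973300
size 1264846755
"""
```

Usage: `verify_download("audiosep_hive.ckpt", parse_lfs_pointer(AUDIOSEP_HIVE_POINTER))`. The same check applies to the CLAP checkpoint's pointer.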
config.yaml ADDED
@@ -0,0 +1,42 @@
+ ---
+ task_name: AudioSep
+
+ data:
+     datafiles:
+         - 'datafiles/template.json'
+
+     sampling_rate: 32000
+     segment_seconds: 5
+     loudness_norm:
+         lower_db: -10
+         higher_db: 10
+     max_mix_num: 2
+
+ model:
+     query_net: CLAP
+     condition_size: 512
+     model_type: ResUNet30
+     input_channels: 1
+     output_channels: 1
+     resume_checkpoint: ""
+     use_text_ratio: 1.0
+
+ train:
+     optimizer:
+         optimizer_type: AdamW
+         learning_rate: 1e-3
+         warm_up_steps: 10000
+         reduce_lr_steps: 1000000
+         lr_lambda_type: constant_warm_up
+     num_nodes: 1
+     num_workers: 6
+     loss_type: l1_wav
+     sync_batchnorm: True
+     batch_size_per_device: 12
+     steps_per_epoch: 10000
+     evaluate_step_frequency: 10000
+     save_step_frequency: 20000
+     early_stop_steps: 10000001
+     random_seed: 1234
+
+
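The `data` section fixes the shape of each training example: 5-second segments at 32 kHz (160,000 samples), mixtures of at most two sources (`max_mix_num: 2`), and a `loudness_norm` range of -10 to +10 dB. The sketch below shows one plausible reading of that range, in which the interfering source is rescaled to sit a random number of dB relative to the target before the two are summed. This is an assumption, not the repository's mixing code: RMS stands in for loudness, and `mix_pair`, `rms` and `db_to_gain` are hypothetical names.

```python
import math
import random

# Values taken from the data section of config.yaml above.
SAMPLING_RATE = 32000
SEGMENT_SECONDS = 5
LOWER_DB, HIGHER_DB = -10, 10
SEGMENT_SAMPLES = SAMPLING_RATE * SEGMENT_SECONDS  # 160,000 samples per segment


def rms(x):
    """Root-mean-square amplitude, used here as a crude loudness proxy (assumption)."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def db_to_gain(db):
    """Convert a level difference in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix_pair(target, interferer, rng, lower_db=LOWER_DB, higher_db=HIGHER_DB):
    """Hypothetical two-source mixing step (max_mix_num: 2).

    Rescales `interferer` so its RMS sits rel_db dB relative to `target`,
    with rel_db drawn uniformly from [lower_db, higher_db], then sums them.
    Returns (mixture, scaled_interferer, rel_db); `target` is the label.
    """
    if len(target) != len(interferer):
        raise ValueError("sources must have the same length")
    rel_db = rng.uniform(lower_db, higher_db)
    # Guard against division by zero for a silent interferer.
    scale = db_to_gain(rel_db) * rms(target) / max(rms(interferer), 1e-8)
    scaled = [v * scale for v in interferer]
    mixture = [t + s for t, s in zip(target, scaled)]
    return mixture, scaled, rel_db


if __name__ == "__main__":
    rng = random.Random(1234)  # random_seed from the train section
    n = SAMPLING_RATE // 100  # 10 ms toy signals keep the demo fast
    target = [math.sin(2 * math.pi * 440 * i / SAMPLING_RATE) for i in range(n)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mixture, scaled, rel_db = mix_pair(target, noise, rng)
    print(f"relative level {rel_db:+.2f} dB, mixture RMS {rms(mixture):.3f}")
```

When reading this file with PyYAML, note that `learning_rate: 1e-3` resolves to the string `'1e-3'` rather than a float, because PyYAML's YAML 1.1 float pattern requires a decimal point; cast it with `float()` before handing it to an optimizer.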
music_speech_audioset_epoch_15_esc_89.98.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51c68f12f9d7ea25fdaaccf741ec7f81e93ee594455410f3bca4f47f88d8e006
+ size 2352471003