---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: text-to-audio
tags:
- t2a
- v2a
- text-to-audio
- video-to-audio
- woosh
- comfyui
- diffusion
- audio
- flow-matching
---

# Woosh: Sound Effect Generative Models

Inference code and open weights for sound effect generative models developed at Sony AI.

[![GitHub](https://img.shields.io/badge/GitHub-SonyResearch%2FWoosh-black)](https://github.com/SonyResearch/Woosh)
[![ComfyUI Node](https://img.shields.io/badge/ComfyUI-ComfyUI--Woosh-blue)](https://github.com/Saganaki22/ComfyUI-Woosh)
[![arXiv](https://img.shields.io/badge/arXiv-2502.07359-b31b1b)](https://arxiv.org/abs/2502.07359)

![Screenshot 2026-04-12 013347](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/kafOo1f9eZYfyyHgcbzPj.png)

<video controls width="100%">
<source src="https://huggingface.co/drbaph/Woosh/resolve/main/ComfyUI-Woosh-example.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>

## Models

| Model | Task | Sampling Steps | CFG Scale | Description |
|-------|------|----------------|-----------|-------------|
| **Woosh-Flow** | Text-to-Audio | 50 | 4.5 | Base text-to-audio model, best quality |
| **Woosh-DFlow** | Text-to-Audio | 4 | 1.0 | Distilled from Woosh-Flow, fast text-to-audio |
| **Woosh-VFlow** | Video-to-Audio | 50 | 4.5 | Base video-to-audio model |
| **Woosh-DVFlow** | Video-to-Audio | 4 | 1.0 | Distilled from Woosh-VFlow, fast video-to-audio |
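
The CFG column gives the classifier-free guidance scale $w$. As a sketch, assuming Woosh applies the standard classifier-free guidance formulation to its flow-matching velocity prediction $v_\theta$ (the official repository is authoritative on the exact form), each sampling step mixes a conditional and an unconditional prediction:

$$
\hat{v}_\theta(z_t, t, c) = v_\theta(z_t, t, \varnothing) + w\,\bigl(v_\theta(z_t, t, c) - v_\theta(z_t, t, \varnothing)\bigr)
$$

Here $c$ is the conditioning (text, or video with optional text) and $\varnothing$ the null condition. Under this formulation, $w = 1.0$, the value listed for the distilled models, reduces to $\hat{v}_\theta = v_\theta(z_t, t, c)$, so the unconditional pass can be skipped entirely, while $w = 4.5$ pushes each prediction further along the conditional direction.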

### Components

- **Woosh-AE**: High-quality latent encoder/decoder (autoencoder). Encodes audio into the latents used for generative modeling and decodes generated latents back to audio.
- **Woosh-CLAP (TextConditionerA/V)**: Multimodal text-audio alignment model. Provides token latents for conditioning the diffusion model: TextConditionerA for T2A, TextConditionerV for V2A.
- **Woosh-Flow / Woosh-DFlow**: Original and distilled latent diffusion models (LDMs) for text-to-audio generation.
- **Woosh-VFlow / Woosh-DVFlow**: Original and distilled multimodal LDMs that generate audio from video (Woosh-VFlow also accepts optional text prompts).
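
Read together, these components form a generation pipeline. The equations below sketch the text-to-audio path under assumptions this card does not state: sampling starts from Gaussian latent noise, time runs from noise at $t = 0$ to data at $t = 1$, and a plain Euler solver takes the $N$ steps given in the Models table. The official repository is authoritative on all three.

$$
\begin{aligned}
c &= \mathrm{TextConditioner}(\text{prompt}), \qquad z_0 \sim \mathcal{N}(0, I), \qquad \Delta t = 1/N, \\
z_{(k+1)\Delta t} &= z_{k\Delta t} + \Delta t\, v_\theta\bigl(z_{k\Delta t},\, k\Delta t,\, c\bigr), \qquad k = 0, \dots, N-1, \\
\text{audio} &= \mathrm{Decoder}_{\mathrm{AE}}(z_1).
\end{aligned}
$$

Here $v_\theta$ is the flow model's velocity prediction (the guided one when CFG is used) and $\mathrm{Decoder}_{\mathrm{AE}}$ is the Woosh-AE decoder. For V2A, $c$ would additionally carry video features. With an Euler solver, each step costs one flow-model evaluation per conditioning pass, which would account for the distilled models running 4 steps instead of 50.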

## ComfyUI Nodes

Use these models in [ComfyUI](https://github.com/comfyanonymous/ComfyUI) through the [ComfyUI-Woosh](https://github.com/Saganaki22/ComfyUI-Woosh) custom nodes:

```bash
# Via ComfyUI Manager: search "Woosh" and click Install
# Or manually:
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/ComfyUI-Woosh.git
# Install the node's dependencies into the Python environment that runs ComfyUI
pip install -r ComfyUI-Woosh/requirements.txt
```

Place the downloaded model folders in `ComfyUI/models/woosh/`. See the [ComfyUI-Woosh README](https://github.com/Saganaki22/ComfyUI-Woosh) for full setup and workflow examples.
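
One way to fetch the weights into that folder is sketched below. It assumes `huggingface-cli` from the `huggingface_hub` Python package is on your `PATH` and that this repository, `drbaph/Woosh` (the one hosting the example video above), keeps the model folders at its top level. Both that layout and the helper name `download_woosh_weights` are assumptions, so compare the result against the folder names the ComfyUI-Woosh README expects.

```bash
# Hypothetical helper (not part of ComfyUI-Woosh): download this repository's
# files into ComfyUI's Woosh model folder. Assumes huggingface-cli is on PATH.
download_woosh_weights() {
  local comfy_root="${1:-ComfyUI}"          # path to the ComfyUI checkout
  local target="${comfy_root}/models/woosh" # folder the Woosh nodes load models from
  mkdir -p "$target" || return 1
  # Download a snapshot of the whole repo (weights, plus README and demo video).
  huggingface-cli download drbaph/Woosh --local-dir "$target"
}
```

Run `download_woosh_weights ComfyUI` from the directory that contains your ComfyUI checkout. Because it mirrors every file in the repository, including this README and the demo video, the `--include` glob option of `huggingface-cli download` can narrow it once you know the exact folder names.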

> **Note:** Set the Woosh TextConditioning node to **T2A** for Flow/DFlow models and **V2A** for VFlow/DVFlow models.
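
The rule in the note is a fixed mapping from model to conditioning mode. Here is a minimal shell sketch of it for scripts that switch between models; `woosh_conditioning_mode` is a hypothetical helper, not part of ComfyUI-Woosh.

```bash
# Hypothetical helper: print the TextConditioning mode for a Woosh model name.
woosh_conditioning_mode() {
  case "$1" in
    Woosh-Flow|Woosh-DFlow)   echo "T2A" ;;  # text-to-audio models
    Woosh-VFlow|Woosh-DVFlow) echo "V2A" ;;  # video-to-audio models
    *) echo "unknown Woosh model: $1" >&2; return 1 ;;
  esac
}

woosh_conditioning_mode Woosh-DVFlow  # prints V2A
```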

## Inference

See the [official Woosh repository](https://github.com/SonyResearch/Woosh) for standalone inference code and training details.

## VRAM Requirements

| Model | Approx. VRAM |
|-------|--------------|
| Flow / VFlow | ~8–12 GB |
| DFlow / DVFlow | ~4–6 GB |
| With CPU offload | ~2–4 GB |
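
To pick a variant for a given GPU, the table can be read as a set of thresholds. The sketch below uses the upper end of each approximate range as a conservative cut-off. The helper name `suggest_woosh_variant` and those cut-offs are assumptions, and real usage will likely vary with clip length, batch size, and precision.

```bash
# Hypothetical helper: suggest a Woosh variant from available VRAM
# (integer GB), using the upper end of each range in the table above.
suggest_woosh_variant() {
  local vram_gb="$1"
  if [ "$vram_gb" -ge 12 ]; then
    echo "Flow / VFlow"
  elif [ "$vram_gb" -ge 6 ]; then
    echo "DFlow / DVFlow"
  else
    echo "CPU offload"
  fi
}

# On NVIDIA GPUs, total memory (MiB) of the first GPU can be read with nvidia-smi:
#   total_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
#   suggest_woosh_variant $(( total_mib / 1024 ))
```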

## Citation

```bibtex
@article{saghibakshi2025woosh,
  title={Woosh: Enhancing Text-to-Audio Generation with Flow Matching and FlowMap Distillation},
  author={Saghibakshi, Ali and Bakshi, Soroosh and Tagliasacchi, Antonio and Wang, Shaojie and Choi, Jongmin and Kawakami, Kazuhiro and Gu, Yuxuan},
  journal={arXiv preprint arXiv:2502.07359},
  year={2025}
}
```

## License

- **Code**: Apache 2.0
- **Model Weights**: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)