---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: text-to-audio
tags:
- t2a
- v2a
- text-to-audio
- video-to-audio
- woosh
- comfyui
- diffusion
- audio
- flow-matching
---
# Woosh — Sound Effect Generative Models
Inference code and open weights for sound effect generative models developed at Sony AI.
## Models
| Model | Task | Steps | CFG | Description |
|---|---|---|---|---|
| Woosh-Flow | Text-to-Audio | 50 | 4.5 | Base model, best quality |
| Woosh-DFlow | Text-to-Audio | 4 | 1.0 | Distilled Flow, fast generation |
| Woosh-VFlow | Video-to-Audio | 50 | 4.5 | Base video-to-audio model |
| Woosh-DVFlow | Video-to-Audio | 4 | 1.0 | Distilled VFlow, fast video-to-audio |
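The CFG column is the classifier-free guidance scale applied at sampling time. As a rough illustration (not the repository's implementation), a guided prediction is typically formed by extrapolating from the unconditional prediction toward the conditional one:

```python
def apply_cfg(cond_pred, uncond_pred, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by cfg_scale."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]

# cfg_scale = 1.0 (the distilled models): guidance is a no-op,
# so the extra unconditional forward pass can be skipped.
assert apply_cfg([2.0], [1.0], 1.0) == [2.0]

# cfg_scale = 4.5 (the base models): the conditional signal is amplified.
print(apply_cfg([2.0], [1.0], 4.5))  # [5.5]
```

This is also why the distilled models list a CFG of 1.0: guidance is typically baked into the student during distillation, so no separate unconditional pass is needed at inference.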
## Components
- Woosh-AE — High-quality latent encoder/decoder. Provides latents for generative modeling and decodes audio from generated latents.
- Woosh-CLAP (TextConditionerA/V) — Multimodal text-audio alignment model. Provides token latents for diffusion model conditioning. TextConditionerA for T2A, TextConditionerV for V2A.
- Woosh-Flow / Woosh-DFlow — Original and distilled LDMs for text-to-audio generation.
- Woosh-VFlow — Multimodal LDM generating audio from video with optional text prompts.
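The Flow models are flow-matching LDMs: sampling numerically integrates a learned velocity field from noise to a clean latent, and the step counts in the table above are the number of integration steps. A minimal fixed-step Euler sketch with a toy velocity field (the real models predict velocities in the Woosh-AE latent space; `velocity` here is a stand-in):

```python
def sample_flow(velocity, x, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (data)
    with fixed-step Euler, as in typical flow-matching samplers."""
    dt = 1.0 / num_steps
    t = 0.0
    for _ in range(num_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

# Toy velocity field that transports any starting point to 3.0 at t=1:
target = 3.0
v = lambda x, t: (target - x) / (1.0 - t) if t < 1.0 else 0.0

x_base = sample_flow(v, 0.0, 50)      # many steps, as in Flow/VFlow
x_distilled = sample_flow(v, 0.0, 4)  # few steps, as in DFlow/DVFlow
```

With a toy field both step counts land on the target; for a real model, few-step sampling is only accurate because distillation trains the student's 4-step trajectory to match the teacher's 50-step one.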
## ComfyUI Nodes
Use these models in ComfyUI with ComfyUI-Woosh:
```shell
# Via ComfyUI Manager — search "Woosh" and click Install
# Or manually:
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/ComfyUI-Woosh.git
pip install -r ComfyUI-Woosh/requirements.txt
```
Place downloaded model folders in `ComfyUI/models/woosh/`. See the ComfyUI-Woosh README for full setup and workflow examples.
**Note:** Set the Woosh TextConditioning node to T2A for Flow/DFlow models and V2A for VFlow/DVFlow models.
## Inference
See the official Woosh repository for standalone inference code and training details.
## VRAM Requirements
| Model | VRAM (approx.) |
|---|---|
| Flow / VFlow | ~8-12 GB |
| DFlow / DVFlow | ~4-6 GB |
| With CPU offload | ~2-4 GB |
## Citation
```bibtex
@article{saghibakshi2025woosh,
  title={Woosh: Enhancing Text-to-Audio Generation with Flow Matching and FlowMap Distillation},
  author={Saghibakshi, Ali and Bakshi, Soroosh and Tagliasacchi, Antonio and Wang, Shaojie and Choi, Jongmin and Kawakami, Kazuhiro and Gu, Yuxuan},
  journal={arXiv preprint arXiv:2502.07359},
  year={2025}
}
```
## License
- Code — Apache 2.0
- Model Weights — CC BY-NC 4.0
