| --- |
| license: cc-by-nc-4.0 |
| library_name: woosh |
| tags: |
| - audio |
| - audio-generation |
| - sound-effects |
| - foley |
| - text-to-audio |
| - video-to-audio |
| - t2a |
| - v2a |
| - safetensors |
| - woosh |
| pipeline_tag: text-to-audio |
| --- |
| |
| # Woosh — Sony AI Sound-Effect Foundation Model (Mirror) |
|
|
| This repository is a **community mirror** of the open weights released by Sony Research for [Woosh](https://github.com/SonyResearch/Woosh) — a foundation model for sound-effect generation supporting text-to-audio (T2A) and video-to-audio (V2A) synthesis. |
|
|
| All files here are unmodified copies of the files in Sony's [v1.0.0 GitHub release](https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0), collected into a single browsable Hugging Face repo for convenience. |
|
|
| ## License — CC-BY-NC 4.0 (Non-Commercial) |
|
|
| > All open weights in this repository are released by Sony Research under the [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. **Generated outputs inherit the non-commercial restriction.** You may not use model outputs in commercial products, paid releases, or client work. The upstream project's source code is released separately under MIT / Apache-2.0. |
|
|
| For attribution, credit **Sony Research (Woosh)** and link to the [project repository](https://github.com/SonyResearch/Woosh) (GitHub: `SonyResearch/Woosh`). |
|
|
| ## Model Suite |
|
|
| Woosh is a multi-model suite. All components are required together — the generative backbones depend on the shared AE, CLAP, and text conditioners at inference time. |
|
|
| ### Shared infrastructure |
|
|
| | Folder | Role | File(s) | |
| |---|---|---| |
| | `checkpoints/Woosh-AE/` | Audio encoder / decoder producing high-quality latents | `weights.safetensors`, `config.yaml` | |
| | `checkpoints/Woosh-CLAP/` | Multimodal text-audio alignment model (audio + text encoders) | `weights_audio.safetensors`, `weights_text.safetensors`, `config.yaml` | |
| | `checkpoints/TextConditionerA/` | Text conditioner for the T2A path (pairs with Flow / DFlow) | `weights.safetensors`, `config.yaml` | |
| | `checkpoints/TextConditionerV/` | Text conditioner for the V2A path (pairs with VFlow / DVFlow) | `weights.safetensors`, `config.yaml` | |
|
|
| ### Generative backbones |
|
|
| | Folder | Task | Notes | |
| |---|---|---| |
| | `checkpoints/Woosh-Flow/` | Text → Audio | Full-quality T2A latent diffusion | |
| | `checkpoints/Woosh-DFlow/` | Text → Audio | Distilled T2A — fewer steps, faster inference | |
| | `checkpoints/Woosh-VFlow-8s/` | Video → Audio | V2A latent diffusion — **fixed 8-second output** | |
| | `checkpoints/Woosh-DVFlow-8s/` | Video → Audio | Distilled V2A — fewer steps, **fixed 8-second output** | |
|
|
| Every weight file ships as `safetensors`; this mirror contains no `.pt`, `.ckpt`, or `.bin` files. |
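
Because the weights are `safetensors` files, their tensor names, dtypes, and shapes can be listed without loading any tensor data. The sketch below reads only the JSON header defined by the safetensors format (an 8-byte little-endian length followed by a UTF-8 JSON object). The helper names `read_safetensors_header` and `summarize` are illustrative and are not part of the upstream `woosh` package.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with Path(path).open("rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON mapping tensor names to their metadata.
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path):
    """Map each tensor name to (dtype, shape), skipping the optional metadata entry."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }
```

For example, `summarize("checkpoints/Woosh-AE/weights.safetensors")` would return the tensor inventory of the AE checkpoint, assuming the mirror has been downloaded into the current directory.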
|
|
| ## Layout |
|
|
| ``` |
| checkpoints/ |
| ├── Woosh-AE/ |
| │ ├── weights.safetensors |
| │ └── config.yaml |
| ├── Woosh-CLAP/ |
| │ ├── weights_audio.safetensors |
| │ ├── weights_text.safetensors |
| │ └── config.yaml |
| ├── TextConditionerA/ |
| │ ├── weights.safetensors |
| │ └── config.yaml |
| ├── TextConditionerV/ |
| │ ├── weights.safetensors |
| │ └── config.yaml |
| ├── Woosh-Flow/ |
| │ ├── weights.safetensors |
| │ └── config.yaml |
| ├── Woosh-DFlow/ |
| │ ├── weights.safetensors |
| │ └── config.yaml |
| ├── Woosh-VFlow-8s/ |
| │ ├── weights.safetensors |
| │ └── config.yaml |
| └── Woosh-DVFlow-8s/ |
| ├── weights.safetensors |
| └── config.yaml |
| ``` |
|
|
| Directory names match Sony's release zip layout exactly so the upstream inference code finds its configs without modification. |
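
Since the upstream code relies on this exact layout, it can help to confirm that a local copy is complete before running inference. The following is a minimal sketch using only the standard library; the file list is copied from the tree above, and `EXPECTED` and `missing_files` are hypothetical names, not upstream APIs.

```python
from pathlib import Path

# Expected files per component folder, copied from the layout listing in this README.
_PAIR = ["weights.safetensors", "config.yaml"]
EXPECTED = {
    "Woosh-AE": _PAIR,
    "Woosh-CLAP": ["weights_audio.safetensors", "weights_text.safetensors", "config.yaml"],
    "TextConditionerA": _PAIR,
    "TextConditionerV": _PAIR,
    "Woosh-Flow": _PAIR,
    "Woosh-DFlow": _PAIR,
    "Woosh-VFlow-8s": _PAIR,
    "Woosh-DVFlow-8s": _PAIR,
}


def missing_files(checkpoints_dir):
    """Return relative paths of expected files that are absent under checkpoints_dir."""
    root = Path(checkpoints_dir)
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    ]
```

An empty list from `missing_files("checkpoints")` means every file named in the layout is present; it does not validate file contents.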
|
|
| ## Usage |
|
|
| This mirror is intended to be consumed by [Sony's upstream `woosh` package](https://github.com/SonyResearch/Woosh). Clone and install the upstream repo, then point it at a local copy of this mirror's `checkpoints/` directory. |
|
|
| ```bash |
| # Clone upstream |
| git clone https://github.com/SonyResearch/Woosh.git |
| cd Woosh |
| |
| # Sony's suggested env setup (uses uv) |
| uv sync |
| uv pip install -e . |
| |
| # Pull weights from this mirror; checkpoints/ is placed in the current directory (the Woosh repo root) |
| hf download AEmotionStudio/woosh-models --local-dir ./ |
| ``` |
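
The download step can also be scripted from Python with `snapshot_download` from the `huggingface_hub` library, which may be convenient inside a setup script. This is a sketch, not an upstream-provided helper: `download_mirror` is a hypothetical name, the repo ID is the one used in the CLI command above, and restricting the download to `checkpoints/*` is an assumption about which files are needed.

```python
REPO_ID = "AEmotionStudio/woosh-models"  # mirror repo ID, as used in the CLI command


def download_mirror(local_dir="."):
    """Download the checkpoints/ tree of the mirror into local_dir; return the local path."""
    # Imported lazily so this module loads even if huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        # Assumption: only the checkpoints/ folder is needed; patterns use fnmatch rules.
        allow_patterns=["checkpoints/*"],
    )
```

Calling `download_mirror(".")` from the root of the cloned Woosh repository should leave the weights under `./checkpoints/`, matching the layout described earlier.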
|
|
| ## Acknowledgements |
|
|
| All credit for the Woosh models belongs to **Sony Research**. This mirror exists solely to make the CC-BY-NC open weights easier to fetch and integrate. Please cite the upstream project and respect the non-commercial license. |
|
|
| - Upstream: <https://github.com/SonyResearch/Woosh> |
| - Release: <https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0> |
| - License: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) |
|
|