---
license: cc-by-nc-4.0
library_name: woosh
tags:
- audio
- audio-generation
- sound-effects
- foley
- text-to-audio
- video-to-audio
- t2a
- v2a
- safetensors
- woosh
pipeline_tag: text-to-audio
---

# Woosh: Sony AI Sound-Effect Foundation Model (Mirror)

This repository is a **community mirror** of the open weights released by Sony Research for [Woosh](https://github.com/SonyResearch/Woosh), a foundation model for sound-effect generation supporting text-to-audio (T2A) and video-to-audio (V2A) synthesis.

All files here are a one-to-one copy of Sony's [v1.0.0 GitHub release](https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0), repackaged into a single browseable HF repo for convenience.

## License: CC-BY-NC 4.0 (Non-Commercial)

> All open weights in this repository are released by Sony Research under the [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. **Generated outputs inherit the non-commercial restriction.** You may not use model outputs in commercial products, paid releases, or client work.

The upstream project's source code is released separately under MIT / Apache-2.0.

If you need to attribute: **Sony Research, Woosh** ([arXiv / paper](https://github.com/SonyResearch/Woosh), GitHub: `SonyResearch/Woosh`).

## Model Suite

Woosh is a multi-model suite. All components are required together: the generative backbones depend on the shared AE, CLAP, and text conditioners at inference time.

### Shared infrastructure

| Folder | Role | File(s) |
|---|---|---|
| `checkpoints/Woosh-AE/` | Audio encoder / decoder producing high-quality latents | `weights.safetensors`, `config.yaml` |
| `checkpoints/Woosh-CLAP/` | Multimodal text-audio alignment model (audio + text encoders) | `weights_audio.safetensors`, `weights_text.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerA/` | Text conditioner for the T2A path (pairs with Flow / DFlow) | `weights.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerV/` | Text conditioner for the V2A path (pairs with VFlow / DVFlow) | `weights.safetensors`, `config.yaml` |

### Generative backbones

| Folder | Task | Notes |
|---|---|---|
| `checkpoints/Woosh-Flow/` | Text → Audio | Full-quality T2A latent diffusion |
| `checkpoints/Woosh-DFlow/` | Text → Audio | Distilled T2A: fewer steps, faster inference |
| `checkpoints/Woosh-VFlow-8s/` | Video → Audio | V2A latent diffusion, **fixed 8-second output** |
| `checkpoints/Woosh-DVFlow-8s/` | Video → Audio | Distilled V2A: fewer steps, **fixed 8-second output** |

Every weight file ships as `safetensors`. No `.pt` / `.ckpt` / `.bin` in this mirror.

## Layout

```
checkpoints/
├── Woosh-AE/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-CLAP/
│   ├── weights_audio.safetensors
│   ├── weights_text.safetensors
│   └── config.yaml
├── TextConditionerA/
│   ├── weights.safetensors
│   └── config.yaml
├── TextConditionerV/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-Flow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-DFlow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-VFlow-8s/
│   ├── weights.safetensors
│   └── config.yaml
└── Woosh-DVFlow-8s/
    ├── weights.safetensors
    └── config.yaml
```

Directory names match Sony's release zip layout exactly so the upstream inference code finds its configs without modification.

## Usage

This mirror is intended to be consumed by [Sony's upstream `woosh` package](https://github.com/SonyResearch/Woosh).
Clone and install the upstream repo, then point it at a local copy of this mirror's `checkpoints/` directory.

```bash
# Clone upstream
git clone https://github.com/SonyResearch/Woosh.git
cd Woosh

# Sony's suggested env setup (uses uv)
uv sync
uv pip install -e .

# Pull weights from this mirror
hf download AEmotionStudio/woosh-models --local-dir ./
```

## Acknowledgements

All credit for the Woosh models belongs to **Sony Research**. This mirror exists solely to make the CC-BY-NC open weights easier to fetch and integrate. Please cite the upstream project and respect the non-commercial license.

- Upstream: https://github.com/SonyResearch/Woosh
- Release: https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0
- License: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
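
## Checking a local copy

Because all components are required together, it can be worth confirming that a download is complete before running inference. The following is a minimal sketch, using only the Python standard library, that compares a local `checkpoints/` directory against the file list in the Layout section. The names `EXPECTED_LAYOUT` and `missing_checkpoint_files` are hypothetical helpers written for this README; they are not part of the upstream `woosh` package, and the check only tests that files exist, not that their contents are valid.

```python
from pathlib import Path

# Expected files per checkpoint folder, transcribed from the Layout section.
# This table is an illustration, not something shipped by the woosh package.
EXPECTED_LAYOUT = {
    "Woosh-AE": ["weights.safetensors", "config.yaml"],
    "Woosh-CLAP": [
        "weights_audio.safetensors",
        "weights_text.safetensors",
        "config.yaml",
    ],
    "TextConditionerA": ["weights.safetensors", "config.yaml"],
    "TextConditionerV": ["weights.safetensors", "config.yaml"],
    "Woosh-Flow": ["weights.safetensors", "config.yaml"],
    "Woosh-DFlow": ["weights.safetensors", "config.yaml"],
    "Woosh-VFlow-8s": ["weights.safetensors", "config.yaml"],
    "Woosh-DVFlow-8s": ["weights.safetensors", "config.yaml"],
}


def missing_checkpoint_files(checkpoints_dir):
    """Return 'folder/file' strings for every expected file that is absent."""
    root = Path(checkpoints_dir)
    missing = []
    for folder, files in EXPECTED_LAYOUT.items():
        for name in files:
            # is_file() is False for missing paths and for directories.
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

Calling `missing_checkpoint_files("checkpoints")` from the root of the Woosh clone returns an empty list when every file named in the Layout section is present, and otherwise lists the missing paths.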