---
license: cc-by-nc-4.0
library_name: woosh
tags:
- audio
- audio-generation
- sound-effects
- foley
- text-to-audio
- video-to-audio
- t2a
- v2a
- safetensors
- woosh
pipeline_tag: text-to-audio
---
# Woosh — Sony AI Sound-Effect Foundation Model (Mirror)
This repository is a **community mirror** of the open weights released by Sony Research for [Woosh](https://github.com/SonyResearch/Woosh) — a foundation model for sound-effect generation supporting text-to-audio (T2A) and video-to-audio (V2A) synthesis.
All files here are a one-to-one copy of Sony's [v1.0.0 GitHub release](https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0), repackaged into a single browsable Hugging Face repository for convenience.
## License — CC-BY-NC 4.0 (Non-Commercial)
> All open weights in this repository are released by Sony Research under the [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. **Generated outputs inherit the non-commercial restriction.** You may not use model outputs in commercial products, paid releases, or client work. The upstream project's source code is released separately under MIT / Apache-2.0.
For attribution, credit **Sony Research, Woosh** and link to the [upstream repository](https://github.com/SonyResearch/Woosh) (GitHub: `SonyResearch/Woosh`).
## Model Suite
Woosh is a multi-model suite whose components are meant to be used together: at inference time the generative backbones depend on the shared AE, CLAP, and text conditioners.
### Shared infrastructure
| Folder | Role | File(s) |
|---|---|---|
| `checkpoints/Woosh-AE/` | Audio encoder / decoder producing high-quality latents | `weights.safetensors`, `config.yaml` |
| `checkpoints/Woosh-CLAP/` | Multimodal text-audio alignment model (audio + text encoders) | `weights_audio.safetensors`, `weights_text.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerA/` | Text conditioner for the T2A path (pairs with Flow / DFlow) | `weights.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerV/` | Text conditioner for the V2A path (pairs with VFlow / DVFlow) | `weights.safetensors`, `config.yaml` |
### Generative backbones
| Folder | Task | Notes |
|---|---|---|
| `checkpoints/Woosh-Flow/` | Text → Audio | Full-quality T2A latent diffusion |
| `checkpoints/Woosh-DFlow/` | Text → Audio | Distilled T2A — fewer steps, faster inference |
| `checkpoints/Woosh-VFlow-8s/` | Video → Audio | V2A latent diffusion — **fixed 8-second output** |
| `checkpoints/Woosh-DVFlow-8s/` | Video → Audio | Distilled V2A — fewer steps, **fixed 8-second output** |
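Read together, the two tables suggest a dependency set for each backbone. The sketch below encodes that reading as a plain lookup. The names `SHARED`, `CONDITIONER_FOR`, and `required_folders` are illustrative and not part of the upstream `woosh` API, and it assumes a backbone needs only the text conditioner it is paired with in the table.

```python
# Shared components every backbone depends on (per the Model Suite section).
SHARED = ["Woosh-AE", "Woosh-CLAP"]

# Backbone folder -> text conditioner it pairs with (from the tables above).
CONDITIONER_FOR = {
    "Woosh-Flow": "TextConditionerA",
    "Woosh-DFlow": "TextConditionerA",
    "Woosh-VFlow-8s": "TextConditionerV",
    "Woosh-DVFlow-8s": "TextConditionerV",
}


def required_folders(backbone: str) -> list[str]:
    """Return the checkpoint folders assumed necessary to run one backbone."""
    if backbone not in CONDITIONER_FOR:
        raise ValueError(f"unknown backbone: {backbone!r}")
    return SHARED + [CONDITIONER_FOR[backbone], backbone]


print(required_folders("Woosh-DFlow"))
# ['Woosh-AE', 'Woosh-CLAP', 'TextConditionerA', 'Woosh-DFlow']
```

If upstream turns out to load both conditioners regardless of task, treat the full `checkpoints/` directory as the requirement instead.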
Every weight file ships as `safetensors`. No `.pt` / `.ckpt` / `.bin` in this mirror.
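One way to check this claim on a local copy is sketched below, using only the standard library. It relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names `read_safetensors_header` and `non_safetensors_weights` are hypothetical and do not come from the `woosh` or `safetensors` packages.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Parse the JSON header of a safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def non_safetensors_weights(root: Path) -> list[Path]:
    """List any pickle-style weight files (.pt / .ckpt / .bin) under root."""
    legacy = {".pt", ".ckpt", ".bin"}
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in legacy)
```

For example, `non_safetensors_weights(Path("checkpoints"))` should return an empty list for an unmodified copy of this mirror, and `read_safetensors_header(Path("checkpoints/Woosh-AE/weights.safetensors"))` lists tensor names with their dtypes and shapes.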
## Layout
```
checkpoints/
├── Woosh-AE/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-CLAP/
│   ├── weights_audio.safetensors
│   ├── weights_text.safetensors
│   └── config.yaml
├── TextConditionerA/
│   ├── weights.safetensors
│   └── config.yaml
├── TextConditionerV/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-Flow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-DFlow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-VFlow-8s/
│   ├── weights.safetensors
│   └── config.yaml
└── Woosh-DVFlow-8s/
    ├── weights.safetensors
    └── config.yaml
```
Directory names match Sony's release zip layout exactly so the upstream inference code finds its configs without modification.
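Because the upstream code relies on these exact names, it can be worth confirming that a local copy is complete before running inference. The following is a minimal sketch of such a check; `EXPECTED` is transcribed from the tree above, and `missing_files` is a hypothetical helper, not something the upstream package provides.

```python
from pathlib import Path

# Expected files per folder, transcribed from the Layout tree.
_DEFAULT = ["weights.safetensors", "config.yaml"]
EXPECTED = {
    name: list(_DEFAULT)
    for name in [
        "Woosh-AE",
        "TextConditionerA",
        "TextConditionerV",
        "Woosh-Flow",
        "Woosh-DFlow",
        "Woosh-VFlow-8s",
        "Woosh-DVFlow-8s",
    ]
}
# Woosh-CLAP is the one folder with separate audio and text weights.
EXPECTED["Woosh-CLAP"] = [
    "weights_audio.safetensors",
    "weights_text.safetensors",
    "config.yaml",
]


def missing_files(checkpoints: Path) -> list[str]:
    """Return 'folder/file' entries that are absent under the checkpoints dir."""
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED.items()
        for name in names
        if not (checkpoints / folder / name).is_file()
    ]
```

Calling `missing_files(Path("checkpoints"))` returns an empty list when the download is complete; anything else names the files that still need fetching.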
## Usage
This mirror is intended to be consumed by [Sony's upstream `woosh` package](https://github.com/SonyResearch/Woosh). Clone and install the upstream repo, then point it at a local copy of this mirror's `checkpoints/` directory.
```bash
# Clone upstream
git clone https://github.com/SonyResearch/Woosh.git
cd Woosh
# Sony's suggested env setup (uses uv)
uv sync
uv pip install -e .
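# Pull weights from this mirror into ./checkpoints/ (run from the Woosh clone root).
# --include limits the download to checkpoints/ so the mirror's README.md
# does not overwrite the upstream one.
hf download AEmotionStudio/woosh-models --include "checkpoints/*" --local-dir ./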
```
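The same download can be scripted from Python, which also makes it straightforward to fetch only some folders. The sketch below uses `huggingface_hub.snapshot_download` with its `allow_patterns` argument; the helpers `patterns_for` and `download_subset` are illustrative names, and which folders a given task actually needs is left to the caller.

```python
def patterns_for(folders: list[str]) -> list[str]:
    """Build glob patterns matching every file under the given checkpoint folders."""
    return [f"checkpoints/{name}/*" for name in folders]


def download_subset(folders: list[str], local_dir: str = "./") -> str:
    """Fetch only the listed folders from the mirror; returns the local path."""
    # Imported lazily so patterns_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AEmotionStudio/woosh-models",
        local_dir=local_dir,
        allow_patterns=patterns_for(folders),
    )
```

For instance, `download_subset(["Woosh-AE", "Woosh-CLAP", "TextConditionerA", "Woosh-DFlow"])` would pull one plausible set for distilled text-to-audio and skip the video-to-audio checkpoints.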
## Acknowledgements
All credit for the Woosh models belongs to **Sony Research**. This mirror exists solely to make the CC-BY-NC open weights easier to fetch and integrate. Please cite the upstream project and respect the non-commercial license.
- Upstream: <https://github.com/SonyResearch/Woosh>
- Release: <https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0>
- License: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)