---
license: cc-by-nc-4.0
library_name: woosh
tags:
- audio
- audio-generation
- sound-effects
- foley
- text-to-audio
- video-to-audio
- t2a
- v2a
- safetensors
- woosh
pipeline_tag: text-to-audio
---
# Woosh — Sony AI Sound-Effect Foundation Model (Mirror)
This repository is a **community mirror** of the open weights released by Sony Research for [Woosh](https://github.com/SonyResearch/Woosh) — a foundation model for sound-effect generation supporting text-to-audio (T2A) and video-to-audio (V2A) synthesis.
All files here are a one-to-one copy of Sony's [v1.0.0 GitHub release](https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0), repackaged into a single browsable Hugging Face repo for convenience.
## License — CC-BY-NC 4.0 (Non-Commercial)
> All open weights in this repository are released by Sony Research under the [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. **Generated outputs inherit the non-commercial restriction.** You may not use model outputs in commercial products, paid releases, or client work. The upstream project's source code is released separately under MIT / Apache-2.0.
CC-BY-NC 4.0 requires attribution: credit **Sony Research** for **Woosh** and link the [upstream repository](https://github.com/SonyResearch/Woosh) (GitHub: `SonyResearch/Woosh`).
## Model Suite
Woosh is a multi-model suite. All components are required together — the generative backbones depend on the shared AE, CLAP, and text conditioners at inference time.
### Shared infrastructure
| Folder | Role | File(s) |
|---|---|---|
| `checkpoints/Woosh-AE/` | Audio encoder / decoder producing high-quality latents | `weights.safetensors`, `config.yaml` |
| `checkpoints/Woosh-CLAP/` | Multimodal text-audio alignment model (audio + text encoders) | `weights_audio.safetensors`, `weights_text.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerA/` | Text conditioner for the T2A path (pairs with Flow / DFlow) | `weights.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerV/` | Text conditioner for the V2A path (pairs with VFlow / DVFlow) | `weights.safetensors`, `config.yaml` |
### Generative backbones
| Folder | Task | Notes |
|---|---|---|
| `checkpoints/Woosh-Flow/` | Text → Audio | Full-quality T2A latent diffusion |
| `checkpoints/Woosh-DFlow/` | Text → Audio | Distilled T2A — fewer steps, faster inference |
| `checkpoints/Woosh-VFlow-8s/` | Video → Audio | V2A latent diffusion — **fixed 8-second output** |
| `checkpoints/Woosh-DVFlow-8s/` | Video → Audio | Distilled V2A — fewer steps, **fixed 8-second output** |
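The two tables imply a fixed dependency set per backbone: the shared AE and CLAP, plus the text conditioner its path pairs with. The shell function below encodes those pairings as a quick reference. It is a hypothetical helper, not part of Sony's `woosh` package, and it is derived only from the tables; the upstream code may load more than this, so keep the full `checkpoints/` tree in place.

```bash
# Hypothetical helper (not part of the upstream woosh package): print the
# checkpoint folders a backbone depends on, following the pairings in the
# tables above. The upstream code may load more than this.
woosh_components() {
  local backbone="$1" conditioner
  case "$backbone" in
    Woosh-Flow|Woosh-DFlow) conditioner=TextConditionerA ;;          # T2A path
    Woosh-VFlow-8s|Woosh-DVFlow-8s) conditioner=TextConditionerV ;;  # V2A path
    *) echo "unknown backbone: $backbone" >&2; return 1 ;;
  esac
  # Shared AE and CLAP, then the paired text conditioner, then the backbone.
  printf '%s\n' Woosh-AE Woosh-CLAP "$conditioner" "$backbone"
}

# Usage: woosh_components Woosh-DVFlow-8s
```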
Every weight file ships in `safetensors` format; this mirror contains no `.pt`, `.ckpt`, or `.bin` files.
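Since every weight file should be `safetensors`, a quick extension check after downloading can confirm that nothing pickle-based ended up in your local copy. This is a minimal sketch with a hypothetical function name (not part of the upstream package); it inspects file names only, not file contents.

```bash
# Hypothetical helper: fail if any pickle-style weight file (.pt, .ckpt, .bin)
# exists under the given directory. Checks file extensions only.
check_safetensors_only() {
  local root="${1:-checkpoints}" found
  if [ ! -d "$root" ]; then
    echo "no such directory: $root" >&2
    return 2
  fi
  found=$(find "$root" -type f \( -name '*.pt' -o -name '*.ckpt' -o -name '*.bin' \))
  if [ -n "$found" ]; then
    printf 'non-safetensors weight files:\n%s\n' "$found" >&2
    return 1
  fi
  echo "no .pt/.ckpt/.bin files under $root"
}

# Usage: check_safetensors_only ./checkpoints
```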
## Layout
```
checkpoints/
├── Woosh-AE/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-CLAP/
│   ├── weights_audio.safetensors
│   ├── weights_text.safetensors
│   └── config.yaml
├── TextConditionerA/
│   ├── weights.safetensors
│   └── config.yaml
├── TextConditionerV/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-Flow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-DFlow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-VFlow-8s/
│   ├── weights.safetensors
│   └── config.yaml
└── Woosh-DVFlow-8s/
    ├── weights.safetensors
    └── config.yaml
```
Directory names match Sony's release zip layout exactly so the upstream inference code finds its configs without modification.
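To confirm a download is complete before pointing the upstream code at it, you can check for every file in the layout tree. This is a minimal sketch with a hypothetical function name; it tests for presence only and does not compare sizes or checksums against Sony's release.

```bash
# Hypothetical helper: report any file from the documented layout that is
# missing under the given directory. Presence only; no size or checksum check.
check_woosh_layout() {
  local root="${1:-checkpoints}" missing=0 total=0 f
  for f in \
    Woosh-AE/weights.safetensors Woosh-AE/config.yaml \
    Woosh-CLAP/weights_audio.safetensors Woosh-CLAP/weights_text.safetensors \
    Woosh-CLAP/config.yaml \
    TextConditionerA/weights.safetensors TextConditionerA/config.yaml \
    TextConditionerV/weights.safetensors TextConditionerV/config.yaml \
    Woosh-Flow/weights.safetensors Woosh-Flow/config.yaml \
    Woosh-DFlow/weights.safetensors Woosh-DFlow/config.yaml \
    Woosh-VFlow-8s/weights.safetensors Woosh-VFlow-8s/config.yaml \
    Woosh-DVFlow-8s/weights.safetensors Woosh-DVFlow-8s/config.yaml
  do
    total=$((total + 1))
    if [ ! -f "$root/$f" ]; then
      echo "missing: $root/$f" >&2
      missing=$((missing + 1))
    fi
  done
  if [ "$missing" -ne 0 ]; then
    echo "$missing of $total expected files missing under $root" >&2
    return 1
  fi
  echo "all $total expected files present under $root"
}

# Usage (from the directory that contains checkpoints/): check_woosh_layout
```

Run it from the directory that contains `checkpoints/`, or pass that directory's path as the first argument.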
## Usage
This mirror is intended to be consumed by [Sony's upstream `woosh` package](https://github.com/SonyResearch/Woosh). Clone and install the upstream repo, then point it at a local copy of this mirror's `checkpoints/` directory.
```bash
# Clone upstream
git clone https://github.com/SonyResearch/Woosh.git
cd Woosh
# Sony's suggested env setup (uses uv)
uv sync
uv pip install -e .
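# Pull weights from this mirror (requires the `hf` CLI from huggingface_hub).
# --include fetches only checkpoints/, so this mirror's README.md does not
# overwrite the upstream repo's README.md.
hf download AEmotionStudio/woosh-models --include "checkpoints/*" --local-dir ./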
```
## Acknowledgements
All credit for the Woosh models belongs to **Sony Research**. This mirror exists solely to make the CC-BY-NC open weights easier to fetch and integrate. Please cite the upstream project and respect the non-commercial license.
- Upstream: <https://github.com/SonyResearch/Woosh>
- Release: <https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0>
- License: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)