---
license: cc-by-nc-4.0
library_name: woosh
tags:
  - audio
  - audio-generation
  - sound-effects
  - foley
  - text-to-audio
  - video-to-audio
  - t2a
  - v2a
  - safetensors
  - woosh
pipeline_tag: text-to-audio
---

# Woosh — Sony AI Sound-Effect Foundation Model (Mirror)

This repository is a **community mirror** of the open weights released by Sony Research for [Woosh](https://github.com/SonyResearch/Woosh) — a foundation model for sound-effect generation supporting text-to-audio (T2A) and video-to-audio (V2A) synthesis.

All files here are a one-to-one copy of Sony's [v1.0.0 GitHub release](https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0), repackaged into a single browseable HF repo for convenience.

## License — CC-BY-NC 4.0 (Non-Commercial)

> All open weights in this repository are released by Sony Research under the [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. **Generated outputs inherit the non-commercial restriction.** You may not use model outputs in commercial products, paid releases, or client work. The upstream project's source code is released separately under MIT / Apache-2.0.

For attribution, credit **Sony Research: Woosh** ([GitHub: `SonyResearch/Woosh`](https://github.com/SonyResearch/Woosh)).

## Model Suite

Woosh is a multi-model suite, and its components are meant to be used together. No generative backbone runs on its own: at inference time each one depends on the shared AE, CLAP, and the text conditioner for its path (T2A or V2A).

### Shared infrastructure

| Folder | Role | File(s) |
|---|---|---|
| `checkpoints/Woosh-AE/` | Audio encoder / decoder producing high-quality latents | `weights.safetensors`, `config.yaml` |
| `checkpoints/Woosh-CLAP/` | Multimodal text-audio alignment model (audio + text encoders) | `weights_audio.safetensors`, `weights_text.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerA/` | Text conditioner for the T2A path (pairs with Flow / DFlow) | `weights.safetensors`, `config.yaml` |
| `checkpoints/TextConditionerV/` | Text conditioner for the V2A path (pairs with VFlow / DVFlow) | `weights.safetensors`, `config.yaml` |

### Generative backbones

| Folder | Task | Notes |
|---|---|---|
| `checkpoints/Woosh-Flow/` | Text → Audio | Full-quality T2A latent diffusion |
| `checkpoints/Woosh-DFlow/` | Text → Audio | Distilled T2A — fewer steps, faster inference |
| `checkpoints/Woosh-VFlow-8s/` | Video → Audio | V2A latent diffusion — **fixed 8-second output** |
| `checkpoints/Woosh-DVFlow-8s/` | Video → Audio | Distilled V2A — fewer steps, **fixed 8-second output** |

Every weight file ships as `safetensors`. This mirror contains no `.pt`, `.ckpt`, or `.bin` files.
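The pairings in the two tables above can be written down as a small lookup. This is an illustrative sketch only: `required_folders` is a hypothetical helper, not part of the upstream `woosh` package, and it assumes that one backbone needs exactly the shared AE, CLAP, and the text conditioner listed for its path.

```python
# Pairings transcribed from the tables above; the names are folders under
# checkpoints/. This helper is illustrative and not an upstream API.
SHARED = ["Woosh-AE", "Woosh-CLAP"]

TEXT_CONDITIONER = {
    "Woosh-Flow": "TextConditionerA",
    "Woosh-DFlow": "TextConditionerA",
    "Woosh-VFlow-8s": "TextConditionerV",
    "Woosh-DVFlow-8s": "TextConditionerV",
}


def required_folders(backbone: str) -> list[str]:
    """Return the checkpoint folders assumed necessary to run one backbone."""
    if backbone not in TEXT_CONDITIONER:
        raise ValueError(f"unknown backbone: {backbone!r}")
    return SHARED + [TEXT_CONDITIONER[backbone], backbone]


print(required_folders("Woosh-DFlow"))
# ['Woosh-AE', 'Woosh-CLAP', 'TextConditionerA', 'Woosh-DFlow']
```

A lookup like this is handy when only one task (for example distilled T2A) is needed and the other backbones can be left out of a local copy.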

## Layout

```
checkpoints/
├── Woosh-AE/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-CLAP/
│   ├── weights_audio.safetensors
│   ├── weights_text.safetensors
│   └── config.yaml
├── TextConditionerA/
│   ├── weights.safetensors
│   └── config.yaml
├── TextConditionerV/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-Flow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-DFlow/
│   ├── weights.safetensors
│   └── config.yaml
├── Woosh-VFlow-8s/
│   ├── weights.safetensors
│   └── config.yaml
└── Woosh-DVFlow-8s/
    ├── weights.safetensors
    └── config.yaml
```

Directory names match Sony's release zip layout exactly so the upstream inference code finds its configs without modification.
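After downloading, it can be useful to confirm that a local copy matches the layout above before running any inference. The following is a minimal sketch using only the standard library. `missing_files` and `EXPECTED` are hypothetical names introduced here, and the file lists are copied from the tree above rather than from upstream code.

```python
from pathlib import Path

# Expected files per folder, copied from the layout shown above.
_STD = ["weights.safetensors", "config.yaml"]
EXPECTED = {
    "Woosh-AE": _STD,
    "Woosh-CLAP": ["weights_audio.safetensors", "weights_text.safetensors", "config.yaml"],
    "TextConditionerA": _STD,
    "TextConditionerV": _STD,
    "Woosh-Flow": _STD,
    "Woosh-DFlow": _STD,
    "Woosh-VFlow-8s": _STD,
    "Woosh-DVFlow-8s": _STD,
}


def missing_files(root="."):
    """List expected checkpoint files that are absent under root/checkpoints."""
    base = Path(root) / "checkpoints"
    missing = []
    for folder, names in EXPECTED.items():
        for name in names:
            if not (base / folder / name).is_file():
                missing.append(f"checkpoints/{folder}/{name}")
    return missing


# Report anything missing relative to the current directory.
for path in missing_files("."):
    print("missing:", path)
```

An empty result means every file named in the layout is present. The check does not validate file contents or sizes.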

## Usage

This mirror is intended to be consumed by [Sony's upstream `woosh` package](https://github.com/SonyResearch/Woosh). Clone and install the upstream repo, then point it at a local copy of this mirror's `checkpoints/` directory.

```bash
# Clone upstream
git clone https://github.com/SonyResearch/Woosh.git
cd Woosh

# Sony's suggested env setup (uses uv)
uv sync
uv pip install -e .

# Pull weights from this mirror; files land in ./checkpoints/ (the `hf` CLI ships with huggingface_hub)
hf download AEmotionStudio/woosh-models --local-dir ./
```
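If only part of the suite is needed, the same download can be done from Python with a filter. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`. The helper names and the choice of folders are assumptions made for illustration, and the folder list should be adjusted to the backbone actually in use.

```python
def allow_patterns_for(folders):
    """Build glob patterns that select only the given checkpoint folders."""
    return [f"checkpoints/{name}/*" for name in folders]


def download_subset(folders, local_dir="."):
    """Download only the listed folders from the mirror into local_dir."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AEmotionStudio/woosh-models",
        local_dir=local_dir,
        allow_patterns=allow_patterns_for(folders),
    )


# Folders assumed sufficient for distilled text-to-audio (see Model Suite).
T2A_DISTILLED = ["Woosh-AE", "Woosh-CLAP", "TextConditionerA", "Woosh-DFlow"]
print(allow_patterns_for(T2A_DISTILLED))
# To fetch them: download_subset(T2A_DISTILLED, local_dir=".")
```

Because the patterns keep the `checkpoints/<folder>/` prefix, the downloaded files should land in the same directory structure as a full download.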

## Acknowledgements

All credit for the Woosh models belongs to **Sony Research**. This mirror exists solely to make the CC-BY-NC open weights easier to fetch and integrate. Please cite the upstream project and respect the non-commercial license.

- Upstream: <https://github.com/SonyResearch/Woosh>
- Release: <https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0>
- License: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)