	---
	license: cc-by-nc-4.0
	library_name: woosh
	tags:
	- audio
	- audio-generation
	- sound-effects
	- foley
	- text-to-audio
	- video-to-audio
	- t2a
	- v2a
	- safetensors
	- woosh
	pipeline_tag: text-to-audio
	---

	# Woosh — Sony AI Sound-Effect Foundation Model (Mirror)

	This repository is a community mirror of the open weights that Sony Research released for [Woosh](https://github.com/SonyResearch/Woosh), a foundation model for sound-effect generation that supports text-to-audio (T2A) and video-to-audio (V2A) synthesis.

	All files here are one-to-one copies of the assets in Sony's [v1.0.0 GitHub release](https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0), repackaged into a single browsable Hugging Face repository for convenience.

	## License — CC-BY-NC 4.0 (Non-Commercial)

	> All open weights in this repository are released by Sony Research under the [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. Generated outputs inherit the non-commercial restriction. You may not use model outputs in commercial products, paid releases, or client work. The upstream project's source code is released separately under MIT / Apache-2.0.

	For attribution, credit Sony Research, Woosh (GitHub: [`SonyResearch/Woosh`](https://github.com/SonyResearch/Woosh)).

	## Model Suite

	Woosh is a multi-model suite whose components are meant to be used together: at inference time, the generative backbones depend on the shared AE, CLAP, and text-conditioner checkpoints.

	### Shared infrastructure

	| Folder | Role | File(s) |
	|---|---|---|
	| `checkpoints/Woosh-AE/` | Audio encoder / decoder producing high-quality latents | `weights.safetensors`, `config.yaml` |
	| `checkpoints/Woosh-CLAP/` | Multimodal text-audio alignment model (audio + text encoders) | `weights_audio.safetensors`, `weights_text.safetensors`, `config.yaml` |
	| `checkpoints/TextConditionerA/` | Text conditioner for the T2A path (pairs with Flow / DFlow) | `weights.safetensors`, `config.yaml` |
	| `checkpoints/TextConditionerV/` | Text conditioner for the V2A path (pairs with VFlow / DVFlow) | `weights.safetensors`, `config.yaml` |

	### Generative backbones

	| Folder | Task | Notes |
	|---|---|---|
	| `checkpoints/Woosh-Flow/` | Text → Audio | Full-quality T2A latent diffusion |
	| `checkpoints/Woosh-DFlow/` | Text → Audio | Distilled T2A: fewer steps, faster inference |
	| `checkpoints/Woosh-VFlow-8s/` | Video → Audio | V2A latent diffusion, fixed 8-second output |
	| `checkpoints/Woosh-DVFlow-8s/` | Video → Audio | Distilled V2A: fewer steps, fixed 8-second output |
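The conditioner pairing in these tables can be written down as a small lookup. `woosh_required_dirs` below is a hypothetical shell helper, not part of the upstream `woosh` package. It encodes only what the tables state (shared AE and CLAP, plus `TextConditionerA` for the T2A backbones and `TextConditionerV` for the V2A ones) and is meant as a reading aid, not as a reason to skip downloading the rest of the suite.

```bash
# Hypothetical helper (not upstream code): print the checkpoint folders a
# backbone is paired with, following the two tables in this card.
woosh_required_dirs() {
  local backbone="$1" conditioner
  case "$backbone" in
    Woosh-Flow|Woosh-DFlow)         conditioner="TextConditionerA" ;;  # T2A path
    Woosh-VFlow-8s|Woosh-DVFlow-8s) conditioner="TextConditionerV" ;;  # V2A path
    *) echo "unknown backbone: $backbone" >&2; return 1 ;;
  esac
  # Shared infrastructure first, then the matching conditioner and the backbone.
  printf '%s\n' Woosh-AE Woosh-CLAP "$conditioner" "$backbone"
}
```

For example, `woosh_required_dirs Woosh-DVFlow-8s` prints `Woosh-AE`, `Woosh-CLAP`, `TextConditionerV`, and `Woosh-DVFlow-8s`, one per line.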

	Every weight file ships in the `safetensors` format; this mirror contains no `.pt`, `.ckpt`, or `.bin` files.
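Because every weight file is a safetensors archive, you can list tensor names, dtypes, and shapes without loading any weights. The sketch below relies on the published safetensors layout (an 8-byte little-endian unsigned header length N, followed by N bytes of JSON) and uses only `head`, `od`, and `dd`. `safetensors_header` is a hypothetical helper written for illustration, not an upstream tool.

```bash
# Sketch: print the JSON header of a .safetensors file with standard tools.
safetensors_header() {
  local file="$1" n=0 shift_bits=0 byte
  # Read the 8-byte length prefix as unsigned decimals and assemble it
  # little-endian (least significant byte first).
  for byte in $(head -c 8 "$file" | od -An -v -t u1); do
    n=$(( n + (byte << shift_bits) ))
    shift_bits=$(( shift_bits + 8 ))
  done
  # Skip the prefix and copy exactly N header bytes.
  dd if="$file" bs=1 skip=8 count="$n" 2>/dev/null
  echo
}
```

For example, `safetensors_header checkpoints/Woosh-AE/weights.safetensors | head -c 500` shows the start of the AE's tensor index. Copying with `dd bs=1` is slow for very large headers, but it keeps the sketch free of dependencies.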

	## Layout

	```
	checkpoints/
	├── Woosh-AE/
	│   ├── weights.safetensors
	│   └── config.yaml
	├── Woosh-CLAP/
	│   ├── weights_audio.safetensors
	│   ├── weights_text.safetensors
	│   └── config.yaml
	├── TextConditionerA/
	│   ├── weights.safetensors
	│   └── config.yaml
	├── TextConditionerV/
	│   ├── weights.safetensors
	│   └── config.yaml
	├── Woosh-Flow/
	│   ├── weights.safetensors
	│   └── config.yaml
	├── Woosh-DFlow/
	│   ├── weights.safetensors
	│   └── config.yaml
	├── Woosh-VFlow-8s/
	│   ├── weights.safetensors
	│   └── config.yaml
	└── Woosh-DVFlow-8s/
	    ├── weights.safetensors
	    └── config.yaml
	```

	Directory names match Sony's release zip layout exactly so the upstream inference code finds its configs without modification.
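After downloading, a quick check like the following can confirm that the tree matches the layout above before you run upstream inference. `check_woosh_layout` is a hypothetical helper written for this card, not part of Sony's `woosh` package. It checks only that the expected files exist and that no `.pt`, `.ckpt`, or `.bin` files are present; it does not validate the weights themselves.

```bash
# Hypothetical post-download check (not upstream code): verify the expected
# files exist under <root>/checkpoints and that no non-safetensors weights exist.
check_woosh_layout() {
  local root="${1:-.}" rc=0 f stray
  local expected=(
    Woosh-AE/weights.safetensors Woosh-AE/config.yaml
    Woosh-CLAP/weights_audio.safetensors Woosh-CLAP/weights_text.safetensors Woosh-CLAP/config.yaml
    TextConditionerA/weights.safetensors TextConditionerA/config.yaml
    TextConditionerV/weights.safetensors TextConditionerV/config.yaml
    Woosh-Flow/weights.safetensors Woosh-Flow/config.yaml
    Woosh-DFlow/weights.safetensors Woosh-DFlow/config.yaml
    Woosh-VFlow-8s/weights.safetensors Woosh-VFlow-8s/config.yaml
    Woosh-DVFlow-8s/weights.safetensors Woosh-DVFlow-8s/config.yaml
  )
  for f in "${expected[@]}"; do
    if [ ! -f "$root/checkpoints/$f" ]; then
      echo "missing: checkpoints/$f" >&2
      rc=1
    fi
  done
  # Any .pt/.ckpt/.bin file would contradict the safetensors-only layout.
  stray=$(find "$root/checkpoints" -type f \( -name '*.pt' -o -name '*.ckpt' -o -name '*.bin' \) 2>/dev/null || true)
  if [ -n "$stray" ]; then
    echo "unexpected non-safetensors files:" >&2
    echo "$stray" >&2
    rc=1
  fi
  return "$rc"
}
```

Run it from the root of the Woosh clone, for example `check_woosh_layout . && echo "layout OK"`. If anything is missing or unexpected, it lists the problems on stderr and returns non-zero.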

	## Usage

	This mirror is intended to be consumed by [Sony's upstream `woosh` package](https://github.com/SonyResearch/Woosh). Clone and install the upstream repository, then download this mirror's `checkpoints/` directory into the root of the clone:

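	```bash
	# Clone upstream
	git clone https://github.com/SonyResearch/Woosh.git
	cd Woosh

	# Sony's suggested env setup (uses uv)
	uv sync
	uv pip install -e .

	# Pull only checkpoints/ from this mirror with the `hf` CLI (huggingface_hub).
	# --include skips this card's README.md, which would otherwise overwrite upstream's.
	hf download AEmotionStudio/woosh-models --include "checkpoints/*" --local-dir ./
	```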

	## Acknowledgements

	All credit for the Woosh models belongs to Sony Research. This mirror exists solely to make the CC-BY-NC open weights easier to fetch and integrate. Please cite the upstream project and respect the non-commercial license.

	- Upstream: <https://github.com/SonyResearch/Woosh>
	- Release: <https://github.com/SonyResearch/Woosh/releases/tag/v1.0.0>
	- License: [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)