	---
	license: other
	license_name: stabilityai-community
	license_link: LICENSE.md
	base_model: stabilityai/stable-audio-open-1.0
	pipeline_tag: text-to-audio
	library_name: stable-audio-tools
	tags:
	- text-to-audio
	- latent-diffusion
	- stereo
	- 44100hz
	---

	# stable-audio-open-1.0

	Generates variable-length (up to 47 s) stereo audio at 44.1 kHz from text prompts. The model is a latent diffusion architecture with three components: an autoencoder, a T5 text encoder, and a transformer-based diffusion model (DiT) that operates in the autoencoder's latent space.
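
For a sense of scale, the clip length and sample rate quoted above translate into sample counts as follows. This is a minimal arithmetic sketch: the 47 s and 44.1 kHz figures come from the description, while the helper name `frames_for` is hypothetical and not part of `stable_audio_tools`.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, from the description above
MAX_SECONDS = 47         # maximum clip length, from the description above
CHANNELS = 2             # stereo


def frames_for(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Return the number of frames (samples per channel) in a clip."""
    return round(seconds * sample_rate)


max_frames = frames_for(MAX_SECONDS)   # frames per channel at full length
max_values = max_frames * CHANNELS     # individual sample values across both channels
print(max_frames, max_values)
```

The `sample_size` value stored in `model_config.json` is what the pipeline actually uses, so treat these numbers as an estimate rather than the configured value.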

	This repository is an unmodified redistribution of [`stabilityai/stable-audio-open-1.0`](https://huggingface.co/stabilityai/stable-audio-open-1.0). Weights, configs, license, and dataset attribution files are preserved verbatim.

	## Files

	- `model.safetensors` (~4.85 GB) — primary weights.
	- `model.ckpt` (~4.85 GB) — same weights in `.ckpt` format for `stable_audio_tools`.
	- `model_config.json`, `model_index.json` — pipeline configs.
	- `LICENSE.md` — Stability AI Community License (verbatim).
	- `fma_dataset_attribution2.csv`, `freesound_dataset_attribution2.csv` — training-data attribution (required by the license).
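
After downloading the repository, a quick check that every file listed above is present can catch an incomplete copy. The sketch below assumes the files sit at the top level of a local directory; the helper name `missing_files` and the example path are hypothetical.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "model.safetensors",
    "model.ckpt",
    "model_config.json",
    "model_index.json",
    "LICENSE.md",
    "fma_dataset_attribution2.csv",
    "freesound_dataset_attribution2.csv",
]


def missing_files(repo_dir):
    """Return the expected file names that are not regular files in repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local path; replace with the actual download location.
    print(missing_files("./stable-audio-open-1.0"))
```

An empty list means all seven files were found. The check covers presence only, not file size or integrity.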

	## Inference

	```python
	import torch
	import torchaudio
	from einops import rearrange
	from stable_audio_tools import get_pretrained_model
	from stable_audio_tools.inference.generation import generate_diffusion_cond

	device = "cuda" if torch.cuda.is_available() else "cpu"
	model, model_config = get_pretrained_model("cudabenchmarktest/stable-audio-open-1.0")
	sample_rate = model_config["sample_rate"]
	sample_size = model_config["sample_size"]
	model = model.to(device)

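	# Text prompt plus timing conditioning; seconds_total sets the requested clip length.
	conditioning = [{"prompt": "128 BPM tech house drum loop", "seconds_start": 0, "seconds_total": 30}]
	output = generate_diffusion_cond(
	    model, steps=100, cfg_scale=7, conditioning=conditioning,
	    sample_size=sample_size, sigma_min=0.3, sigma_max=500,
	    sampler_type="dpmpp-3m-sde", device=device,
	)
	# Fold the batch into the time axis: (batch, channels, samples) -> (channels, batch * samples).
	output = rearrange(output, "b d n -> d (b n)")
	# Peak-normalize, clamp to [-1, 1], and convert to 16-bit PCM before saving.
	output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
	torchaudio.save("output.wav", output, sample_rate)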
	```
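
The final step of the snippet above peak-normalizes the waveform and converts it to 16-bit PCM. The same arithmetic is sketched below in plain Python on a list of floats. The helper name `to_int16_pcm` is hypothetical, and the zero-peak guard is an addition that is not in the snippet, where an all-silent output would divide zero by zero.

```python
def to_int16_pcm(samples):
    """Peak-normalize float samples and scale them to the int16 range.

    Mirrors div(peak).clamp(-1, 1).mul(32767) followed by truncation toward
    zero, with an extra guard for silent input (an assumption added here).
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty input: nothing to normalize.
        return [0] * len(samples)
    scaled = (max(-1.0, min(1.0, s / peak)) for s in samples)
    return [int(value * 32767) for value in scaled]


print(to_int16_pcm([0.5, -0.25, 0.0]))
```

Because the loudest sample is always mapped to full scale, quiet generations come out as loud as dense ones. Skip the division if relative levels between clips need to be preserved.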

	## License and attribution

	Governed by the Stability AI Community License Agreement (see `LICENSE.md`). The license permits research use, non-commercial use, and commercial use by organizations or individuals with less than USD 1M in total annual revenue. Above that threshold, a separate Stability Enterprise license is required.

	Training-data attribution: see the FMA and Freesound CSV files. The license requires these attribution files to be distributed alongside the weights, and this repository includes them unchanged.

	- Original release: Stability AI (`stabilityai/stable-audio-open-1.0`).
	- This redistribution: weights and configs unmodified, LICENSE preserved, README replaced. No additional modifications.