# LongCat-AudioDiT-1B-Diffusers
This repository provides Meituan's [LongCat-AudioDiT-1B](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B) in Diffusers format.
## Model Description
A Diffusion Transformer (DiT) based audio generation model for text-to-audio synthesis.
## Usage
```python
import soundfile as sf
import torch
from diffusers import LongCatAudioDiTPipeline

# Load the pipeline in bfloat16 and move it to the GPU.
pipeline = LongCatAudioDiTPipeline.from_pretrained(
    "ruixiangma/LongCat-AudioDiT-1B-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipeline = pipeline.to("cuda")

prompt = "A calm ocean wave ambience with soft wind in the background."

# Generate 5 seconds of audio and keep the [0, 0] entry of the returned batch.
audio = pipeline(
    prompt,
    audio_duration_s=5.0,
    num_inference_steps=20,
    guidance_scale=4.0,
    seed=42,
).audios[0, 0]

# Save the waveform at the pipeline's sample rate.
sf.write("output.wav", audio, pipeline.sample_rate)
```
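If `soundfile` is not available, the final save step can be approximated with Python's standard `wave` module. The sketch below is an illustration, not part of the upstream pipeline. It assumes the generated audio is a mono sequence of floats in `[-1.0, 1.0]`. The helper `write_wav_pcm16`, the file name `demo_output.wav`, and the 24000 Hz rate are hypothetical placeholders, and a short sine wave stands in for model output.

```python
import math
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the number of frames written.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so scaling cannot overflow int16
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wf.setframerate(int(sample_rate))
        wf.writeframes(bytes(frames))
    return len(frames) // 2


# Stand-in for generated audio: 0.1 s of a 440 Hz sine wave.
# 24000 Hz is a placeholder rate, not a claim about the model's sample rate.
demo_rate = 24000
demo = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate // 10)]
n_written = write_wav_pcm16("demo_output.wav", demo, demo_rate)
```

In real use, pass the generated waveform (converted to a plain float sequence) and `pipeline.sample_rate` instead of the stand-in values. The result is quantized to 16 bits, so `soundfile` is the better choice when you need a floating-point file.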
## License
MIT License, following the upstream license published with [meituan-longcat/LongCat-AudioDiT-1B](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B).