# LongCat-AudioDiT-1B-Diffusers

Weights of Meituan's [LongCat-AudioDiT-1B](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B) converted to the Diffusers format.

## Model Description

A text-to-audio generation model built on the Diffusion Transformer (DiT) architecture.

## Usage

```python
import soundfile as sf
import torch
from diffusers import LongCatAudioDiTPipeline

# Load the pipeline in bfloat16 and move it to the GPU.
pipeline = LongCatAudioDiTPipeline.from_pretrained(
    "ruixiangma/LongCat-AudioDiT-1B-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipeline = pipeline.to("cuda")

prompt = "A calm ocean wave ambience with soft wind in the background."

# Generate 5 seconds of audio; a fixed seed makes the result reproducible.
output = pipeline(
    prompt,
    audio_duration_s=5.0,
    num_inference_steps=20,
    guidance_scale=4.0,
    seed=42,
)
audio = output.audios[0, 0]  # first generated sample, first channel

# Save at the pipeline's native sample rate.
sf.write("output.wav", audio, pipeline.sample_rate)
```
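If `soundfile` is not installed, the waveform can also be written with Python's standard-library `wave` module. The helper below is a hypothetical sketch (`write_wav_pcm16` is not part of Diffusers). It assumes the selected audio is a 1-D mono float waveform in the range [-1, 1]. The card does not state the return type of `audios`, so if it turns out to be a torch tensor, convert it first with `audio.float().cpu().numpy()`. NumPy has no bfloat16 dtype and cannot read CUDA memory directly.

```python
import wave

import numpy as np


def write_wav_pcm16(path, audio, sample_rate):
    """Hypothetical helper: write a mono float waveform to a 16-bit PCM WAV file.

    Assumes `audio` is a 1-D array-like of floats in [-1, 1].
    Returns the number of frames written.
    """
    samples = np.asarray(audio, dtype=np.float32)
    if samples.ndim != 1:
        raise ValueError(f"expected a 1-D mono waveform, got shape {samples.shape}")
    # Clip before scaling so out-of-range peaks saturate instead of wrapping around.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(int(sample_rate))
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)
```

With the usage example above, this would be called as `write_wav_pcm16("output.wav", audio, pipeline.sample_rate)`. Converting to 16-bit integers keeps the file readable by virtually any audio player. The trade-off is that it quantizes the model output, which `soundfile` can avoid by writing float samples directly.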

## License
MIT License, following the upstream license published with [meituan-longcat/LongCat-AudioDiT-1B](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B).