LongCat-AudioDiT-1B-Diffusers
Meituan's LongCat-AudioDiT-1B in Diffusers format.
Model Description
A Diffusion Transformer (DiT) based audio generation model for text-to-audio synthesis.
Usage
import soundfile as sf
import torch
from diffusers import LongCatAudioDiTPipeline

# Load the pipeline weights in bfloat16, then move the pipeline to the GPU.
pipeline = LongCatAudioDiTPipeline.from_pretrained(
    "ruixiangma/LongCat-AudioDiT-1B-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipeline = pipeline.to("cuda")

# Generate 5 seconds of audio from a text prompt, with a fixed seed.
prompt = "A calm ocean wave ambience with soft wind in the background."
audio = pipeline(prompt, audio_duration_s=5.0, num_inference_steps=20, guidance_scale=4.0, seed=42).audios[0, 0]

# Save the result as a WAV file at the pipeline's sample rate.
sf.write("output.wav", audio, pipeline.sample_rate)
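To sanity-check the file written by the usage snippet above, one option is to read the WAV header back and compare its duration with the requested audio_duration_s. The following is a minimal sketch using only the Python standard library, not part of this repository. The helper names (wav_duration_seconds, write_silence, demo) are hypothetical, the 24000 Hz sample rate is an assumed placeholder rather than the model's documented rate, and the check assumes output.wav is stored as PCM, which the wave module requires.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def write_silence(path, duration_s, sample_rate):
    """Write a mono 16-bit silent WAV; a stand-in for real model output."""
    n_frames = int(round(duration_s * sample_rate))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * n_frames)


def demo():
    """Create a 5-second stand-in file and measure its duration."""
    assumed_sample_rate = 24000  # assumption, not taken from the model card
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "stand_in.wav")
        write_silence(path, 5.0, assumed_sample_rate)
        return wav_duration_seconds(path)


print(demo())
```

With a real run, calling wav_duration_seconds("output.wav") should return a value close to the audio_duration_s passed to the pipeline; a large mismatch would suggest the wrong sample rate was used when saving.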
License
MIT License, following the upstream license published with meituan-longcat/LongCat-AudioDiT-1B.