# LongCat-AudioDiT-1B-Diffusers

Diffusers-format weights for Meituan's [LongCat-AudioDiT-1B](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B).

## Model Description

A text-to-audio generation model built on a DiT (Diffusion Transformer) architecture.
## Usage

```python
import soundfile as sf
import torch
from diffusers import LongCatAudioDiTPipeline

# Load the Diffusers-format weights in bfloat16 and move the pipeline to the GPU
pipeline = LongCatAudioDiTPipeline.from_pretrained(
    "ruixiangma/LongCat-AudioDiT-1B-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipeline = pipeline.to("cuda")

prompt = "A calm ocean wave ambience with soft wind in the background."
# Generate 5 s of audio; the fixed seed makes the output reproducible.
# [0, 0] picks the first batch item and first channel (assumed batch x channels x samples)
audio = pipeline(
    prompt, audio_duration_s=5.0, num_inference_steps=20, guidance_scale=4.0, seed=42
).audios[0, 0]

# Save as WAV at the pipeline's native sample rate
sf.write("output.wav", audio, pipeline.sample_rate)
```
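`sf.write` expects array-like input. This card does not document whether `.audios` is returned as a NumPy array or as a torch tensor; if it is a tensor (possibly bfloat16 and on the GPU), it may need converting before saving. The helper below is a minimal sketch that handles both cases. `prepare_for_wav` is a hypothetical name, and the optional peak rescaling is an extra step of our own, not part of the upstream pipeline.

```python
import numpy as np


def prepare_for_wav(audio, peak: float = 0.99) -> np.ndarray:
    """Hypothetical helper: turn pipeline output into a float32 NumPy array.

    Accepts a torch tensor or anything NumPy can convert. If the signal
    exceeds +/-peak, it is rescaled so the loudest sample equals peak;
    otherwise only the dtype conversion is applied.
    """
    # torch tensors expose .detach(); bfloat16 and CUDA tensors must be cast
    # to float32 and moved to the CPU before NumPy can read them
    if hasattr(audio, "detach"):
        audio = audio.detach().float().cpu().numpy()
    audio = np.asarray(audio, dtype=np.float32)

    max_abs = float(np.max(np.abs(audio))) if audio.size else 0.0
    if max_abs > peak:
        # Rescale instead of clipping to avoid hard-clipping distortion
        audio = audio * np.float32(peak / max_abs)
    return audio
```

With this helper, the last line of the usage snippet becomes `sf.write("output.wav", prepare_for_wav(audio), pipeline.sample_rate)`. Rescaling preserves the waveform shape, unlike clipping; drop that step if you want the raw model output.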
## License

MIT License, matching the license published with the upstream [meituan-longcat/LongCat-AudioDiT-1B](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B) model.