LongCat-AudioDiT-1B-Diffusers

Diffusers-format weights for Meituan's LongCat-AudioDiT-1B.

Model Description

A text-to-audio generation model built on a Diffusion Transformer (DiT).

Usage

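# Dependencies: pip install -U diffusers transformers accelerate
# soundfile is also needed for sf.write below.
import soundfile as sf
import torch
from diffusers import LongCatAudioDiTPipeline

# Load the pipeline weights in bfloat16.
pipeline = LongCatAudioDiTPipeline.from_pretrained(
    "ruixiangma/LongCat-AudioDiT-1B-Diffusers",
    torch_dtype=torch.bfloat16
)

# Move the pipeline to the GPU; use "mps" instead of "cuda" on Apple devices.
pipeline = pipeline.to("cuda")

prompt = "A calm ocean wave ambience with soft wind in the background."
# Generate 5 seconds of audio with a fixed seed.
# [0, 0] indexes the first item along the first two axes (presumably batch and channel).
audio = pipeline(prompt, audio_duration_s=5.0, num_inference_steps=20, guidance_scale=4.0, seed=42).audios[0, 0]

# Save the waveform at the pipeline's sample rate.
sf.write("output.wav", audio, pipeline.sample_rate)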
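
If soundfile is not available, the saving step can be approximated with Python's standard library. The following is a minimal sketch, not part of the model's API: `float_to_pcm16` and `write_wav_mono` are hypothetical helper names, the input is assumed to be a mono sequence of floats in the range [-1.0, 1.0], and the sine tone and the 24000 Hz rate are placeholders standing in for real model output and `pipeline.sample_rate`.

```python
import math
import struct
import wave


def float_to_pcm16(samples):
    """Clip floats to [-1.0, 1.0] and scale them to signed 16-bit integers."""
    return [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]


def write_wav_mono(path, samples, sample_rate):
    """Write a mono 16-bit PCM WAV file using only the standard library."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Pack all samples as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Placeholder input: one second of a 440 Hz sine tone.
# SAMPLE_RATE is an assumed value for illustration, not the model's
# confirmed rate; with the real pipeline, pass pipeline.sample_rate.
SAMPLE_RATE = 24000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
write_wav_mono("tone.wav", tone, SAMPLE_RATE)
```

With the pipeline output, something like `write_wav_mono("output.wav", audio.tolist(), pipeline.sample_rate)` might work, assuming `audio` is a one-dimensional float array; that output type is an assumption and is not confirmed by the model card.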

License

MIT License, following the upstream license published with meituan-longcat/LongCat-AudioDiT-1B.
