ruixiangma committed
Commit 256f376 · verified · 1 Parent(s): 3785ec6

Create README.md

Files changed (1)
  1. README.md +40 -0
README.md ADDED
@@ -0,0 +1,40 @@
+ # LongCat-AudioDiT-1B-Diffusers
+
+ Diffusers-format version of Meituan's [LongCat-AudioDiT-1B](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B).
+
+ ## Model Description
+
+ A Diffusion Transformer (DiT) based audio generation model for text-to-audio synthesis.
+
+ ## Directory Structure
+
+ ```
+ ├── model_index.json    # Diffusers config file
+ ├── text_encoder/       # Text encoder (UMT5)
+ ├── tokenizer/          # Tokenizer (T5)
+ ├── transformer/        # Main DiT model
+ └── vae/                # VAE encoder/decoder
+ ```
+
+ ## Usage
+
+ ```python
+ from diffusers import LongCatAudioDiTPipeline
+ import torch
+
+ # Load the converted pipeline in bfloat16.
+ pipe = LongCatAudioDiTPipeline.from_pretrained(
+     "ruixiangma/LongCat-AudioDiT-1B-Diffusers",
+     torch_dtype=torch.bfloat16,
+ )
+
+ # Generate 5 seconds of audio from a text prompt.
+ audio = pipe(
+     prompt="A cheerful piano melody",
+     audio_duration_s=5.0,
+     num_inference_steps=50,
+     guidance_scale=4.0,
+ ).audio
+ ```
+
+ ## Original Model
+
+ - Hugging Face: [meituan-longcat/LongCat-AudioDiT-1B](https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B)
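
Note (not part of the commit): the Usage snippet in the README stops at `.audio` and does not say how to write the result to disk. The following is a minimal sketch of one way to do that with the Python standard library only. It assumes the waveform has already been converted to a flat, mono sequence of floats in [-1.0, 1.0]. The README does not state the real layout or sample rate of `.audio`, so `save_wav` is a hypothetical helper and the 24000 Hz value is a placeholder, not a documented property of the model.

```python
import math
import struct
import wave


def save_wav(samples, sample_rate, path):
    """Write a mono sequence of floats as a 16-bit PCM WAV file.

    Hypothetical helper: assumes `samples` is a flat iterable of floats
    nominally in [-1.0, 1.0]. Values outside that range are clipped.
    Returns the number of frames written.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to the signed 16-bit range and pack as little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono (assumption)
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2


if __name__ == "__main__":
    # Stand-in for model output: one second of a 440 Hz sine tone.
    sample_rate = 24000  # placeholder value, not taken from the model
    tone = [
        0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)
    ]
    save_wav(tone, sample_rate, "example.wav")
```

If `.audio` turns out to be a batched tensor or array, it would first need to be reduced to a single mono waveform (for example by selecting the first batch element and channel) before being passed to a helper like this. The sample rate should come from the pipeline or VAE configuration rather than the placeholder used here.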