IamCreateAI/Ruyi-Mini-7B

RuyiInpaintPipeline

video generation


happynear committed on Dec 25, 2024

Commit fbb8813 (verified) · 1 Parent(s): 4ef33f9

Update README.md

Files changed (1)

README.md +1 -1


@@ -46,7 +46,7 @@ Or use ComfyUI wrapper in our [github repo](https://github.com/IamCreateAI/Ruyi-
 ## Model Architecture
 Ruyi-Mini-7B is an advanced image-to-video model with about 7.1 billion parameters. The model architecture is modified from [EasyAnimate V4 model](https://github.com/aigc-apps/EasyAnimate), whose transformer module is inherited from [HunyuanDiT](https://github.com/Tencent/HunyuanDiT). It comprises three key components:
-  1. Casual VAE Module: Handles video compression and decompression. It reduces spatial resolution to 1/8 and temporal resolution to 1/4, with each latent pixel is represented in 16-channel BF16 after compression.
+  1. Casual VAE Module: Handles video compression and decompression. It reduces spatial resolution to 1/8 and temporal resolution to 1/4, with each latent pixel is represented in 16 float numbers after compression.
   2. Diffusion Transformer Module: Generates compressed video data using 3D full attention, with:
   - 2D Normalized-RoPE for spatial dimensions;
   - Sin-cos position embedding for temporal dimensions;
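
The compression factors quoted in the changed line (1/8 spatial, 1/4 temporal, 16 values per latent pixel) can be turned into a rough size estimate. The following is a minimal sketch, not code from the Ruyi repository: the helper names `latent_shape` and `compression_ratio` are hypothetical, and rounding up with a ceiling is an assumption, since the README does not say how the VAE treats frame counts or resolutions that are not exact multiples of the factors.

```python
import math


def latent_shape(frames, height, width,
                 temporal_factor=4, spatial_factor=8, channels=16):
    """Estimate the latent tensor shape (C, T, H, W) for a video.

    The factors come from the README text; ceiling rounding is an
    assumption made for this illustration only.
    """
    return (
        channels,
        math.ceil(frames / temporal_factor),
        math.ceil(height / spatial_factor),
        math.ceil(width / spatial_factor),
    )


def compression_ratio(frames, height, width, rgb_channels=3):
    """Ratio of RGB input values to latent values (counts, not bytes)."""
    c, t, h, w = latent_shape(frames, height, width)
    return (frames * height * width * rgb_channels) / (c * t * h * w)


if __name__ == "__main__":
    # Example input chosen so that every dimension divides evenly.
    print(latent_shape(120, 512, 512))
    print(compression_ratio(120, 512, 512))
```

Under these assumptions the ratio counts values rather than bytes, so it does not depend on whether the 16 latent numbers are stored as BF16 or another float type, which is the detail this commit rewords.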