Instructions to use IamCreateAI/Ruyi-Mini-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use IamCreateAI/Ruyi-Mini-7B with Diffusers:
pip install -U diffusers transformers accelerate
import torch from diffusers import DiffusionPipeline from diffusers.utils import load_image, export_to_video # switch to "mps" for apple devices pipe = DiffusionPipeline.from_pretrained("IamCreateAI/Ruyi-Mini-7B", dtype=torch.bfloat16, device_map="cuda") pipe.to("cuda") prompt = "A man with short gray hair plays a red electric guitar." image = load_image( "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png" ) output = pipe(image=image, prompt=prompt).frames[0] export_to_video(output, "output.mp4") - Notebooks
- Google Colab
- Kaggle
Update README.md
Browse files
README.md
CHANGED
|
@@ -58,7 +58,7 @@ Ruyi-Mini-7B is an advanced image-to-video model with about 7.1 billion paramete
|
|
| 58 |
- Phase 1: Pre-training from scratch with ~200M video clips and ~30M images at a 256-resolution, using a batch size of 4096 for 350,000 iterations to achieve full convergence.
|
| 59 |
- Phase 2: Fine-tuning with ~60M video clips for multi-scale resolutions (384–512), with a batch size of 1024 for 60,000 iterations.
|
| 60 |
- Phase 3: High-quality fine-tuning with ~20M video clips and ~8M images for 384–1024 resolutions, with dynamic batch sizes based on memory and 10,000 iterations.
|
| 61 |
-
- Phase 4:
|
| 62 |
|
| 63 |
## Hardware Requirements
|
| 64 |
The VRAM cost of Ruyi depends on the resolution and duration of the video. Here we list the costs for some typical video size. Tested on single A100.
|
|
|
|
| 58 |
- Phase 1: Pre-training from scratch with ~200M video clips and ~30M images at a 256-resolution, using a batch size of 4096 for 350,000 iterations to achieve full convergence.
|
| 59 |
- Phase 2: Fine-tuning with ~60M video clips for multi-scale resolutions (384–512), with a batch size of 1024 for 60,000 iterations.
|
| 60 |
- Phase 3: High-quality fine-tuning with ~20M video clips and ~8M images for 384–1024 resolutions, with dynamic batch sizes based on memory and 10,000 iterations.
|
| 61 |
+
- Phase 4: Image-to-video training with ~10M curated high-quality video clips, with dynamic batch sizes based on memory for ~10,000 iterations.
|
| 62 |
|
| 63 |
## Hardware Requirements
|
| 64 |
The VRAM cost of Ruyi depends on the resolution and duration of the video. Here we list the costs for some typical video size. Tested on single A100.
|