---
library_name: minisora
license: mit
language:
- en
tags:
- text-to-video
- video-diffusion
- continuation
- colossalai
pipeline_tag: text-to-video
---

# MiniSora: Fully Open Video Diffusion with ColossalAI

[GitHub: YN35/minisora](https://github.com/YN35/minisora)
[Author (X / Twitter): @__ramu0e__](https://x.com/__ramu0e__)

---
## Overview

**MiniSora** is a fully open video diffusion codebase designed for everything from research to production.

- All training, inference, and evaluation scripts are available
- Supports multi-GPU / multi-node training via **ColossalAI**
- Simple DiT-based video model and pipeline, inspired by Diffusers
- Includes a continuation demo that generates the "next" part of an existing video

This model card hosts the DiT pipeline trained on DMLab trajectories, published as `ramu0e/minisora-dmlab`.

---
## Inference: Text-to-Video (Minimal Example)

```python
from minisora.models import DiTPipeline

pipeline = DiTPipeline.from_pretrained("ramu0e/minisora-dmlab")

output = pipeline(
    batch_size=1,
    num_inference_steps=28,
    height=64,
    width=64,
    num_frames=20,
)
latents = output.latents  # shape: (B, C, F, H, W)
```

`latents` are video tensors in the same normalized space used during training.
Use the scripts in the repository to decode or visualize them.
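If you just want to eyeball samples, converting the latents to displayable frames is a one-liner per step. A minimal sketch, assuming the tensors are normalized to `[-1, 1]` (an assumption; `latents_to_uint8` is an illustrative helper, not part of MiniSora):

```python
import numpy as np

def latents_to_uint8(latents: np.ndarray) -> np.ndarray:
    """Map latents from [-1, 1] to uint8 frames in (B, F, H, W, C) order."""
    frames = np.clip((latents + 1.0) / 2.0, 0.0, 1.0)   # -> [0, 1]
    frames = (frames * 255.0).round().astype(np.uint8)  # -> [0, 255]
    return frames.transpose(0, 2, 3, 4, 1)              # (B, C, F, H, W) -> (B, F, H, W, C)

# Dummy batch: 1 video, 3 channels, 20 frames of 64x64
dummy = np.random.uniform(-1.0, 1.0, size=(1, 3, 20, 64, 64)).astype(np.float32)
frames = latents_to_uint8(dummy)
print(frames.shape)  # (1, 20, 64, 64, 3)
```

The resulting `(B, F, H, W, C)` array can be written out frame by frame with any image or video library.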
---

## Continuation: Generate the Rest of a Video

MiniSora also supports continuation-style generation (as in Sora), where subsequent frames are sampled while conditioning on the observed prefix.
A bundled demo script extends existing videos:

```bash
uv run scripts/demo/full_continuation.py \
    --model-id ramu0e/minisora-dmlab \
    --input-video path/to/input.mp4 \
    --num-extend-frames 12 \
    --num-inference-steps 28 \
    --seed 1234
```

See `scripts/demo/full_continuation.py` for the exact arguments and I/O specification.
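One common way to implement this kind of prefix conditioning at sampling time is inpainting-style re-clamping: at every denoising step, the prefix frames are overwritten with the clean frames noised to the current level, so only the new frames are actually sampled. A simplified sketch under rectified-flow assumptions (all names are illustrative, not the demo script's actual code):

```python
import numpy as np

def continue_video(x_prefix, num_new_frames, velocity_model, steps=28, rng=None):
    """Inpainting-style continuation: denoise new frames while re-clamping the prefix.

    x_prefix: clean conditioning frames, shape (C, F_prefix, H, W).
    velocity_model: callable (x, t) -> predicted velocity (noise - data).
    """
    rng = rng or np.random.default_rng()
    c, f_pre, h, w = x_prefix.shape
    x = rng.standard_normal((c, f_pre + num_new_frames, h, w))  # start from pure noise
    ts = np.linspace(1.0, 0.0, steps + 1)                       # t goes 1 -> 0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # clamp the prefix to the clean frames, noised to the current level
        noise = rng.standard_normal(x_prefix.shape)
        x[:, :f_pre] = (1.0 - t_cur) * x_prefix + t_cur * noise
        v = velocity_model(x, t_cur)
        x = x + (t_next - t_cur) * v   # Euler step toward the data (t decreases)
    x[:, :f_pre] = x_prefix            # final clamp: prefix is returned unchanged
    return x
```

The actual conditioning scheme MiniSora uses is defined by its masking logic; see `scripts/demo/full_continuation.py` for the real implementation.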
---

## Key Features

- **End-to-End Transparency**
  - Model definition (DiT): `src/minisora/models/modeling_dit.py`
  - Pipeline: `src/minisora/models/pipeline_dit.py`
  - Training script: `scripts/train.py`
  - Data loaders: `src/minisora/data/`

  Every stage from data to inference is available.

- **ColossalAI for Scale-Out Training**
  - Zero / DDP plugins
  - Designed for multi-GPU and multi-node setups
  - Easy experimentation with large video models

- **Simple, Readable Implementation**
  - Dependency management via `uv` (`uv sync` / `uv run`)
  - Minimal Diffusers-inspired video DiT pipeline
  - Experiments and analysis scripts organized under `reports/`

- **Continuation / Conditioning Ready**
  - Masking logic that keeps conditioned frames fixed
  - Training scheme that applies noise to only part of the sequence
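The conditioning idea behind the last bullet can be sketched as frame-masked noising: conditioned frames stay clean while the rest are interpolated toward noise. A simplified illustration, not MiniSora's exact implementation (all names are assumptions):

```python
import numpy as np

def masked_noising(x0, noise, t, cond_mask):
    """Rectified-flow style noising that leaves conditioned frames clean.

    x0, noise: (B, C, F, H, W) arrays; t: scalar in [0, 1];
    cond_mask: (F,) boolean, True where a frame is a fixed conditioning frame.
    """
    xt = (1.0 - t) * x0 + t * noise              # interpolate toward noise
    keep = cond_mask[None, None, :, None, None]  # broadcast over B, C, H, W
    return np.where(keep, x0, xt)                # conditioned frames stay clean

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1, 3, 8, 4, 4))
noise = rng.standard_normal(x0.shape)
mask = np.array([True] * 4 + [False] * 4)  # condition on the first 4 frames
xt = masked_noising(x0, noise, 0.7, mask)
```

During training, the loss is then computed only on the noised (non-conditioned) frames, which is what makes the model usable for continuation at inference time.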
---

## Install & Setup

### 1. Clone the Repository

```bash
git clone https://github.com/YN35/minisora.git
cd minisora
```

### 2. Install Dependencies with `uv`

```bash
uv sync
```

All scripts can then be executed through `uv run ...`.
---

## This Pipeline (`ramu0e/minisora-dmlab`)

This Hugging Face repository distributes the MiniSora DiT pipeline checkpoint trained on DMLab trajectories.

- **Model type**: DiT-based video diffusion model
- **Training resolution**: e.g., 64×64 or 128×128 (see `reports/` in the repo)
- **Frames per sample**: typically 20
- **Library**: `minisora` (custom lightweight framework)
- **Use case**: research and sample-quality video generation

---
## Training (Summary)

Complete training code is available in the repository.

- Main script: `scripts/train.py`
- Highlights:
  - Rectified-flow style training with `FlowMatchEulerDiscreteScheduler`
  - ColossalAI Booster to switch between Zero / DDP
  - Conditioning-aware objective (noises only a subset of frames)

### Example: Single-Node Training

```bash
uv run scripts/train.py \
    --dataset_type minecraft \
    --data_root /path/to/train_data \
    --outputs outputs/exp1 \
    --batch_size 32 \
    --precision bf16
```

### Example: Multi-Node (torchrun + ColossalAI)

```bash
torchrun --nnodes 2 --nproc_per_node 8 scripts/train.py \
    --dataset_type minecraft \
    --data_root /path/to/train_data \
    --outputs outputs/exp-multinode \
    --batch_size 64 \
    --plugin zero --zero 1
```

Refer to `scripts/train.py` for all available options.
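The rectified-flow objective mentioned above amounts to regressing the constant velocity between data and noise along a straight interpolation path. A minimal single-step sketch (illustrative names only, not the actual `scripts/train.py` loop, which delegates scheduling to `FlowMatchEulerDiscreteScheduler`):

```python
import numpy as np

def rectified_flow_loss(model, x0, rng):
    """One rectified-flow training step on a batch x0 of shape (B, C, F, H, W)."""
    noise = rng.standard_normal(x0.shape)
    t = rng.uniform(0.0, 1.0)             # one timestep for the whole batch, for brevity
    xt = (1.0 - t) * x0 + t * noise       # linear interpolation between data and noise
    target = noise - x0                   # the velocity the model should predict
    pred = model(xt, t)
    return np.mean((pred - target) ** 2)  # MSE on the velocity

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2, 3, 4, 8, 8))
zero_model = lambda xt, t: np.zeros_like(xt)  # stand-in for the DiT
loss = rectified_flow_loss(zero_model, x0, rng)
```

In the conditioning-aware variant, the same loss would simply be masked to exclude the clean conditioned frames.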
---

## Repository Structure (Excerpt)

- `src/minisora/models/modeling_dit.py` – core DiT transformer for video
- `src/minisora/models/pipeline_dit.py` – Diffusers-style pipeline (`DiTPipeline`)
- `src/minisora/data/` – datasets and distributed samplers (DMLab, Minecraft)
- `scripts/train.py` – ColossalAI-based training loop
- `scripts/demo/full_vgen.py` – simple end-to-end video generation demo
- `scripts/demo/full_continuation.py` – continuation demo
- `reports/` – experiment notes, mask visualizations, metric scripts

---
## Limitations & Notes

- This checkpoint targets research-scale experiments.
- Quality at higher resolutions or longer durations depends on data and hyperparameters.
- Continuation quality varies with the provided prefix and conditioning setup.

---
## Contributions

- Contributions to code, models, and docs are welcome.
- Please open issues or PRs at [YN35/minisora](https://github.com/YN35/minisora).

---
## License

Code and weights are released under the **MIT License**.
Commercial use, modification, and redistribution are all permitted (see the GitHub `LICENSE`).

```text
MIT License
Copyright (c) YN
Permission is hereby granted, free of charge, to any person obtaining a copy
...
```