---
library_name: minisora
license: mit
language:
- en
tags:
- text-to-video
- video-diffusion
- continuation
- colossalai
pipeline_tag: text-to-video
---
# MiniSora: Fully Open Video Diffusion with ColossalAI
[GitHub: YN35/minisora](https://github.com/YN35/minisora)
[Author (X / Twitter): @__ramu0e__](https://x.com/__ramu0e__)
---
## 🧾 Overview
**MiniSora** is a fully open video diffusion codebase designed for everything from research to production.
- All training, inference, and evaluation scripts are available
- Supports multi-GPU / multi-node training via **ColossalAI**
- Simple DiT-based video model + pipeline, inspired by Diffusers
- Includes a continuation demo to generate the "next" part of an existing video
This model card hosts the DiT pipeline trained on DMLab trajectories and published as `ramu0e/minisora-dmlab`.
---
## πŸš€ Inference: Text-to-Video (Minimal Example)
```python
from minisora.models import DiTPipeline
pipeline = DiTPipeline.from_pretrained("ramu0e/minisora-dmlab")
output = pipeline(
    batch_size=1,
    num_inference_steps=28,
    height=64,
    width=64,
    num_frames=20,
)
latents = output.latents # shape: (B, C, F, H, W)
```
`latents` are video tensors in the same normalized space used during training.
Use the scripts in the repository to decode or visualize them.
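As a rough illustration of what "decoding" involves, here is a minimal numpy sketch that maps latents to displayable frames, assuming the normalized space is pixel values in `[-1, 1]` (an assumption; the repository's scripts define the actual decoding, and `latents_to_uint8_frames` is an illustrative name, not part of the `minisora` API):

```python
import numpy as np

def latents_to_uint8_frames(latents: np.ndarray) -> np.ndarray:
    """Map latents in [-1, 1] to uint8 video frames.

    latents: array of shape (B, C, F, H, W).
    Returns: array of shape (B, F, H, W, C) with values in [0, 255].
    """
    frames = (latents.clip(-1.0, 1.0) + 1.0) * 127.5  # [-1, 1] -> [0, 255]
    frames = np.moveaxis(frames, 1, -1)               # (B, C, F, H, W) -> (B, F, H, W, C)
    return frames.round().astype(np.uint8)

# Dummy tensor matching the shapes produced by the pipeline call above.
dummy = np.random.uniform(-1.0, 1.0, size=(1, 3, 20, 64, 64))
frames = latents_to_uint8_frames(dummy)
print(frames.shape)  # (1, 20, 64, 64, 3)
```

Each `(F, H, W, C)` slice can then be written out with any video/image library of your choice.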
---
## πŸŽ₯ Continuation: Generate the Rest of a Video
MiniSora also supports continuation-style generation (like Sora), where subsequent frames are sampled while conditioning on the observed prefix.
A demo script is bundled to extend existing videos.
```bash
uv run scripts/demo/full_continuation.py \
--model-id ramu0e/minisora-dmlab \
--input-video path/to/input.mp4 \
--num-extend-frames 12 \
--num-inference-steps 28 \
--seed 1234
```
See `scripts/demo/full_continuation.py` for the exact arguments and I/O specification.
---
## 🧩 Key Features
- **End-to-End Transparency**
- Model definition (DiT): `src/minisora/models/modeling_dit.py`
- Pipeline: `src/minisora/models/pipeline_dit.py`
- Training script: `scripts/train.py`
- Data loaders: `src/minisora/data/`
Every stage from data to inference is available.
- **ColossalAI for Scale-Out Training**
- Zero / DDP plugins
- Designed for multi-GPU and multi-node setups
- Easy experimentation with large video models
- **Simple, Readable Implementation**
- Dependency management via `uv` (`uv sync` / `uv run`)
- Minimal Diffusers-inspired video DiT pipeline
- Experiments and analysis scripts organized under `reports/`
- **Continuation / Conditioning Ready**
- Masking logic to fix conditioned frames
- Training scheme that applies noise to only part of the sequence
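To make the masking idea concrete, here is a minimal numpy sketch of one way such frame-level conditioning can work: the first few frames are kept clean while only the remaining frames are replaced by noise. All names here are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

def noise_unconditioned_frames(video, noise, num_cond_frames):
    """Keep the first `num_cond_frames` frames fixed; noise the rest.

    video, noise: arrays of shape (B, C, F, H, W).
    Returns the partially-noised video and the boolean frame mask.
    """
    B, C, F, H, W = video.shape
    cond_mask = np.zeros(F, dtype=bool)
    cond_mask[:num_cond_frames] = True        # True = conditioned (kept clean)
    m = cond_mask[None, None, :, None, None]  # broadcast over B, C, H, W
    noised = np.where(m, video, noise)        # replace only unconditioned frames
    return noised, cond_mask

video = np.zeros((1, 3, 20, 8, 8))  # pretend "clean" video
noise = np.ones_like(video)         # pretend "noise"
noised, mask = noise_unconditioned_frames(video, noise, num_cond_frames=8)
print(noised[0, 0, :, 0, 0])  # first 8 entries are 0 (clean), the rest are 1 (noised)
```

During training the loss would then be computed only on the noised (unconditioned) frames; at inference the same mask keeps the observed prefix fixed while the suffix is denoised.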
---
## πŸ›  Install & Setup
### 1. Clone the Repository
```bash
git clone https://github.com/YN35/minisora.git
cd minisora
```
### 2. Install Dependencies with `uv`
```bash
uv sync
```
All scripts can then be executed through `uv run ...`.
---
## πŸ“¦ This Pipeline (`ramu0e/minisora-dmlab`)
This Hugging Face repository distributes the MiniSora DiT pipeline checkpoint trained on DMLab trajectories.
- **Model type**: DiT-based video diffusion model
- **Training resolution**: e.g., 64Γ—64 or 128Γ—128 (see `reports/` in the repo)
- **Frames per sample**: typically 20
- **Library**: `minisora` (custom lightweight framework)
- **Use case**: research or sample-quality video generation
---
## πŸ§ͺ Training (Summary)
Complete training code is available in the repository.
- Main script: `scripts/train.py`
- Highlights:
- Rectified-flow style training with `FlowMatchEulerDiscreteScheduler`
- ColossalAI Booster to switch between Zero / DDP
  - Conditioning-aware objective (noise applied to only a subset of frames)
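The rectified-flow objective mentioned above can be sketched in a few lines: data and noise are interpolated along a straight path, and the model regresses the constant velocity between them. This is a minimal numpy illustration of the math only; the real training loop (with `FlowMatchEulerDiscreteScheduler` and the actual DiT) lives in `scripts/train.py`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rectified flow: interpolate linearly between data x0 and noise eps,
# and regress the constant velocity (eps - x0) along that straight path.
x0 = rng.standard_normal((4, 3, 20, 8, 8))  # clean video batch (B, C, F, H, W)
eps = rng.standard_normal(x0.shape)         # Gaussian noise
t = rng.uniform(size=(4, 1, 1, 1, 1))       # per-sample timestep in (0, 1)

x_t = (1.0 - t) * x0 + t * eps              # noised sample on the straight path
target = eps - x0                           # velocity target the model should predict

pred = target                               # stand-in: a perfect model's output
loss = np.mean((pred - target) ** 2)        # MSE training loss
print(loss)  # 0.0 for the perfect prediction above
```

In the real loop, `pred` comes from the DiT given `x_t`, `t`, and the conditioning, and the mean-squared error drives the parameter updates.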
### Example: Single-Node Training
```bash
uv run scripts/train.py \
--dataset_type minecraft \
--data_root /path/to/train_data \
--outputs outputs/exp1 \
--batch_size 32 \
--precision bf16
```
### Example: Multi-Node (torchrun + ColossalAI)
```bash
torchrun --nnodes 2 --nproc_per_node 8 scripts/train.py \
--dataset_type minecraft \
--data_root /path/to/train_data \
--outputs outputs/exp-multinode \
--batch_size 64 \
--plugin zero --zero 1
```
Refer to `scripts/train.py` for all available options.
---
## πŸ“š Repository Structure (Excerpt)
- `src/minisora/models/modeling_dit.py` – core DiT transformer for video
- `src/minisora/models/pipeline_dit.py` – Diffusers-style pipeline (`DiTPipeline`)
- `src/minisora/data/` – datasets and distributed samplers (DMLab, Minecraft)
- `scripts/train.py` – ColossalAI-based training loop
- `scripts/demo/full_vgen.py` – simple end-to-end video generation demo
- `scripts/demo/full_continuation.py` – continuation demo
- `reports/` – experiment notes, mask visualizations, metric scripts
---
## πŸ” Limitations & Notes
- This checkpoint targets research-scale experiments.
- Quality at higher resolution or longer durations depends on data and hyperparameters.
- Continuation quality varies with the provided prefix and conditioning setup.
---
## 🀝 Contributions
- Contributions to code, models, and docs are welcome.
- Please open issues or PRs at [YN35/minisora](https://github.com/YN35/minisora).
---
## πŸ“„ License
- Code and weights are released under the **MIT License**.
Commercial use, modification, and redistribution are all permitted (see the GitHub `LICENSE`).
```text
MIT License
Copyright (c) YN
Permission is hereby granted, free of charge, to any person obtaining a copy
...
```