---
library_name: minisora
license: mit
language:
- en
tags:
- text-to-video
- video-diffusion
- continuation
- colossalai
pipeline_tag: text-to-video
---

# MiniSora: Fully Open Video Diffusion with ColossalAI

[GitHub: YN35/minisora](https://github.com/YN35/minisora)
[Author (X / Twitter): @__ramu0e__](https://x.com/__ramu0e__)

---
## Overview

**MiniSora** is a fully open video diffusion codebase designed for everything from research to production.

- All training, inference, and evaluation scripts are available
- Supports multi-GPU / multi-node training via **ColossalAI**
- Simple DiT-based video model and pipeline, inspired by Diffusers
- Includes a continuation demo that generates the "next" part of an existing video

This model card hosts the DiT pipeline trained on DMLab trajectories, published as `ramu0e/minisora-dmlab`.

---
## Inference: Text-to-Video (Minimal Example)

```python
from minisora.models import DiTPipeline

pipeline = DiTPipeline.from_pretrained("ramu0e/minisora-dmlab")

output = pipeline(
    batch_size=1,
    num_inference_steps=28,
    height=64,
    width=64,
    num_frames=20,
)
latents = output.latents  # shape: (B, C, F, H, W)
```

`latents` are video tensors in the same normalized space used during training.
Use the scripts in the repository to decode or visualize them.
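If you just want to eyeball samples, converting the latents to displayable frames is a one-liner per step. A minimal sketch, assuming the tensors are normalized to `[-1, 1]` (an assumption; `latents_to_uint8` is an illustrative helper, not part of MiniSora):

```python
import numpy as np

def latents_to_uint8(latents: np.ndarray) -> np.ndarray:
    """Map latents from [-1, 1] to uint8 frames in (B, F, H, W, C) order."""
    frames = np.clip((latents + 1.0) / 2.0, 0.0, 1.0)   # -> [0, 1]
    frames = (frames * 255.0).round().astype(np.uint8)  # -> [0, 255]
    return frames.transpose(0, 2, 3, 4, 1)              # (B, C, F, H, W) -> (B, F, H, W, C)

# Dummy batch: 1 video, 3 channels, 20 frames of 64x64
dummy = np.random.uniform(-1.0, 1.0, size=(1, 3, 20, 64, 64)).astype(np.float32)
frames = latents_to_uint8(dummy)
print(frames.shape)  # (1, 20, 64, 64, 3)
```

The resulting `(B, F, H, W, C)` array can be written out frame by frame with any image or video library.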
---

## Continuation: Generate the Rest of a Video

MiniSora also supports continuation-style generation (as in Sora), where subsequent frames are sampled while conditioning on the observed prefix.
A bundled demo script extends existing videos:

```bash
uv run scripts/demo/full_continuation.py \
    --model-id ramu0e/minisora-dmlab \
    --input-video path/to/input.mp4 \
    --num-extend-frames 12 \
    --num-inference-steps 28 \
    --seed 1234
```

See `scripts/demo/full_continuation.py` for the exact arguments and I/O specification.
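One common way to implement this kind of prefix conditioning at sampling time is inpainting-style re-clamping: at every denoising step, the prefix frames are overwritten with the clean frames noised to the current level, so only the new frames are actually sampled. A simplified sketch under rectified-flow assumptions (all names are illustrative, not the demo script's actual code):

```python
import numpy as np

def continue_video(x_prefix, num_new_frames, velocity_model, steps=28, rng=None):
    """Inpainting-style continuation: denoise new frames while re-clamping the prefix.

    x_prefix: clean conditioning frames, shape (C, F_prefix, H, W).
    velocity_model: callable (x, t) -> predicted velocity (noise - data).
    """
    rng = rng or np.random.default_rng()
    c, f_pre, h, w = x_prefix.shape
    x = rng.standard_normal((c, f_pre + num_new_frames, h, w))  # start from pure noise
    ts = np.linspace(1.0, 0.0, steps + 1)                       # t goes 1 -> 0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # clamp the prefix to the clean frames, noised to the current level
        noise = rng.standard_normal(x_prefix.shape)
        x[:, :f_pre] = (1.0 - t_cur) * x_prefix + t_cur * noise
        v = velocity_model(x, t_cur)
        x = x + (t_next - t_cur) * v   # Euler step toward the data (t decreases)
    x[:, :f_pre] = x_prefix            # final clamp: prefix is returned unchanged
    return x
```

The actual conditioning scheme MiniSora uses is defined by its masking logic; see `scripts/demo/full_continuation.py` for the real implementation.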
---

## Key Features

- **End-to-End Transparency**
  - Model definition (DiT): `src/minisora/models/modeling_dit.py`
  - Pipeline: `src/minisora/models/pipeline_dit.py`
  - Training script: `scripts/train.py`
  - Data loaders: `src/minisora/data/`

  Every stage from data to inference is available.

- **ColossalAI for Scale-Out Training**
  - Zero / DDP plugins
  - Designed for multi-GPU and multi-node setups
  - Easy experimentation with large video models

- **Simple, Readable Implementation**
  - Dependency management via `uv` (`uv sync` / `uv run`)
  - Minimal Diffusers-inspired video DiT pipeline
  - Experiments and analysis scripts organized under `reports/`

- **Continuation / Conditioning Ready**
  - Masking logic that keeps conditioned frames fixed
  - Training scheme that applies noise to only part of the sequence
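The conditioning idea behind the last bullet can be sketched as frame-masked noising: conditioned frames stay clean while the rest are interpolated toward noise. A simplified illustration, not MiniSora's exact implementation (all names are assumptions):

```python
import numpy as np

def masked_noising(x0, noise, t, cond_mask):
    """Rectified-flow style noising that leaves conditioned frames clean.

    x0, noise: (B, C, F, H, W) arrays; t: scalar in [0, 1];
    cond_mask: (F,) boolean, True where a frame is a fixed conditioning frame.
    """
    xt = (1.0 - t) * x0 + t * noise              # interpolate toward noise
    keep = cond_mask[None, None, :, None, None]  # broadcast over B, C, H, W
    return np.where(keep, x0, xt)                # conditioned frames stay clean

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1, 3, 8, 4, 4))
noise = rng.standard_normal(x0.shape)
mask = np.array([True] * 4 + [False] * 4)  # condition on the first 4 frames
xt = masked_noising(x0, noise, 0.7, mask)
```

During training, the loss is then computed only on the noised (non-conditioned) frames, which is what makes the model usable for continuation at inference time.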
---

## Install & Setup

### 1. Clone the Repository

```bash
git clone https://github.com/YN35/minisora.git
cd minisora
```

### 2. Install Dependencies with `uv`

```bash
uv sync
```

All scripts can then be executed through `uv run ...`.
---

## This Pipeline (`ramu0e/minisora-dmlab`)

This Hugging Face repository distributes the MiniSora DiT pipeline checkpoint trained on DMLab trajectories.

- **Model type**: DiT-based video diffusion model
- **Training resolution**: e.g., 64×64 or 128×128 (see `reports/` in the repo)
- **Frames per sample**: typically 20
- **Library**: `minisora` (custom lightweight framework)
- **Use case**: research and sample-quality video generation

---
## Training (Summary)

Complete training code is available in the repository.

- Main script: `scripts/train.py`
- Highlights:
  - Rectified-flow style training with `FlowMatchEulerDiscreteScheduler`
  - ColossalAI Booster to switch between Zero / DDP
  - Conditioning-aware objective (noises only a subset of frames)

### Example: Single-Node Training

```bash
uv run scripts/train.py \
    --dataset_type minecraft \
    --data_root /path/to/train_data \
    --outputs outputs/exp1 \
    --batch_size 32 \
    --precision bf16
```

### Example: Multi-Node (torchrun + ColossalAI)

```bash
torchrun --nnodes 2 --nproc_per_node 8 scripts/train.py \
    --dataset_type minecraft \
    --data_root /path/to/train_data \
    --outputs outputs/exp-multinode \
    --batch_size 64 \
    --plugin zero --zero 1
```

Refer to `scripts/train.py` for all available options.
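The rectified-flow objective mentioned above amounts to regressing the constant velocity between data and noise along a straight interpolation path. A minimal single-step sketch (illustrative names only, not the actual `scripts/train.py` loop, which delegates scheduling to `FlowMatchEulerDiscreteScheduler`):

```python
import numpy as np

def rectified_flow_loss(model, x0, rng):
    """One rectified-flow training step on a batch x0 of shape (B, C, F, H, W)."""
    noise = rng.standard_normal(x0.shape)
    t = rng.uniform(0.0, 1.0)             # one timestep for the whole batch, for brevity
    xt = (1.0 - t) * x0 + t * noise       # linear interpolation between data and noise
    target = noise - x0                   # the velocity the model should predict
    pred = model(xt, t)
    return np.mean((pred - target) ** 2)  # MSE on the velocity

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2, 3, 4, 8, 8))
zero_model = lambda xt, t: np.zeros_like(xt)  # stand-in for the DiT
loss = rectified_flow_loss(zero_model, x0, rng)
```

In the conditioning-aware variant, the same loss would simply be masked to exclude the clean conditioned frames.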
---

## Repository Structure (Excerpt)

- `src/minisora/models/modeling_dit.py` – core DiT transformer for video
- `src/minisora/models/pipeline_dit.py` – Diffusers-style pipeline (`DiTPipeline`)
- `src/minisora/data/` – datasets and distributed samplers (DMLab, Minecraft)
- `scripts/train.py` – ColossalAI-based training loop
- `scripts/demo/full_vgen.py` – simple end-to-end video generation demo
- `scripts/demo/full_continuation.py` – continuation demo
- `reports/` – experiment notes, mask visualizations, metric scripts

---
## Limitations & Notes

- This checkpoint targets research-scale experiments.
- Quality at higher resolutions or longer durations depends on data and hyperparameters.
- Continuation quality varies with the provided prefix and conditioning setup.

---
## Contributions

- Contributions to code, models, and docs are welcome.
- Please open issues or PRs at [YN35/minisora](https://github.com/YN35/minisora).

---
## License

Code and weights are released under the **MIT License**.
Commercial use, modification, and redistribution are all permitted (see the GitHub `LICENSE`).

```text
MIT License
Copyright (c) YN
Permission is hereby granted, free of charge, to any person obtaining a copy
...
```