---
license: apache-2.0
tags:
- video-generation
- diffusion
- transformer
- megatron-lm
- megatron-checkpoints
language:
- en
---
# MUG-V 10B Training Checkpoints
Pre-trained Megatron-format checkpoints for the [MUG-V 10B](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training) video generation model.
## Available Checkpoints
### MUG-V-10B-torch_dist (Recommended)
**Torch Distributed Checkpoint** - Flexible parallelism support
- **Format**: Torch Distributed (`.distcp`)
- **Parallelism**: Can be loaded with **any TP/PP configuration**
- **Use Case**: Production training, flexible distributed setup
```bash
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-torch_dist/*"
```
### MUG-V-10B-TP4-legacy
**Torch Format (Legacy)** - Fixed TP=4
- **Format**: Torch format (`mp_rank_XX/model_optim_rng.pt`)
- **Parallelism**: Must be loaded with **TP=4**
- **Use Case**: Fixed TP setup or conversion to Torch Distributed
```bash
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-TP4-legacy/*"
```
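Each tensor-parallel rank gets its own `mp_rank_XX` directory in the legacy layout. The expected file paths for TP=4 can be sketched as follows (a hypothetical helper for illustration, based only on the `mp_rank_XX/model_optim_rng.pt` structure stated above):

```python
def legacy_rank_paths(tp=4):
    """Expected per-rank file layout of the legacy checkpoint, per the
    mp_rank_XX/model_optim_rng.pt structure described above. Hypothetical
    helper, not part of the training repo."""
    return [f"mp_rank_{rank:02d}/model_optim_rng.pt" for rank in range(tp)]

print(legacy_rank_paths()[0])
# -> mp_rank_00/model_optim_rng.pt
```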
## Quick Start
### Option 1: Direct Training
Use the Torch Distributed checkpoint directly for training:
```bash
# Download checkpoint
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-torch_dist/*"
# Download sample data
huggingface-cli download MUG-V/MUG-V-Training-Samples --repo-type dataset --local-dir ./sample_dataset
# Set environment variables
export CHECKPOINT_DIR="./checkpoints/MUG-V-10B-torch_dist/torch_dist"
export MODEL_TYPE="mugdit_10b"
export DATA_TRAIN="./sample_dataset/train.csv"
# Start training (8 GPUs)
bash examples/mugv/pretrain_slurm.sh
```
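The launch script above relies on the three environment variables being set. A minimal preflight check along these lines can catch a missing one before the job is submitted (a hypothetical helper, not part of the training scripts):

```python
import os

# The three variables the Quick Start exports before launching training.
REQUIRED = ["CHECKPOINT_DIR", "MODEL_TYPE", "DATA_TRAIN"]

def missing_vars(env=os.environ):
    """Return the names of required variables that are unset or empty.
    Hypothetical preflight helper for illustration only."""
    return [name for name in REQUIRED if not env.get(name)]

print(missing_vars({"CHECKPOINT_DIR": "./checkpoints", "MODEL_TYPE": "mugdit_10b"}))
# -> ['DATA_TRAIN']
```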
### Option 2: Convert to HuggingFace Format
Convert the Megatron checkpoint to HuggingFace format for inference:
```bash
python -m examples.mugv.convertor.mugdit_mcore2hf \
    --dcp-dir ./checkpoints/MUG-V-10B-torch_dist/torch_dist/iter_0000000 \
    --output ./mugdit_10b_hf.pt \
    --model-size 10B
```
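Megatron stores each saved training step in an `iter_XXXXXXX` subdirectory, which is why the `--dcp-dir` above points at `iter_0000000`. When a checkpoint has been saved at multiple steps, picking the latest one can be sketched like this (a hypothetical helper, not part of the repo):

```python
import re

def latest_iter_dir(names):
    """Return the highest-numbered Megatron-style 'iter_XXXXXXX' directory
    name from a listing, or None if there is none. Hypothetical helper
    for illustration only."""
    iters = [n for n in names if re.fullmatch(r"iter_\d{7}", n)]
    return max(iters, key=lambda n: int(n.split("_")[1]), default=None)

print(latest_iter_dir(["iter_0000000", "iter_0005000", "latest_checkpointed_iteration.txt"]))
# -> iter_0005000
```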
## Checkpoint Format Comparison
| Format | Parallelism | File Structure | Training | Conversion |
|--------|-------------|----------------|----------|------------|
| **Torch Distributed** | Flexible TP/PP | `*.distcp` files | ✅ Recommended | ✅ To HF |
| **Torch (Legacy)** | Fixed TP=4 | `mp_rank_XX/` dirs | ⚠️ TP=4 only | ✅ To Torch Dist / HF |
| **HuggingFace** | None (inference) | Single `.pt` file | ❌ Not for training | - |
## Model Architecture
- **Parameters**: ~10 billion
- **Architecture**: Diffusion Transformer (DiT)
- **Hidden Size**: 3456
- **Attention Heads**: 48
- **Layers**: 56
- **Compression**: VideoVAE 8×8×8
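The numbers above can be sanity-checked with a back-of-envelope calculation. This sketch assumes a generic transformer block (four h×h attention projections plus a 4×-expanded MLP), which is an assumption about the layer shape, not the exact MUG-V parameterization; it ignores embeddings, modulation layers, and biases, which account for the remainder of the ~10B total:

```python
# Rough trunk parameter count from the architecture numbers above.
hidden, layers = 3456, 56

attn_params = 4 * hidden * hidden        # Q, K, V, and output projections
mlp_params = 2 * hidden * (4 * hidden)   # up- and down-projection, assumed 4x expansion
trunk_params = layers * (attn_params + mlp_params)
print(f"trunk ~= {trunk_params / 1e9:.1f}B parameters")  # roughly 8B before extras

# The 8x8x8 VideoVAE compresses each spatio-temporal axis by 8, so a
# 64-frame 512x512 clip maps to an 8x64x64 latent grid.
frames, height, width = 64, 512, 512
latent = (frames // 8, height // 8, width // 8)
print(f"latent grid: {latent}")
```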
## Related Resources
- **Training Code**: [MUG-V-Megatron-LM-Training](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training)
- **Inference Code**: [MUG-V](https://github.com/Shopee-MUG/MUG-V)
- **Inference Weights**: [MUG-V-inference](https://huggingface.co/MUG-V/MUG-V-inference)
- **Sample Dataset**: [MUG-V-Training-Samples](https://huggingface.co/datasets/MUG-V/MUG-V-Training-Samples)
## Documentation
- **Training Guide**: [examples/mugv/README.md](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training/blob/main/examples/mugv/README.md)
- **Checkpoint Conversion**: [Conversion Guide](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training/blob/main/examples/mugv/README.md#checkpoint-conversion)
## Citation
```bibtex
@article{zhang2025mugv10b,
title={MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models},
author={Zhang, Yongshun and Fan, Zhongyi and Zhang, Yonghang and Li, Zhangzikang and Chen, Weifeng and Feng, Zhongwei and Wang, Chaoyue and Hou, Peng and Zeng, Anxiang},
journal={arXiv preprint},
year={2025}
}
```
## License
Apache License 2.0
---
**Developed by Shopee Multimodal Understanding and Generation (MUG) Team**