---
license: apache-2.0
tags:
- video-generation
- diffusion
- transformer
- megatron-lm
- megatron-checkpoints
language:
- en
---
# MUG-V 10B Training Checkpoints
Pre-trained Megatron-format checkpoints for the [MUG-V 10B](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training) video generation model.
## Available Checkpoints
### MUG-V-10B-torch_dist (Recommended)
**Torch Distributed Checkpoint** - Flexible parallelism support
- **Format**: Torch Distributed (`.distcp`)
- **Parallelism**: Can be loaded with **any TP/PP configuration**
- **Use Case**: Production training, flexible distributed setup
```bash
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-torch_dist/*"
```
### MUG-V-10B-TP4-legacy
**Torch Format (Legacy)** - Fixed TP=4
- **Format**: Torch format (`mp_rank_XX/model_optim_rng.pt`)
- **Parallelism**: Must be loaded with **TP=4**
- **Use Case**: Fixed TP setup or conversion to Torch Distributed
```bash
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-TP4-legacy/*"
```
## Quick Start
### Option 1: Direct Training
Use the Torch Distributed checkpoint directly for training:
```bash
# Download checkpoint
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-torch_dist/*"
# Download sample data
huggingface-cli download MUG-V/MUG-V-Training-Samples --repo-type dataset --local-dir ./sample_dataset
# Set environment variables
export CHECKPOINT_DIR="./checkpoints/MUG-V-10B-torch_dist/torch_dist"
export MODEL_TYPE="mugdit_10b"
export DATA_TRAIN="./sample_dataset/train.csv"
# Start training (8 GPUs)
bash examples/mugv/pretrain_slurm.sh
```
### Option 2: Convert to HuggingFace Format
Convert the Megatron checkpoint to HuggingFace format for inference:
```bash
python -m examples.mugv.convertor.mugdit_mcore2hf \
--dcp-dir ./checkpoints/MUG-V-10B-torch_dist/torch_dist/iter_0000000 \
--output ./mugdit_10b_hf.pt \
--model-size 10B
```
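The `--dcp-dir` argument points at an iteration directory inside the checkpoint. Megatron names these `iter_` followed by the zero-padded iteration number (seven digits, as in the `iter_0000000` shipped here), so other training snapshots can be located the same way. A small illustration of the naming convention, not part of the conversion tool:

```python
# Megatron checkpoint iteration directories are named iter_<7-digit step>,
# e.g. iter_0000000 for the initial checkpoint in this repository.
def iter_dir(step: int) -> str:
    """Return the directory name Megatron uses for a given training step."""
    return f"iter_{step:07d}"

print(iter_dir(0))     # iter_0000000
print(iter_dir(5000))  # iter_0005000
```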
## Checkpoint Formats Comparison
| Format | Parallelism | File Structure | Training | Conversion |
|--------|-------------|----------------|----------|------------|
| **Torch Distributed** | Flexible TP/PP | `*.distcp` files | ✅ Recommended | ✅ To HF |
| **Torch (Legacy)** | Fixed TP=4 | `mp_rank_XX/` dirs | ⚠️ TP=4 only | ✅ To Torch Dist / HF |
| **HuggingFace** | None (inference) | Single `.pt` file | ❌ Not for training | - |
## Model Architecture
- **Parameters**: ~10 billion
- **Architecture**: Diffusion Transformer (DiT)
- **Hidden Size**: 3456
- **Attention Heads**: 48
- **Layers**: 56
- **Compression**: VideoVAE 8×8×8
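The numbers above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes (these assumptions are not stated in this card) that each DiT block contains self-attention, cross-attention for text conditioning, and a 4×-expansion MLP, ignoring embeddings and norms; it also shows how the VideoVAE's 8×8×8 compression maps a video to latent dimensions:

```python
# Rough parameter estimate for the listed architecture.
# Assumed block layout: self-attn (4h^2) + cross-attn (4h^2) + MLP (8h^2).
HIDDEN = 3456
HEADS = 48
LAYERS = 56

head_dim = HIDDEN // HEADS            # 72 dims per attention head
params_per_block = 16 * HIDDEN ** 2   # 16h^2 under the assumptions above
total = LAYERS * params_per_block     # ~1.07e10, consistent with "~10 billion"

# VideoVAE 8x8x8: each spatiotemporal dimension shrinks by a factor of 8.
def latent_shape(frames: int, height: int, width: int, factor: int = 8):
    """Latent (frames, height, width) for a video under uniform compression."""
    return (frames // factor, height // factor, width // factor)

print(f"{total / 1e9:.1f}B params")  # ~10.7B
print(latent_shape(64, 512, 512))    # (8, 64, 64)
```

For example, a 64-frame 512×512 clip compresses to an 8×64×64 latent volume before entering the transformer.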
## Related Resources
- **Training Code**: [MUG-V-Megatron-LM-Training](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training)
- **Inference Code**: [MUG-V](https://github.com/Shopee-MUG/MUG-V)
- **Inference Weights**: [MUG-V-inference](https://huggingface.co/MUG-V/MUG-V-inference)
- **Sample Dataset**: [MUG-V-Training-Samples](https://huggingface.co/datasets/MUG-V/MUG-V-Training-Samples)
## Documentation
- **Training Guide**: [examples/mugv/README.md](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training/blob/main/examples/mugv/README.md)
- **Checkpoint Conversion**: [Conversion Guide](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training/blob/main/examples/mugv/README.md#checkpoint-conversion)
## Citation
```bibtex
@article{zhang2025mugv10b,
title={MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models},
author={Zhang, Yongshun and Fan, Zhongyi and Zhang, Yonghang and Li, Zhangzikang and Chen, Weifeng and Feng, Zhongwei and Wang, Chaoyue and Hou, Peng and Zeng, Anxiang},
journal={arXiv preprint},
year={2025}
}
```
## License
Apache License 2.0
---
**Developed by the Shopee Multimodal Understanding and Generation (MUG) Team**