---
license: apache-2.0
tags:
- video-generation
- diffusion
- transformer
- megatron-lm
- megatron-checkpoints
language:
- en
---

# MUG-V 10B Training Checkpoints

Pre-trained Megatron-format checkpoints for the [MUG-V 10B](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training) video generation model.

## Available Checkpoints

### MUG-V-10B-torch_dist (Recommended)

**Torch Distributed Checkpoint** - Flexible parallelism support

- **Format**: Torch Distributed (`.distcp`)
- **Parallelism**: Can be loaded with **any TP/PP configuration**
- **Use Case**: Production training, flexible distributed setup

```bash
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-torch_dist/*"
```
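
If you prefer to script the download, the same filtered fetch works through `huggingface_hub` (a minimal sketch using the standard `snapshot_download` API; the repo ID and patterns match the CLI command above):

```python
# Programmatic equivalent of the CLI download above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="MUG-V/MUG-V-training",
    local_dir="./checkpoints",
    allow_patterns=["MUG-V-10B-torch_dist/*"],  # fetch only this checkpoint
)
```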

### MUG-V-10B-TP4-legacy

**Torch Format (Legacy)** - Fixed TP=4

- **Format**: Torch format (`mp_rank_XX/model_optim_rng.pt`)
- **Parallelism**: Must be loaded with **TP=4**
- **Use Case**: Fixed TP setup or conversion to Torch Distributed

```bash
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-TP4-legacy/*"
```
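
With TP=4 the checkpoint is split into four tensor-parallel shards, one per rank. The expected layout, sketched from the file pattern above (an iteration subdirectory such as `iter_xxxxxxx` may also sit in between):

```
MUG-V-10B-TP4-legacy/
├── mp_rank_00/model_optim_rng.pt
├── mp_rank_01/model_optim_rng.pt
├── mp_rank_02/model_optim_rng.pt
└── mp_rank_03/model_optim_rng.pt
```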

## Quick Start

### Option 1: Direct Training

Use the Torch Distributed checkpoint directly for training:

```bash
# Download checkpoint
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-torch_dist/*"

# Download sample data
huggingface-cli download MUG-V/MUG-V-Training-Samples --repo-type dataset --local-dir ./sample_dataset

# Set environment variables
export CHECKPOINT_DIR="./checkpoints/MUG-V-10B-torch_dist/torch_dist"
export MODEL_TYPE="mugdit_10b"
export DATA_TRAIN="./sample_dataset/train.csv"

# Start training (8 GPUs)
bash examples/mugv/pretrain_slurm.sh
```
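
Before launching, a quick pre-flight check can confirm the shards and sample data landed where the environment variables expect them (a sketch, assuming the download layout above):

```python
# Minimal pre-flight check for the paths exported above.
from pathlib import Path

ckpt = Path("./checkpoints/MUG-V-10B-torch_dist/torch_dist")
data = Path("./sample_dataset/train.csv")

shards = list(ckpt.rglob("*.distcp"))
print(f"{len(shards)} .distcp shard files under {ckpt}")
assert shards, "no .distcp shards found -- re-check the checkpoint download"
assert data.is_file(), "train.csv missing -- re-check the dataset download"
```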

### Option 2: Convert to HuggingFace Format

Convert a Megatron checkpoint to HuggingFace format for inference:

```bash
python -m examples.mugv.convertor.mugdit_mcore2hf \
    --dcp-dir ./checkpoints/MUG-V-10B-torch_dist/torch_dist/iter_0000000 \
    --output ./mugdit_10b_hf.pt \
    --model-size 10B
```
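
The conversion writes a single `.pt` file that loads with plain `torch.load`. A quick way to inspect the result (a sketch; the key names printed are whatever the converter emits, not guaranteed here):

```python
# Load the converted single-file checkpoint on CPU and peek at its contents.
import torch

state = torch.load("./mugdit_10b_hf.pt", map_location="cpu")
print(type(state))
if isinstance(state, dict):
    for k, v in list(state.items())[:5]:
        shape = tuple(v.shape) if hasattr(v, "shape") else type(v).__name__
        print(k, shape)
```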

## Checkpoint Format Comparison

| Format | Parallelism | File Structure | Training | Conversion |
|--------|-------------|----------------|----------|------------|
| **Torch Distributed** | Flexible TP/PP | `*.distcp` files | ✅ Recommended | ✅ To HF |
| **Torch (Legacy)** | Fixed TP=4 | `mp_rank_XX/` dirs | ⚠️ TP=4 only | ✅ To Torch Dist / HF |
| **HuggingFace** | None (inference) | Single `.pt` file | ❌ Not for training | - |
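
Since each format has a distinct on-disk layout, the three are easy to tell apart programmatically. A small helper, sketched purely from the file patterns in the table (not part of the training repo):

```python
# Distinguish checkpoint formats by on-disk layout, per the table above.
from pathlib import Path

def checkpoint_format(path: str) -> str:
    p = Path(path)
    if p.is_file() and p.suffix == ".pt":
        return "huggingface"         # converted single-file checkpoint
    if p.is_dir():
        if any(p.rglob("*.distcp")):
            return "torch_dist"      # Torch Distributed shards
        if any(p.glob("mp_rank_*")):
            return "torch_legacy"    # fixed-TP legacy shards
    return "unknown"

print(checkpoint_format("./checkpoints/MUG-V-10B-torch_dist/torch_dist"))
```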

## Model Architecture

- **Parameters**: ~10 billion (see the back-of-the-envelope check below)
- **Architecture**: Diffusion Transformer (DiT)
- **Hidden Size**: 3456
- **Attention Heads**: 48
- **Layers**: 56
- **Compression**: VideoVAE 8×8×8
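
The ~10B headline is consistent with these dimensions. A back-of-the-envelope check, assuming a generic DiT block with self-attention, cross-attention, and a 4× MLP (the exact MUG-V block layout, embeddings, and adaLN/modulation parameters are not specified here and are ignored):

```python
# Rough parameter count from the architecture numbers above.
h, layers = 3456, 56
attn  = 4 * h * h   # Q, K, V, and output projections
cross = 4 * h * h   # assumed cross-attention of the same shape
mlp   = 8 * h * h   # assumed 4x expansion: up + down projections
total = layers * (attn + cross + mlp)
print(f"~{total / 1e9:.1f}B parameters")  # ~10.7B, near the ~10B headline
```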

## Related Resources

- **Training Code**: [MUG-V-Megatron-LM-Training](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training)
- **Inference Code**: [MUG-V](https://github.com/Shopee-MUG/MUG-V)
- **Inference Weights**: [MUG-V-inference](https://huggingface.co/MUG-V/MUG-V-inference)
- **Sample Dataset**: [MUG-V-Training-Samples](https://huggingface.co/datasets/MUG-V/MUG-V-Training-Samples)

## Documentation

- **Training Guide**: [examples/mugv/README.md](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training/blob/main/examples/mugv/README.md)
- **Checkpoint Conversion**: [Conversion Guide](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training/blob/main/examples/mugv/README.md#checkpoint-conversion)

## Citation

```bibtex
@article{zhang2025mugv10b,
  title={MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models},
  author={Zhang, Yongshun and Fan, Zhongyi and Zhang, Yonghang and Li, Zhangzikang and Chen, Weifeng and Feng, Zhongwei and Wang, Chaoyue and Hou, Peng and Zeng, Anxiang},
  journal={arXiv preprint},
  year={2025}
}
```

## License

Apache License 2.0

---

**Developed by the Shopee Multimodal Understanding and Generation (MUG) Team**