---
license: apache-2.0
tags:
- video-generation
- diffusion
- transformer
- megatron-lm
- megatron-checkpoints
language:
- en
---

# MUG-V 10B Training Checkpoints

Pre-trained Megatron-format checkpoints for the [MUG-V 10B](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training) video generation model.

## Available Checkpoints

### MUG-V-10B-torch_dist (Recommended)

**Torch Distributed Checkpoint** - Flexible parallelism support

- **Format**: Torch Distributed (`.distcp`)
- **Parallelism**: Can be loaded with **any TP/PP configuration**
- **Use Case**: Production training, flexible distributed setups

```bash
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-torch_dist/*"
```

### MUG-V-10B-TP4-legacy

**Torch Format (Legacy)** - Fixed TP=4

- **Format**: Torch format (`mp_rank_XX/model_optim_rng.pt`)
- **Parallelism**: Must be loaded with **TP=4**
- **Use Case**: Fixed TP=4 setups, or conversion to Torch Distributed

```bash
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-TP4-legacy/*"
```

## Quick Start

### Option 1: Direct Training

Use the Torch Distributed checkpoint directly for training:

```bash
# Download checkpoint
huggingface-cli download MUG-V/MUG-V-training --local-dir ./checkpoints --include "MUG-V-10B-torch_dist/*"

# Download sample data
huggingface-cli download MUG-V/MUG-V-Training-Samples --repo-type dataset --local-dir ./sample_dataset

# Set environment variables
export CHECKPOINT_DIR="./checkpoints/MUG-V-10B-torch_dist/torch_dist"
export MODEL_TYPE="mugdit_10b"
export DATA_TRAIN="./sample_dataset/train.csv"

# Start training (8 GPUs)
bash examples/mugv/pretrain_slurm.sh
```

### Option 2: Convert to HuggingFace Format

Convert a Megatron checkpoint to HuggingFace format for inference:

```bash
python -m examples.mugv.convertor.mugdit_mcore2hf \
  --dcp-dir ./checkpoints/MUG-V-10B-torch_dist/torch_dist/iter_0000000 \
  --output ./mugdit_10b_hf.pt \
  --model-size 10B
```
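After conversion, you can sanity-check the output with plain PyTorch. The sketch below is illustrative, not part of the official workflow: it only assumes the converter writes a single torch-serializable file (the comparison table below describes the HuggingFace format as a single `.pt` file); the actual key layout of the checkpoint is not documented here.

```python
# Minimal sketch: inspect the converted HuggingFace-format checkpoint.
# Assumption: ./mugdit_10b_hf.pt holds one torch-serializable object,
# e.g. a flat state dict. Key names/layout are not specified by this card.
import torch

ckpt = torch.load("./mugdit_10b_hf.pt", map_location="cpu")

# If the file is a plain state dict, print a few parameter names and shapes.
if isinstance(ckpt, dict):
    for name, value in list(ckpt.items())[:10]:
        desc = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
        print(f"{name}: {desc}")
```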
## Checkpoint Formats Comparison

| Format | Parallelism | File Structure | Training | Conversion |
|--------|-------------|----------------|----------|------------|
| **Torch Distributed** | Flexible TP/PP | `*.distcp` files | ✅ Recommended | ✅ To HF |
| **Torch (Legacy)** | Fixed TP=4 | `mp_rank_XX/` dirs | ⚠️ TP=4 only | ✅ To Torch Dist / HF |
| **HuggingFace** | None (inference) | Single `.pt` file | ❌ Not for training | - |

## Model Architecture

- **Parameters**: ~10 billion
- **Architecture**: Diffusion Transformer (DiT)
- **Hidden Size**: 3456
- **Attention Heads**: 48
- **Layers**: 56
- **Compression**: VideoVAE 8×8×8

## Related Resources

- **Training Code**: [MUG-V-Megatron-LM-Training](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training)
- **Inference Code**: [MUG-V](https://github.com/Shopee-MUG/MUG-V)
- **Inference Weights**: [MUG-V-inference](https://huggingface.co/MUG-V/MUG-V-inference)
- **Sample Dataset**: [MUG-V-Training-Samples](https://huggingface.co/datasets/MUG-V/MUG-V-Training-Samples)

## Documentation

- **Training Guide**: [examples/mugv/README.md](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training/blob/main/examples/mugv/README.md)
- **Checkpoint Conversion**: [Conversion Guide](https://github.com/Shopee-MUG/MUG-V-Megatron-LM-Training/blob/main/examples/mugv/README.md#checkpoint-conversion)

## Citation

```bibtex
@article{zhang2025mugv10b,
  title={MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models},
  author={Zhang, Yongshun and Fan, Zhongyi and Zhang, Yonghang and Li, Zhangzikang and Chen, Weifeng and Feng, Zhongwei and Wang, Chaoyue and Hou, Peng and Zeng, Anxiang},
  journal={arXiv preprint},
  year={2025}
}
```

## License

Apache License 2.0

---

**Developed by Shopee Multimodal Understanding and Generation (MUG) Team**