---
pipeline_tag: text-generation
library_name: transformers
---
# Scaling up Masked Diffusion Models on Text
This repository contains pretrained models for the paper *Scaling up Masked Diffusion Models on Text*. These models demonstrate the scalability and effectiveness of Masked Diffusion Models (MDMs) for language modeling tasks such as text generation and language understanding.
Code: https://github.com/ML-GSAI/SMDM
## Pretrained models
We provide all pretrained models in both `.pth` and `.safetensors` formats; a short loading sketch is given at the end of this section.
- **Scaling law experiments**: all pretrained models are in the `ar_safetensors` and `mdm_safetensors` folders. For instance, the checkpoint `mdm-1028M-1600e18.safetensors` is an MDM with 1,028 million non-embedding parameters trained with 1,600e18 (i.e., 1.6 × 10²¹) FLOPs. Similarly, `mdm-170M-100e18-rsl-0.01.safetensors` is an MDM with 170 million non-embedding parameters trained with 100e18 FLOPs, where 1% of the pretraining data was given random sequence lengths.
- **Math reasoning**: see the `gsm8k_safetensors` folder.
- **Conditional generation**: see the `sharegpt_safetensors` folder.
- **Reverse curse**: see the `reverse_safetensors` folder.
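The sketch below shows one way to load these checkpoints. The `safetensors` and `torch` calls are standard; the model constructor itself comes from the SMDM codebase linked above, so `build_model` here is a hypothetical placeholder, not the repo's actual API.

```python
import torch
from safetensors.torch import load_file

# Load a scaling-law checkpoint in .safetensors format.
state_dict = load_file("mdm_safetensors/mdm-1028M-1600e18.safetensors")

# The .pth checkpoints contain the same weights and load with plain torch:
# state_dict = torch.load("path/to/mdm-1028M-1600e18.pth", map_location="cpu")

# Instantiate the matching architecture from the SMDM codebase, then load the
# weights. `build_model` is a hypothetical stand-in for the repo's constructor.
# model = build_model(non_embedding_params="1028M")
# model.load_state_dict(state_dict)
```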