---
pipeline_tag: text-generation
library_name: transformers
---
# Scaling up Masked Diffusion Models on Text
This repository contains pretrained models for the paper *Scaling up Masked Diffusion Models on Text*. These models demonstrate the scalability and effectiveness of Masked Diffusion Models (MDMs) for language modeling tasks such as text generation and language understanding.
Code: https://github.com/ML-GSAI/SMDM
## Pretrained models
We provide all pretrained models in both `.pth` and `.safetensors` formats; a short loading sketch is given at the end of this section.
- **Scaling law experiments**: all pretrained models are in the `ar_safetensors` and `mdm_safetensors` folders. For instance, the checkpoint `mdm-1028M-1600e18.safetensors` is an MDM with 1,028 million non-embedding parameters trained with 1,600e18 (i.e., 1.6 × 10²¹) FLOPs. Similarly, `mdm-170M-100e18-rsl-0.01.safetensors` is an MDM with 170 million non-embedding parameters trained with 100e18 FLOPs, where 1% of the pretraining data was given random sequence lengths.
- **Math reasoning**: see the `gsm8k_safetensors` folder.
- **Conditional generation**: see the `sharegpt_safetensors` folder.
- **Reverse curse**: see the `reverse_safetensors` folder.
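The sketch below shows one way to load these checkpoints. The `safetensors` and `torch` calls are standard; the model constructor itself comes from the SMDM codebase linked above, so `build_model` here is a hypothetical placeholder, not the repo's actual API.

```python
import torch
from safetensors.torch import load_file

# Load a scaling-law checkpoint in .safetensors format.
state_dict = load_file("mdm_safetensors/mdm-1028M-1600e18.safetensors")

# The .pth checkpoints contain the same weights and load with plain torch:
# state_dict = torch.load("path/to/mdm-1028M-1600e18.pth", map_location="cpu")

# Instantiate the matching architecture from the SMDM codebase, then load the
# weights. `build_model` is a hypothetical stand-in for the repo's constructor.
# model = build_model(non_embedding_params="1028M")
# model.load_state_dict(state_dict)
```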