Add model card and metadata
This PR adds a model card with the appropriate metadata, including the pipeline tag and library name. It also adds a link to the GitHub repository for easier access to the model and its source code.
README.md (changed):
---
pipeline_tag: text-generation
library_name: transformers
---

# Scaling up Masked Diffusion Models on Text

This repository contains pretrained models for the paper [Scaling up Masked Diffusion Models on Text](https://hf.co/papers/2410.18514). These models demonstrate the scalability and effectiveness of Masked Diffusion Models (MDMs) for language modeling tasks such as text generation and language understanding.

Code: https://github.com/ML-GSAI/SMDM

## Pretrained models

We provide several pretrained models in `.pth` and `.safetensors` formats.

**Scaling law experiments**: We provide all pretrained models in the *ar_safetensors* and *mdm_safetensors* folders. For instance, the checkpoint `mdm-1028M-1600e18.safetensors` represents an MDM model with 1,028 million non-embedding parameters and 1,600e18 training FLOPs. Similarly, the checkpoint `mdm-170M-100e18-rsl-0.01.safetensors` indicates an MDM model with 170 million non-embedding parameters, 100e18 training FLOPs, and 1% of the dataset subjected to random sequence lengths during pretraining.
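The naming scheme above can be decoded programmatically. The helper below is a hypothetical sketch (not part of the SMDM codebase) that parses the `{family}-{params}M-{flops}e18[-rsl-{fraction}]` pattern described in this section:

```python
import re

def parse_mdm_checkpoint(name: str) -> dict:
    """Parse a checkpoint filename following the naming scheme above.

    Hypothetical helper, e.g.:
        mdm-1028M-1600e18.safetensors
        mdm-170M-100e18-rsl-0.01.safetensors
    """
    m = re.match(
        r"(?P<family>ar|mdm)-(?P<params_M>\d+)M-(?P<flops>\d+)e18"
        r"(?:-rsl-(?P<rsl>[\d.]+))?\.(?:safetensors|pth)$",
        name,
    )
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return {
        "family": m["family"],
        "non_embedding_params": int(m["params_M"]) * 10**6,   # in parameters
        "train_flops": int(m["flops"]) * 10**18,              # training compute
        "random_seq_len_fraction": float(m["rsl"]) if m["rsl"] else 0.0,
    }

info = parse_mdm_checkpoint("mdm-170M-100e18-rsl-0.01.safetensors")
```

For the example above, `info` reports 170 million non-embedding parameters, 100e18 training FLOPs, and a 1% random-sequence-length fraction.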

**Math reasoning**: please see the *gsm8k_safetensors* folder.

**Reverse curse**: please see the *reverse_safetensors* folder.

For all models, we provide checkpoints in both `.pth` and `.safetensors` formats.