Add model card and metadata
This PR adds a model card with the appropriate metadata, including the pipeline tag and library name. It also adds a link to the GitHub repository for easier access to the model and its source code.
README.md (changed):
---
pipeline_tag: text-generation
library_name: transformers
---

# Scaling up Masked Diffusion Models on Text

This repository contains pretrained models for the paper [Scaling up Masked Diffusion Models on Text](https://hf.co/papers/2410.18514). These models demonstrate the scalability and effectiveness of Masked Diffusion Models (MDMs) for language modeling tasks such as text generation and language understanding.

Code: https://github.com/ML-GSAI/SMDM

## Pretrained models

We provide several pretrained models in `.pth` and `.safetensors` formats.

**Scaling law experiments**: We provide all pretrained models in the *ar_safetensors* and *mdm_safetensors* folders. For instance, the checkpoint `mdm-1028M-1600e18.safetensors` represents an MDM model with 1,028 million non-embedding parameters and 1,600e18 training FLOPs. Similarly, the checkpoint `mdm-170M-100e18-rsl-0.01.safetensors` indicates an MDM model with 170 million non-embedding parameters, 100e18 training FLOPs, and 1% of the dataset subjected to random sequence lengths during pretraining.
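The naming scheme above can be decoded programmatically. The helper below is a hypothetical sketch (not part of the SMDM codebase) that parses the `{family}-{params}M-{flops}e18[-rsl-{fraction}]` pattern described in this section:

```python
import re

def parse_mdm_checkpoint(name: str) -> dict:
    """Parse a checkpoint filename following the naming scheme above.

    Hypothetical helper, e.g.:
        mdm-1028M-1600e18.safetensors
        mdm-170M-100e18-rsl-0.01.safetensors
    """
    m = re.match(
        r"(?P<family>ar|mdm)-(?P<params_M>\d+)M-(?P<flops>\d+)e18"
        r"(?:-rsl-(?P<rsl>[\d.]+))?\.(?:safetensors|pth)$",
        name,
    )
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return {
        "family": m["family"],
        "non_embedding_params": int(m["params_M"]) * 10**6,   # in parameters
        "train_flops": int(m["flops"]) * 10**18,              # training compute
        "random_seq_len_fraction": float(m["rsl"]) if m["rsl"] else 0.0,
    }

info = parse_mdm_checkpoint("mdm-170M-100e18-rsl-0.01.safetensors")
```

For the example above, `info` reports 170 million non-embedding parameters, 100e18 training FLOPs, and a 1% random-sequence-length fraction.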

**Math reasoning**: please see the *gsm8k_safetensors* folder.

**Reverse curse**: please see the *reverse_safetensors* folder.

For all models, we provide checkpoints in both `.pth` and `.safetensors` formats.