nielsr HF Staff committed on
Commit b2ad499 · verified · Parent: bcda56b

Add model card and metadata


This PR adds a model card with the appropriate metadata, including the pipeline tag and library name. It also adds a link to the GitHub repository for easier access to the model and its source code.

Files changed (1): README.md (+19 −8)
README.md CHANGED

@@ -1,9 +1,22 @@
-## Pretrained models for the paper *Scaling up Masked Diffusion Models on Text*
-
-**Scaling law experiments**: We provided all pre-trained models in the *ar_safetensors* and *mdm_safetensors* folders.
-For instance, the checkpoint `mdm-1028M-1600e18.safetensors` represents an MDM model with 1,028 million non-embedding
-parameters and 1,600e18 training FLOPs. Similarly, the checkpoint `mdm-170M-100e18-rsl-0.01.safetensors` indicates
-an MDM model with 170 million non-embedding parameters, 100e18 training FLOPs, and 1% of the dataset subjected
+---
+pipeline_tag: text-generation
+library_name: transformers
+---
+
+# Scaling up Masked Diffusion Models on Text
+
+This repository contains pretrained models for the paper [Scaling up Masked Diffusion Models on Text](https://hf.co/papers/2410.18514). These models demonstrate the scalability and effectiveness of Masked Diffusion Models (MDMs) for language modeling tasks such as text generation and language understanding.
+
+Code: https://github.com/ML-GSAI/SMDM
+
+## Pretrained models
+
+We provide several pretrained models in `.pth` and `.safetensors` formats.
+
+**Scaling law experiments**: We provided all pre-trained models in the *ar_safetensors* and *mdm_safetensors* folders.
+For instance, the checkpoint `mdm-1028M-1600e18.safetensors` represents an MDM model with 1,028 million non-embedding
+parameters and 1,600e18 training FLOPs. Similarly, the checkpoint `mdm-170M-100e18-rsl-0.01.safetensors` indicates
+an MDM model with 170 million non-embedding parameters, 100e18 training FLOPs, and 1% of the dataset subjected
 to random sequence lengths during pretraining.
 
 **Math reasoning**: please see the *gsm8k_safetensors* folder.
@@ -12,6 +25,4 @@ to random sequence lengths during pretraining.
 
 **Reverse curse**: please see the *reverse_safetensors* folder
 
-For all models, we provide models in `.pth` and `.safetensors` formats.
-
-
+For all models, we provide models in `.pth` and `.safetensors` formats.
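The checkpoint naming scheme described in the README is regular enough to parse mechanically. As a minimal sketch (the `parse_checkpoint_name` helper is hypothetical, not part of the SMDM repository), assuming names follow the `<family>-<params>M-<flops>e18[-rsl-<fraction>].safetensors` pattern shown in the examples above:

```python
import re

# Hypothetical helper: decode the checkpoint naming scheme from the README, e.g.
#   mdm-1028M-1600e18.safetensors          -> 1,028M non-embedding params, 1.6e21 FLOPs
#   mdm-170M-100e18-rsl-0.01.safetensors   -> additionally, 1% random sequence lengths
PATTERN = re.compile(
    r"(?P<family>ar|mdm)-(?P<params>\d+)M-(?P<flops>\d+)e18"
    r"(?:-rsl-(?P<rsl>[\d.]+))?\.safetensors"
)

def parse_checkpoint_name(name: str) -> dict:
    m = PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    return {
        "family": m["family"],  # "ar" or "mdm"
        "non_embedding_params": int(m["params"]) * 10**6,
        "training_flops": int(m["flops"]) * 10**18,
        # fraction of the dataset given random sequence lengths, or None
        "rsl_fraction": float(m["rsl"]) if m["rsl"] else None,
    }

info = parse_checkpoint_name("mdm-170M-100e18-rsl-0.01.safetensors")
print(info["non_embedding_params"])  # 170000000
print(info["rsl_fraction"])          # 0.01
```

The `.safetensors` checkpoints themselves can be loaded with `safetensors.torch.load_file`, and the `.pth` variants with `torch.load`.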