Update README.md
README.md
CHANGED
@@ -26,7 +26,6 @@ This repository contains the model checkpoints from the paper "Scaling Behavior
 In our paper, we investigate the scaling behavior of discrete diffusion language models (DLMs) for different noise types (masking, uniform, and hybrid-noise), finding that all of them scale well in compute-bound settings and especially in token-bound settings, with uniform noise coming out on top for the latter.
 To confirm these findings, we train scaled-up models to compute optimality.
 Specifically, we train two 3B models (masked and uniform diffusion) as well as a 10B parameter uniform diffusion model, which, to the best of our knowledge, is the largest public uniform diffusion model to date.
-Below we plot the compute-bound and token-bound scaling laws for all investigated noise types with the scaled-up runs (3B and 10B) overlayed as circles.
 
 | Model | Size | Train. PPL | Diffusion type | HuggingFace Link |
 |:------|-----:|-----------:|:---------------|:-----------------|