dvruette/gidd-mask-3b


dvruette committed on Dec 16, 2025

Commit 63f1df5 · verified · 1 parent: f96e0d6

Update README.md

Files changed (1)

README.md +0 -1


@@ -26,7 +26,6 @@ This repository contains the model checkpoints from the paper "Scaling Behavior
 In our paper, we investigate the scaling behavior of discrete diffusion language models (DLMs) for different noise types (masking, uniform, and hybrid-noise), finding that all of them scale well in compute-bound settings and especially in token-bound settings, with uniform noise coming out on top for the latter.
 To confirm these findings, we train scaled-up models to compute optimality.
 Specifically, we train two 3B models (masked and uniform diffusion) as well as a 10B parameter uniform diffusion model, which, to the best of our knowledge, is the largest public uniform diffusion model to date.
-Below we plot the compute-bound and token-bound scaling laws for all investigated noise types with the scaled-up runs (3B and 10B) overlayed as circles.
 | Model | Size | Train. PPL | Diffusion type | HuggingFace Link |
 |:------|-----:|-----------:|:---------------|:-----------------|
