dvruette committed · Commit 63f1df5 · verified · 1 Parent(s): f96e0d6

Update README.md

Files changed (1)
  1. README.md +0 -1
README.md CHANGED
@@ -26,7 +26,6 @@ This repository contains the model checkpoints from the paper "Scaling Behavior
  In our paper, we investigate the scaling behavior of discrete diffusion language models (DLMs) for different noise types (masking, uniform, and hybrid-noise), finding that all of them scale well in compute-bound settings and especially in token-bound settings, with uniform noise coming out on top for the latter.
  To confirm these findings, we train scaled-up models to compute optimality.
  Specifically, we train two 3B models (masked and uniform diffusion) as well as a 10B parameter uniform diffusion model, which, to the best of our knowledge, is the largest public uniform diffusion model to date.
- Below we plot the compute-bound and token-bound scaling laws for all investigated noise types with the scaled-up runs (3B and 10B) overlayed as circles.
 
  | Model | Size | Train. PPL | Diffusion type | HuggingFace Link |
  |:------|-----:|-----------:|:---------------|:-----------------|