---
library_name: transformers
license: apache-2.0
datasets:
- billion-word-benchmark/lm1b
---

## Quick Start Guide

To load this pre-trained model with the Hugging Face `transformers` APIs, use the following snippet:

```python
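from transformers import AutoModelForMaskedLM, AutoTokenizer

# See the `UDLM` collection page on the Hub for a list of available models.
# The model was trained with the `bert-base-uncased` tokenizer.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model_name = 'kuleshov-group/udlm-lm1b'
# The checkpoint ships custom modeling code, so remote code must be trusted.
model = AutoModelForMaskedLM.from_pretrained(model_name, trust_remote_code=True)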
```


## Model Details

UDLM stands for **U**niform **D**iffusion **L**anguage **M**odels.
This model was trained with the refined continuous-time ELBO for uniform-noise discrete diffusion introduced in [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/abs/2412.10193).
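In uniform-noise diffusion, the forward process corrupts each token independently: at noise level α_t, a clean token x is kept with probability α_t and otherwise resampled uniformly from the vocabulary of size V, so that q(z_t | x) = Cat(z_t; α_t x + (1 - α_t) 1/V). The sketch below samples from that marginal for illustration only; it is not the training code from the paper, and the helper names and the linear schedule α_t = 1 - t are hypothetical choices.

```python
import random


def uniform_noise_corrupt(tokens, alpha_t, vocab_size, rng=random):
    """Sample z_t ~ Cat(alpha_t * x + (1 - alpha_t) / V * 1) for each token.

    Each token is kept with probability alpha_t; otherwise it is replaced by a
    token drawn uniformly from the whole vocabulary (which may coincide with
    the original token).
    """
    if not 0.0 <= alpha_t <= 1.0:
        raise ValueError("alpha_t must lie in [0, 1]")
    corrupted = []
    for tok in tokens:
        if rng.random() < alpha_t:
            corrupted.append(tok)  # keep the clean token
        else:
            corrupted.append(rng.randrange(vocab_size))  # uniform resample
    return corrupted


def linear_alpha(t):
    """Hypothetical noise schedule: alpha_t = 1 - t for t in [0, 1]."""
    return 1.0 - t


if __name__ == "__main__":
    rng = random.Random(0)
    x = [101, 7592, 2088, 102]  # arbitrary token ids
    # bert-base-uncased has a vocabulary of 30,522 tokens.
    print(uniform_noise_corrupt(x, linear_alpha(0.5), 30522, rng))
```

At t = 0 the sequence is returned unchanged, and at t = 1 every position is an independent uniform draw, which is the stationary distribution the reverse process starts from.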

### Architecture

The model has a context size of 128 tokens and 139M parameters.

The model architecture is based on the [Diffusion Transformer architecture](https://arxiv.org/abs/2212.09748) and consists of:
- 12 transformer blocks, each with 12-head multi-head attention,
- a hidden dimension of 768,
- `adaLN` for conditioning on the time-step (i.e., during diffusion training / generation).
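In a DiT-style block, adaLN replaces the learned LayerNorm affine parameters with a shift and scale regressed from the conditioning signal, here the diffusion time-step embedding. The pure-Python sketch below shows only that modulation step on a single hidden vector; the function names, the tiny dimensions, and the plain linear map from the time embedding are illustrative assumptions, not this checkpoint's implementation.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(w, b, v):
    """Compute w @ v + b, with the weight matrix given as a list of rows."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def adaln_modulate(x, t_emb, w_shift, b_shift, w_scale, b_scale):
    """adaLN: LayerNorm(x) * (1 + scale(t)) + shift(t).

    shift(t) and scale(t) are per-channel vectors regressed from the
    time-step embedding t_emb (hypothetical single linear layers here).
    """
    shift = linear(w_shift, b_shift, t_emb)
    scale = linear(w_scale, b_scale, t_emb)
    return [h * (1.0 + s) + m for h, s, m in zip(layer_norm(x), scale, shift)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]  # toy hidden vector (the real model uses 768 dims)
    t_emb = [0.5]        # toy 1-dim time-step embedding
    w = [[0.2], [-0.1], [0.3]]
    b = [0.0, 0.0, 0.0]
    print(adaln_modulate(x, t_emb, w, b, w, b))
```

DiT's full adaLN-Zero variant additionally regresses a per-channel gate applied to each residual branch and zero-initializes the modulation layers so that every block starts as the identity; that detail is omitted here.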


### Training Details

The model was trained with the `bert-base-uncased` tokenizer.
We trained for 1M gradient update steps with a batch size of 512.
We used a linear warm-up over 2,500 steps to a constant learning rate of 3e-4.
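The warm-up described above can be written as a small schedule function. This is a sketch assuming the learning rate ramps linearly from 0 at step 0 to 3e-4 at step 2,500 and then stays fixed; the function and constant names are hypothetical. It also computes an upper bound on the number of tokens processed during training, assuming every sequence in a batch fills the 128-token context.

```python
TOTAL_STEPS = 1_000_000
BATCH_SIZE = 512
CONTEXT_LEN = 128
PEAK_LR = 3e-4
WARMUP_STEPS = 2500


def learning_rate(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warm-up from 0 to peak_lr over warmup_steps, then constant."""
    return peak_lr * min(1.0, step / warmup_steps)


# Upper bound on tokens processed: steps x sequences per batch x tokens per sequence.
max_tokens_seen = TOTAL_STEPS * BATCH_SIZE * CONTEXT_LEN  # 65,536,000,000

if __name__ == "__main__":
    for s in (0, 1250, 2500, TOTAL_STEPS - 1):
        print(s, learning_rate(s))
    print(f"{max_tokens_seen:,} tokens (upper bound)")
```

With these settings the warm-up covers only 0.25% of the 1M training steps, so nearly all updates use the constant 3e-4 rate.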

For more details, please refer to our work: [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/abs/2412.10193).

## Citation
Please cite our work using the BibTeX entry below:

### BibTeX:
```bibtex
@article{schiff2024discreteguidance,
  title={Simple Guidance Mechanisms for Discrete Diffusion Models},
  author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
  journal={arXiv preprint arXiv:2412.10193},
  year={2024}
}
```