---
library_name: transformers
license: apache-2.0
datasets:
- billion-word-benchmark/lm1b
---
## Quick Start Guide
To use this pre-trained model with the Hugging Face APIs, use the following snippet:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
# See the `UDLM` collection page on the Hub for a list of available models.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model_name = 'kuleshov-group/udlm-lm1b'
model = AutoModelForMaskedLM.from_pretrained(model_name)
```
## Model Details
UDLM stands for **U**niform **D**iffusion **L**anguage **M**odels.
This model was trained using the refined uniform noise discrete diffusion continuous-time ELBO introduced [here](https://arxiv.org/abs/2412.10193).
### Architecture
The model has a context length of 128 tokens and 139M parameters.
Its architecture is based on the [Diffusion Transformer architecture](https://arxiv.org/abs/2212.09748) and consists of:
- 12 multi-head attention blocks (with 12 attention heads),
- hidden dimension of 768,
- `adaLN` for conditioning on the diffusion time-step (used during both training and generation).
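As a rough illustration of the `adaLN` idea (this is a dependency-free sketch, not the model's actual implementation, and all function names here are ours): instead of using fixed learned LayerNorm parameters, each block predicts a per-feature scale and shift from an embedding of the diffusion time-step.

```python
import math

def timestep_embedding(t, dim):
    """Sinusoidal embedding of the diffusion time-step t (a float in [0, 1])."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

def ada_layer_norm(x, t_emb, w_scale, w_shift):
    """adaLN: LayerNorm whose scale and shift are *predicted* from the
    time-step embedding via linear maps, rather than fixed parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + 1e-5) for v in x]
    # Linear projections of the time-step embedding give per-feature modulation.
    scale = [sum(w[j] * t_emb[j] for j in range(len(t_emb))) for w in w_scale]
    shift = [sum(w[j] * t_emb[j] for j in range(len(t_emb))) for w in w_shift]
    return [(1 + s) * n + b for s, n, b in zip(scale, normed, shift)]
```

With zero projection weights this reduces to a plain (parameter-free) LayerNorm; during training the projections learn how strongly each feature should respond to the noise level.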
### Training Details
The model was trained using the `bert-base-uncased` tokenizer.
We trained for 1M gradient update steps with a batch size of 512.
We used a linear warm-up over the first 2,500 steps to a constant learning rate of 3e-4.
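The schedule above can be sketched as follows (values from this card; the function name is ours, and we assume the warm-up ramps linearly from zero):

```python
def learning_rate(step, peak_lr=3e-4, warmup_steps=2500):
    """Linear warm-up over `warmup_steps`, then a constant learning rate."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr
```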
For more details, please refer to our work: [Simple Guidance Mechanisms for Discrete Diffusion Models](https://arxiv.org/abs/2412.10193).
## Citation
Please cite our work using the BibTeX entry below:
### BibTeX:
```bibtex
@article{schiff2024discreteguidance,
title={Simple Guidance Mechanisms for Discrete Diffusion Models},
author={Schiff, Yair and Sahoo, Subham Sekhar and Phung, Hao and Wang, Guanghan and Boshar, Sam and Dalla-torre, Hugo and de Almeida, Bernardo P and Rush, Alexander and Pierrot, Thomas and Kuleshov, Volodymyr},
journal={arXiv preprint arXiv:2412.10193},
year={2024}
}
```