Masked Diffusion Language Models with Frequency-Informed Training
Paper: 2509.05056
🎤 Oral Presentation at BabyLM Workshop @ EMNLP 2025

Loading with Transformers (the model relies on custom modeling code, so `trust_remote_code=True` is required):

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="despoinakk/diffusion_gaussian_babylm", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("despoinakk/diffusion_gaussian_babylm", trust_remote_code=True, dtype="auto")
```
This model is a Masked Diffusion Language Model (MDLM) trained with a Bimodal Gaussian noise schedule and frequency-informed masking for the BabyLM Challenge 2025.
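Training uses a masked diffusion objective: each sequence is corrupted by masking tokens at a rate set by the noise schedule, and the model learns to recover the masked tokens. Frequency-informed masking additionally uses token frequency to decide which tokens are masked.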
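The corruption step combining a Bimodal Gaussian noise schedule with frequency-informed masking can be illustrated with a small, self-contained sketch. It is an illustration under stated assumptions, not the authors' implementation: reading the "bimodal Gaussian" as a two-component mixture over the masking rate, the mode means and standard deviation, the inverse log-frequency weighting, and the helper names (`sample_mask_rate`, `frequency_weights`, `mask_sequence`) are all hypothetical. The actual schedule and masking rule are in the repository at https://github.com/DespoinaKK/babylm-diffusion.

```python
import math
import random
from collections import Counter


def sample_mask_rate(rng, means=(0.25, 0.75), std=0.1):
    """Hypothetical bimodal Gaussian schedule: pick one of two modes,
    sample a masking rate around it, and clip the rate into [0.01, 0.99]."""
    mu = rng.choice(means)
    rate = rng.gauss(mu, std)
    return min(max(rate, 0.01), 0.99)


def frequency_weights(tokens, counts):
    """Hypothetical frequency-informed weighting: rarer tokens receive larger
    weights via inverse log-frequency (always positive because log(2 + c) > 0)."""
    return [1.0 / math.log(2 + counts[t]) for t in tokens]


def mask_sequence(tokens, counts, rate, mask_token="[MASK]", rng=None):
    """Mask round(rate * len(tokens)) positions, chosen without replacement
    with probability biased toward high-weight (rare) tokens."""
    rng = rng or random.Random()
    n_mask = round(rate * len(tokens))
    weights = frequency_weights(tokens, counts)
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw key u ** (1 / w) per position and keep the n_mask largest keys.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    ranked = sorted(range(len(tokens)), key=lambda i: keys[i], reverse=True)
    chosen = set(ranked[:n_mask])
    corrupted = [mask_token if i in chosen else t for i, t in enumerate(tokens)]
    return corrupted, sorted(chosen)


# Usage: corrupt one toy sequence with token counts taken from the same toy corpus.
corpus = "the cat sat on the mat the dog sat on the log".split()
counts = Counter(corpus)
rng = random.Random(0)
rate = sample_mask_rate(rng)
corrupted, positions = mask_sequence(corpus, counts, rate, rng=rng)
print(f"mask rate={rate:.2f}", corrupted)
```

The inverse log-frequency weight is only one simple way to bias masking toward rare tokens; in a real training loop the model would then be trained to predict the original tokens at `positions`.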
Performance on BabyLM Challenge zero-shot tasks:
| Task | Score |
|---|---|
| BLiMP | 78.2 |
| BLiMP Supplement | 73.6 |
| EWoK | 52.5 |
| COMPS | 56.6 |
| Entity Tracking | 39.7 |
```python
from transformers import AutoTokenizer
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("despoinakk/diffusion_gaussian_babylm")

# Load model (custom modeling code required)
# See: https://github.com/DespoinaKK/babylm-diffusion
```
If you use this model, please cite:
TBA
Based on work from: