LDM-ModernBERT: Pretrained Language Diffusion Model

A language diffusion model built on ModernBERT-base, pretrained on Project Gutenberg using a masked diffusion objective.

This is the base pretrained checkpoint before SFT instruction tuning. For instruction following, see JaydeepR/ldm-modernbert-base-sft.

*(Inference demo GIF)*


Model Details

| Property | Value |
|---|---|
| Base model | ModernBERT-base |
| Parameters | ~150M |
| Architecture | Masked language model (diffusion objective) |
| Pretraining data | Project Gutenberg (6,400,553 training chunks, seq_len=1024) |
| Pretraining steps | 30,000 |
| Effective batch size | 128 |
| Learning rate | 5e-5 (cosine schedule, 1,500 warmup steps) |
| Hardware | RTX 4090 (24 GB) |
| Training time | ~20 hours |
| Initial train loss | 3.887 |
| Initial val loss | 3.922 |
| Final train loss | 2.917 |
| Final val loss | 2.962 |

Training

The model is pretrained with a flow-matching-style masked diffusion objective: at each training step, a random fraction t of the tokens in a sequence is replaced with the mask token, and the model learns to predict the original tokens at the masked positions. The summed loss over the masked positions is scaled by 1/t, which normalizes for the masking rate so that lightly and heavily masked sequences contribute comparably.
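
In code, one training step looks roughly like the sketch below. This is a minimal illustration of the objective as described above, not the repo's actual training loop; the per-token masking, the `eps` floor on `t`, and the normalization by sequence length are assumptions:

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, input_ids, mask_token_id, eps=1e-3):
    """Mask a random fraction t of tokens, predict the originals,
    and scale the masked-token loss by 1/t."""
    batch, seq_len = input_ids.shape
    # Sample a masking rate t per example, bounded away from 0 (assumption).
    t = torch.rand(batch, device=input_ids.device).clamp(min=eps)
    # Mask each token independently with probability t.
    masked = torch.rand(batch, seq_len, device=input_ids.device) < t[:, None]
    noisy = input_ids.masked_fill(masked, mask_token_id)

    logits = model(input_ids=noisy).logits  # (batch, seq_len, vocab)
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        input_ids.view(-1),
        reduction="none",
    ).view(batch, seq_len)
    # 1/t weighting, applied to masked positions only.
    loss = (ce * masked).sum(dim=1) / (t * seq_len)
    return loss.mean()
```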


Inference

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from safetensors.torch import load_file
import torch

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")

# Overlay this repo's pretrained diffusion weights onto the base model
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict, strict=False)
model.eval()

# Unconditional generation: start from a fully masked sequence
seq_len = 128
input_tokens = torch.full((1, seq_len), tokenizer.mask_token_id, dtype=torch.long)
```
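
From this fully masked start, generation proceeds by iteratively unmasking tokens over a number of denoising steps. The loop below is a minimal sketch of one common strategy (confidence-based unmasking); the step count, the unmasking schedule, and the rule itself are illustrative assumptions, not necessarily what the repo's scripts implement:

```python
num_steps = 32  # assumption: number of denoising steps
input_ids = input_tokens.clone()

with torch.no_grad():
    for step in range(num_steps):
        mask = input_ids == tokenizer.mask_token_id
        if not mask.any():
            break
        logits = model(input_ids=input_ids).logits
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf[~mask] = -1.0  # only consider still-masked positions
        # Unmask an equal share of the remaining masked tokens each step.
        k = max(1, mask.sum().item() // (num_steps - step))
        top = conf.view(-1).topk(k).indices
        input_ids.view(-1)[top] = pred.view(-1)[top]

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```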

Or use the provided scripts from the GitHub repo:

```bash
# Generate GIF (unconditional)
bash create_gif.sh
```

Limitations

  • Trained on a relatively small dataset (Project Gutenberg) with limited steps
  • No instruction tuning β€” use the SFT checkpoint for Q&A tasks
  • Output has a literary/formal style reflecting Gutenberg training data

Citation

Built following the approach from:
