# modernbert-diffusion-instruct

## Model Summary

A diffusion-style masked language model fine-tuned in instruct mode using a discrete denoising objective.
## Model Details

- Model ID: philipp-zettl/modernbert-diffusion-instruct
- Base model: answerdotai/ModernBERT-base
- Training mode: instruct
- Task type: Masked-token denoising / diffusion-style infilling (see the sketch below)
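The task can be pictured as iterative unmasking: append a span of `[MASK]` tokens to the prompt, predict all of them, commit the most confident predictions, and repeat. The sketch below illustrates that loop with a plain `transformers` fill-mask model on the base checkpoint; it is a conceptual illustration only, not this model's actual sampling code, and the confidence-based unmasking schedule is an assumption.

```python
# Conceptual sketch of masked-token diffusion infilling (illustration only; this is
# NOT the model's actual sampling code). Uses the base checkpoint as a stand-in.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")

def infill(prompt: str, num_new_tokens: int = 8, steps: int = 4) -> str:
    # Append a span of [MASK] tokens after the prompt, then unmask it over several steps.
    ids = tok(prompt, return_tensors="pt")["input_ids"][0][:-1]  # drop trailing [SEP]
    span = torch.full((num_new_tokens,), tok.mask_token_id, dtype=torch.long)
    ids = torch.cat([ids, span, torch.tensor([tok.sep_token_id])])
    masked = ids == tok.mask_token_id
    for _ in range(steps):
        still = masked.nonzero(as_tuple=True)[0]
        if len(still) == 0:
            break
        with torch.no_grad():
            logits = model(ids.unsqueeze(0)).logits[0]
        probs, preds = logits.softmax(-1).max(-1)
        # Commit roughly half of the remaining masked positions, most confident first.
        k = max(1, len(still) // 2)
        chosen = still[probs[still].topk(k).indices]
        ids[chosen] = preds[chosen]
        masked[chosen] = False
    return tok.decode(ids, skip_special_tokens=True)
```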
## Intended Use

Intended for instruction-following infilling in chat-style prompts.
### Example

```python
from refinebert.diffusion_engine import MaskedDiffusionEngine

engine = MaskedDiffusionEngine("philipp-zettl/modernbert-diffusion-instruct")

prompt = "User: What is diffusion?\nAI:"
output = engine.generate(prompt, num_new_tokens=30, steps=12, guidance_scale=3.0)
print(output)
```
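Here `num_new_tokens` is presumably the length of the infilled span, `steps` the number of denoising iterations, and `guidance_scale` the classifier-free guidance strength (see the CFG sketch under Training Procedure); the exact semantics depend on the `MaskedDiffusionEngine` implementation in `refinebert`.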
## Training Data

Datasets are streamed from the Hugging Face Hub and mixed according to the training mode; a minimal streaming sketch follows the dataset table below.

### Dataset Mix
| Dataset | Percentage | Purpose |
|---|---|---|
| HuggingFaceH4/ultrachat_200k (train_sft) | 100% | Instruction chat |
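A minimal sketch of streaming this mix is shown below. The flattening into `User:` / `AI:` turns is an assumption about the preprocessing; the field names (`messages`, `role`, `content`) follow the ultrachat_200k dataset card, and the actual training pipeline may differ.

```python
# Minimal sketch of streaming the instruct data (assumed preprocessing; the actual
# training pipeline may differ).
from datasets import load_dataset

stream = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft", streaming=True)

def to_text(example):
    # Flatten the chat messages into a single "User: ... / AI: ..." string.
    roles = {"user": "User", "assistant": "AI"}
    turns = [f"{roles.get(m['role'], m['role'])}: {m['content']}" for m in example["messages"]]
    return {"text": "\n".join(turns)}

# Materialize the first 100,000 samples (the "Samples loaded into RAM" setting below).
samples = [to_text(ex) for ex, _ in zip(stream, range(100_000))]
```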
## Training Procedure
- Steps: 50000
- Batch size: 4
- Sequence length: 256
- Learning rate: 5e-05
- CFG dropout probability: 0.1 (see the CFG sketch below)
- Samples loaded into RAM: 100000
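The CFG dropout probability presumably refers to classifier-free guidance: during training the conditioning prompt is dropped with probability 0.1 so the model also learns an unconditional denoiser, which is what lets `guidance_scale` mix conditional and unconditional predictions at inference. The sketch below shows the general technique; how the conditioning is actually blanked out (e.g. replaced with padding) is an assumption.

```python
# Sketch of classifier-free guidance (CFG): conditioning dropout at training time and
# guided logit mixing at inference (illustrative; the training script may differ).
import random
import torch

CFG_DROPOUT_P = 0.1  # matches the setting above

def maybe_drop_conditioning(prompt_ids: torch.Tensor, pad_id: int) -> torch.Tensor:
    # With probability 0.1, blank out the prompt so the model also learns to denoise
    # unconditionally; this is what makes guidance possible at inference time.
    if random.random() < CFG_DROPOUT_P:
        return torch.full_like(prompt_ids, pad_id)
    return prompt_ids

def guided_logits(cond: torch.Tensor, uncond: torch.Tensor, guidance_scale: float = 3.0) -> torch.Tensor:
    # Standard CFG combination: push the conditional prediction away from the unconditional one.
    return uncond + guidance_scale * (cond - uncond)
```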
Training Time & Hardware
- Duration: 2h 34m 9s
- Hardware: NVIDIA GeForce RTX 2060 x1 (CUDA available)
## Metrics (Training)
| Metric | Value |
|---|---|
| Training loss (latest) | 4.9687 |
| Training loss (mean) | 3.7032 |
| Training step | 50000 / 50000 |
Limitations & Considerations
- The model is trained with a masked-token diffusion objective and may not behave like an autoregressive LM.
- Data sources may have licensing or content constraints—review source dataset cards before deployment.
- Performance can vary substantially by mode (instruct) and prompt structure.