language: en
tags:
  - mask-predict
  - diffusion
  - masked-lm
library_name: transformers
base_model: answerdotai/ModernBERT-base
pipeline_tag: fill-mask

modernbert-diffusion-instruct

Model Summary

A diffusion-style masked language model fine-tuned in instruct mode using a discrete denoising objective.

Model Details

  • Model ID: philipp-zettl/modernbert-diffusion-instruct
  • Base model: answerdotai/ModernBERT-base
  • Training mode: instruct
  • Task type: Masked token denoising / diffusion-style infilling

Intended Use

The model is intended for instruction-following infilling in chat-style prompts.

Example

from refinebert.diffusion_engine import MaskedDiffusionEngine

engine = MaskedDiffusionEngine("philipp-zettl/modernbert-diffusion-instruct")
prompt = "User: What is diffusion?\nAI:"
output = engine.generate(prompt, num_new_tokens=30, steps=12, guidance_scale=3.0)
print(output)
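The `steps`, `num_new_tokens`, and `guidance_scale` arguments suggest an iterative unmasking loop with classifier-free guidance. The following is a minimal, self-contained sketch of that general technique (mask-predict decoding), not the actual `refinebert` implementation. The names `toy_logits`, `guided_logits`, and `mask_predict` are hypothetical, and the random toy scorer stands in for a real model forward pass.

```python
import math
import random

MASK = "[MASK]"
VOCAB = ["diffusion", "is", "a", "denoising", "process", "."]


def toy_logits(tokens, position, conditioned, rng):
    """Stand-in for a model forward pass: one score per vocabulary entry.

    A real engine would run the masked LM on `tokens`; this toy ignores them
    and returns random scores, nudged slightly when the prompt is visible.
    """
    bias = 0.5 if conditioned else 0.0
    return [rng.gauss(0.0, 1.0) + bias * (i == position % len(VOCAB))
            for i in range(len(VOCAB))]


def guided_logits(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mask_predict(prompt_tokens, num_new_tokens, steps, guidance_scale, seed=0):
    """Append masks to the prompt, then unmask them over `steps` rounds."""
    rng = random.Random(seed)
    tokens = list(prompt_tokens) + [MASK] * num_new_tokens
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Score each masked position; keep the best guess and its confidence.
        guesses = []
        for i in masked:
            cond = toy_logits(tokens, i, True, rng)
            uncond = toy_logits(tokens, i, False, rng)
            probs = softmax(guided_logits(cond, uncond, guidance_scale))
            best = max(range(len(VOCAB)), key=probs.__getitem__)
            guesses.append((probs[best], i, best))
        # Commit enough of the most confident guesses to finish on schedule.
        n_commit = math.ceil(len(masked) / (steps - step))
        for _, i, best in sorted(guesses, reverse=True)[:n_commit]:
            tokens[i] = VOCAB[best]
    return tokens[len(prompt_tokens):]


new_tokens = mask_predict(["User:", "What", "is", "diffusion?", "AI:"],
                          num_new_tokens=30, steps=12, guidance_scale=3.0)
print(" ".join(new_tokens))
```

In this sketch a `guidance_scale` of 1.0 reduces to the conditional prediction, and larger values push the prediction further away from the unconditional one.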

Training Data

Training data is streamed from the Hugging Face Hub, and the dataset mix is selected by training mode. The mix for instruct mode is listed below.

Dataset Mix

| Dataset | Percentage | Purpose |
| --- | --- | --- |
| HuggingFaceH4/ultrachat_200k (train_sft) | 100% | Instruction chat |
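As a rough illustration of streaming this split, the sketch below uses the `datasets` library with `streaming=True` and flattens each conversation into text. The `User:` / `AI:` template mirrors the prompt in the example above, but whether training used exactly this template is an assumption; `format_chat` and `stream_chats` are hypothetical helper names.

```python
from itertools import islice


def format_chat(messages):
    """Flatten a list of {"role", "content"} turns into "User:/AI:" lines.

    Assumption: the role labels follow the example prompt in this card.
    """
    labels = {"user": "User", "assistant": "AI"}
    return "\n".join(
        f"{labels.get(m['role'], m['role'])}: {m['content'].strip()}"
        for m in messages
    )


def stream_chats(limit):
    """Yield up to `limit` formatted conversations without a full download."""
    from datasets import load_dataset  # requires `pip install datasets`

    ds = load_dataset("HuggingFaceH4/ultrachat_200k",
                      split="train_sft", streaming=True)
    for row in islice(ds, limit):
        yield format_chat(row["messages"])
```

Calling `next(stream_chats(1))` would fetch and format a single conversation; it needs network access, so it is left out of the block.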

Training Procedure

  • Steps: 50000
  • Batch size: 4
  • Sequence length: 256
  • Learning rate: 5e-05
  • CFG dropout probability: 0.1
  • Samples loaded into RAM: 100000

Training Time & Hardware

  • Duration: 2h 34m 9s
  • Hardware: NVIDIA GeForce RTX 2060 x1 (CUDA available)
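The figures above imply a fairly small training budget. The arithmetic below works it out, assuming one optimizer step per batch (no gradient accumulation) and treating the sequence length as an upper bound on tokens per example: about 200,000 sequences, at most 51.2 million tokens, and roughly 5.4 steps per second.

```python
steps = 50_000
batch_size = 4
seq_len = 256
duration_s = 2 * 3600 + 34 * 60 + 9  # 2h 34m 9s

sequences_seen = steps * batch_size
# Upper bound: assumes every sequence is filled to the full 256 tokens.
max_tokens_seen = sequences_seen * seq_len
steps_per_second = steps / duration_s

print(f"sequences seen:   {sequences_seen:,}")
print(f"max tokens seen:  {max_tokens_seen:,}")
print(f"steps per second: {steps_per_second:.2f}")
```

If each of the 100,000 samples loaded into RAM yields one training sequence, this would correspond to roughly two passes over the loaded data; that mapping is an assumption, not something the card states.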

Metrics (Training)

| Metric | Value |
| --- | --- |
| Training loss (latest) | 4.9687 |
| Training loss (mean) | 3.7032 |
| Training step | 50000 / 50000 |

Limitations & Considerations

  • The model is trained with a masked-token diffusion objective and may not behave like an autoregressive LM.
  • Data sources may have licensing or content constraints; review the source dataset cards before deployment.
  • Performance can vary substantially with prompt structure; this checkpoint was trained in instruct mode.