# modernbert-diffusion-universal

## Model Summary
A diffusion-style masked language model fine-tuned in universal mode with a discrete denoising objective.
## Model Details
- Model ID: philipp-zettl/modernbert-diffusion-universal
- Base model: answerdotai/ModernBERT-base
- Training mode: universal
- Task type: Masked token denoising / diffusion-style infilling
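The masked-token denoising task works by corrupting a random fraction of positions and training the model to recover the original tokens only at those positions. A toy sketch of the corruption step (not the actual training code; the uniform mask-ratio schedule and `mask_id` value are assumptions for illustration):

```python
import random

def corrupt(tokens, mask_id, rng=random):
    """Mask a random fraction of positions, as in discrete diffusion training.

    Returns the corrupted sequence and the masked positions; the model is
    trained to predict the original tokens only at the masked positions.
    """
    # assumed schedule: mask ratio drawn uniformly from [0, 1)
    ratio = rng.random()
    masked = [i for i in range(len(tokens)) if rng.random() < ratio]
    corrupted = [mask_id if i in masked else t for i, t in enumerate(tokens)]
    return corrupted, masked

tokens = [11, 23, 42, 7, 99]
corrupted, masked = corrupt(tokens, mask_id=0)
# every masked position holds mask_id, every other position is untouched
assert all(corrupted[i] == 0 for i in masked)
assert all(corrupted[i] == tokens[i] for i in range(len(tokens)) if i not in masked)
```

Varying the mask ratio per example is what distinguishes this from fixed-ratio BERT-style MLM: the model sees every corruption level, which is what lets it denoise iteratively at inference time.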
## Intended Use
A base model trained for diffusion-style masked language modeling (MLM). It can serve as a starting point for supervised fine-tuning (SFT) on specialized datasets.
### Example

```python
from refinebert.diffusion_engine import MaskedDiffusionEngine

engine = MaskedDiffusionEngine("philipp-zettl/modernbert-diffusion-universal")

prompt = "def fibonacci(n):"
output = engine.generate(prompt, num_new_tokens=20, steps=12, guidance_scale=3.0)
print(output)
```
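Conceptually, `steps`-style generation with a masked-diffusion model starts from a fully masked continuation and unmasks the most confident positions a few at a time. A minimal sketch of that loop (the `predict` scorer is a stand-in, not the actual `MaskedDiffusionEngine` internals):

```python
MASK = "<mask>"

def denoise(seq, predict, steps):
    """Iteratively fill masked slots over `steps` passes, committing the
    most confident predictions first, until no masks remain."""
    for _ in range(steps):
        slots = [i for i, t in enumerate(seq) if t == MASK]
        if not slots:
            break
        # predict(seq, i) -> (token, confidence) for a masked position i
        guesses = {i: predict(seq, i) for i in slots}
        # unmask up to half of the remaining slots per pass, best first
        k = max(1, len(slots) // 2)
        for i in sorted(slots, key=lambda i: -guesses[i][1])[:k]:
            seq[i] = guesses[i][0]
    return seq

# toy predictor: always proposes "x", more confident at earlier positions
out = denoise(["def", MASK, MASK], lambda s, i: ("x", 1.0 / (i + 1)), steps=4)
print(out)  # → ['def', 'x', 'x']
```

Because already-committed tokens stay in the sequence, later predictions can condition on earlier ones, which is why more `steps` generally trades speed for quality.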
## Training Data
Datasets are streamed from Hugging Face and mixed by mode.
### Dataset Mix

| Dataset | Percentage | Purpose |
|---|---|---|
## Training Procedure
- Steps: 300000
- Batch size: 16
- Sequence length: 256
- Learning rate: 5e-05
- CFG dropout probability: 0.1
- Samples loaded into RAM: 100000
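The CFG dropout probability above means the conditioning (e.g. the prompt) is dropped 10% of the time during training, so the model also learns an unconditional distribution; at inference, `guidance_scale` then combines the two predictions. A sketch of the inference-side arithmetic (standard classifier-free guidance, not code from this repository):

```python
def guided_logits(cond_logits, uncond_logits, guidance_scale):
    """Classifier-free guidance: push predictions away from the
    unconditional distribution by a factor of `guidance_scale`."""
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]

# guidance_scale = 1.0 recovers the conditional logits unchanged
assert guided_logits([2.0, 0.5], [1.0, 1.0], 1.0) == [2.0, 0.5]
# scales > 1 exaggerate the conditional signal
assert guided_logits([2.0, 0.5], [1.0, 1.0], 3.0) == [4.0, -0.5]
```

This is why the example above can pass `guidance_scale=3.0`: without the 0.1 conditioning dropout at training time, there would be no unconditional branch to guide against.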
### Training Time & Hardware
- Duration:
- Hardware:
## Metrics (Training)
| Metric | Value |
|---|---|
| Training loss (latest) | TBD |
| Training loss (mean) | TBD |
| Training step | 300000 / 300000 |
## Limitations & Considerations
- The model is trained with a masked-token diffusion objective and may not behave like an autoregressive LM.
- Data sources may have licensing or content constraints—review source dataset cards before deployment.
- Performance can vary substantially by mode (code) and prompt structure.