---
language: en
tags:
- mask-predict
- diffusion
- masked-lm
library_name: transformers
base_model: philipp-zettl/modernbert-diffusion-universal
pipeline_tag: fill-mask
datasets:
- tatsu-lab/alpaca
---

# refinebert-finetuned
## Model Summary
A diffusion-style masked language model fine-tuned from philipp-zettl/modernbert-diffusion-universal on the tatsu-lab/alpaca dataset.
## Model Details
- Model ID: refinebert-finetuned
- Base model: philipp-zettl/modernbert-diffusion-universal
- Training mode: Fine-tuning
- Task type: Masked token denoising / diffusion-style infilling
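The denoising task above follows the general mask-predict recipe: start from masked positions, predict all of them at once, keep the most confident predictions, and re-mask the rest for the next step. The sketch below is a toy illustration of that loop with a fake scorer; the engine's actual model, confidence schedule, and masking rule are assumptions here.

```python
import math

MASK = "[MASK]"

def toy_predict(tokens):
    """Stand-in for the model: returns (token, confidence) per position.
    A real model would score the full vocabulary; here we fake both."""
    vocab = ["the", "cat", "sat", "on", "mat"]
    preds = []
    for i, tok in enumerate(tokens):
        if tok == MASK:
            # deterministic fake prediction keyed by position
            preds.append((vocab[i % len(vocab)], 1.0 / (1 + i)))
        else:
            preds.append((tok, float("inf")))  # already-fixed tokens stay fixed
    return preds

def mask_predict(length, steps):
    """Iteratively unmask `length` tokens over `steps` denoising steps."""
    tokens = [MASK] * length
    for step in range(steps, 0, -1):
        preds = toy_predict(tokens)
        tokens = [tok for tok, _ in preds]
        # re-mask the least-confident positions (linear schedule)
        n_mask = math.floor(length * (step - 1) / steps)
        order = sorted(range(length), key=lambda i: preds[i][1])
        for i in order[:n_mask]:
            tokens[i] = MASK
    return tokens

print(mask_predict(5, steps=3))
```

With a real model, confidences come from the token-level softmax instead of the fake scorer, but the keep-and-re-mask control flow is the same idea.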
## Intended Use
Intended for diffusion-style masked infilling and text generation on instruction-following data in the style of the tatsu-lab/alpaca dataset.
## Example

```python
from refinebert.diffusion_engine import MaskedDiffusionEngine

engine = MaskedDiffusionEngine("./refinebert-finetuned")

# The original prompt and generation hyperparameters were not recorded
# (see generation logs); substitute your own values below.
prompt = ...
output = engine.generate(prompt, num_new_tokens=..., steps=..., guidance_scale=...)
print(output)
```
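The `guidance_scale` argument suggests classifier-free guidance at generation time. A common way to apply it is to blend conditional and unconditional predictions per token; the formula below is the standard CFG combination and is an assumption about what the engine does internally.

```python
# guided = uncond + scale * (cond - uncond); scale=1.0 recovers the
# conditional prediction, scale>1.0 pushes further toward it.
def apply_cfg(cond_logits, uncond_logits, guidance_scale):
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]

cond = [2.0, 0.5, -1.0]    # logits with the prompt
uncond = [1.0, 1.0, 0.0]   # logits with the prompt dropped
print(apply_cfg(cond, uncond, 1.0))  # [2.0, 0.5, -1.0]
print(apply_cfg(cond, uncond, 2.0))  # [3.0, 0.0, -2.0]
```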
## Training Data
Single-dataset fine-tuning.
### Dataset Mix
| Dataset | Share | Role |
|---|---|---|
| tatsu-lab/alpaca | 100% | Fine-tuning Target |
Fine-tuned specifically on the tatsu-lab/alpaca dataset.
## Training Procedure
- Steps: 14630
- Batch size: 8
- Sequence length: 256
- Learning rate: 5e-05
- CFG dropout probability: N/A
- Samples loaded into RAM: N/A
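From the reported step count, batch size, and epoch count one can back out how many sequences the model saw, a quick consistency check on the numbers above (this assumes one optimizer step per batch, i.e. no gradient accumulation):

```python
steps, batch_size, epochs = 14630, 8, 5

steps_per_epoch = steps // epochs                   # optimizer steps per epoch
sequences_per_epoch = steps_per_epoch * batch_size  # sequences per epoch
total_sequences = steps * batch_size                # sequences over all training

print(steps_per_epoch, sequences_per_epoch, total_sequences)
```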
## Training Time & Hardware
- Duration: 1h 39m 48s
- Hardware: NVIDIA GeForce RTX 4070 Laptop GPU x1 (CUDA available)
## Metrics (Training)
| Metric | Value |
|---|---|
| Training Loss | 2.1540 |
| Epochs | 5 |
| Global Step | 14630 |
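If the reported training loss is mean per-token cross-entropy in nats (an assumption; the card does not specify), it maps to a training perplexity of `exp(loss)`:

```python
import math

train_loss = 2.1540           # from the table above
perplexity = math.exp(train_loss)
print(f"{perplexity:.2f}")    # roughly 8.6
```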
## Limitations & Considerations
- The model is trained with a masked-token diffusion objective and may not behave like an autoregressive LM.
- Data sources may have licensing or content constraints; review source dataset cards before deployment.
- Performance can vary substantially by mode (Fine-tuning) and prompt structure.