---
language: en
tags:
  - mask-predict
  - diffusion
  - masked-lm
library_name: transformers
base_model: philipp-zettl/modernbert-diffusion-universal
pipeline_tag: fill-mask
---

./refinebert-openwebtext

Model Summary

A diffusion-style masked language model fine-tuned from philipp-zettl/modernbert-diffusion-universal on the Skylion007/openwebtext dataset.

Model Details

  • Model ID: ./refinebert-openwebtext
  • Base model: philipp-zettl/modernbert-diffusion-universal
  • Training mode: Fine-tuning
  • Task type: Masked token denoising / diffusion-style infilling

Intended Use

Intended for masked-token infilling and diffusion-style text generation on general English web text, the domain covered by the Skylion007/openwebtext dataset.

Example

from refinebert.diffusion_engine import MaskedDiffusionEngine

engine = MaskedDiffusionEngine("./refinebert-openwebtext")
# The prompt and generation settings from the original run were not recorded
# ("N/A" in the generation logs); the values below are illustrative placeholders.
prompt = "The history of the internet begins with"
output = engine.generate(prompt, num_new_tokens=32, steps=8, guidance_scale=1.0)
print(output)

Training Data

Single-dataset fine-tuning.

Dataset Mix

| Dataset | Weight | Role |
| --- | --- | --- |
| Skylion007/openwebtext | 100% | Fine-tuning target |

Fine-tuned specifically on the Skylion007/openwebtext dataset.

Training Procedure

  • Steps: 10000
  • Batch size: 8
  • Sequence length: 512
  • Learning rate: 5e-05
  • CFG dropout probability: N/A
  • Samples loaded into RAM: N/A
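The masked-token denoising objective referenced above can be sketched as follows. This is a minimal illustration, not the project's actual training code: `MASK_ID` is a hypothetical token id (the real value comes from the tokenizer), and the variable per-example mask ratio is an assumption about how diffusion-style masking differs from fixed-ratio BERT masking.

```python
import random

MASK_ID = 50284  # hypothetical [MASK] id; the real value comes from the tokenizer

def corrupt_for_diffusion(token_ids, mask_ratio, rng=random):
    """Replace a random subset of tokens with [MASK]. The masked positions
    become the prediction targets for the denoising loss."""
    n_mask = max(1, round(mask_ratio * len(token_ids)))
    masked = set(rng.sample(range(len(token_ids)), n_mask))
    corrupted = [MASK_ID if i in masked else t for i, t in enumerate(token_ids)]
    return corrupted, masked

# Unlike fixed-ratio BERT masking, diffusion-style training samples a fresh
# noise level per example, so the model learns to denoise anything from a
# lightly masked to a fully masked sequence.
rng = random.Random(0)
ids = list(range(10))
mask_ratio = rng.uniform(0.15, 1.0)
corrupted, targets = corrupt_for_diffusion(ids, mask_ratio, rng)
print(len(targets), "of", len(ids), "tokens masked")
```

In a real training step, the corrupted sequence is fed to the model and cross-entropy is computed only at the masked positions.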

Training Time & Hardware

  • Duration: 3h 23m 12s
  • Hardware: NVIDIA GeForce RTX 4070 Laptop GPU x1 (CUDA available)

Metrics (Training)

| Metric | Value |
| --- | --- |
| Training Loss | 4.4906 |
| Epochs | 1 |
| Global Step | 10000 |

Limitations & Considerations

  • The model is trained with a masked-token diffusion objective and may not behave like an autoregressive LM.
  • Data sources may have licensing or content constraints—review source dataset cards before deployment.
  • Performance can vary substantially with the training mode (here, fine-tuning) and with prompt structure.
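To make the first point concrete, here is a minimal sketch of iterative mask-predict decoding, the inference style such models typically use instead of left-to-right sampling: start fully masked, fill every slot, then re-mask the least confident slots and repeat. The `predict` callback, the linear re-masking schedule, and the toy model below are illustrative assumptions, not this model's actual engine.

```python
def mask_predict_decode(predict, length, steps, mask_token="[MASK]"):
    """Iterative mask-predict decoding. `predict` maps the current sequence
    to a (token, confidence) pair per position; in practice it would be a
    fill-mask network. Each step fills the masked slots, then re-masks the
    lowest-confidence positions, with the mask count shrinking linearly."""
    seq = [mask_token] * length
    conf = [0.0] * length
    for step in range(steps):
        tokens_conf = predict(seq)
        for i, (tok, c) in enumerate(tokens_conf):
            if seq[i] == mask_token:  # fill only currently masked slots
                seq[i], conf[i] = tok, c
        n_remask = int(length * (steps - 1 - step) / steps)
        if n_remask == 0:
            break
        worst = sorted(range(length), key=lambda i: conf[i])[:n_remask]
        for i in worst:
            seq[i] = mask_token
    return seq

def toy_predict(seq):
    # Deterministic stand-in for a model: token "t<i>" with rising confidence.
    return [(f"t{i}", (i + 1) / len(seq)) for i in range(len(seq))]

decoded = mask_predict_decode(toy_predict, length=5, steps=3)
print(decoded)
```

Because every position can be revised at each step, outputs and failure modes differ from autoregressive sampling: there is no left-to-right commitment, and low-confidence regions get multiple chances to be rewritten.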