ModernLogBERT (WCE)

A ModernBERT encoder fine-tuned to classify the severity level of a single log line into one of six levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL.

This checkpoint is trained with a Weighted Cross-Entropy (WCE) objective. A sibling checkpoint trained with Weighted Generalized Cross-Entropy (WGCE), a noise-tolerant loss, is available at hazemkhaled-94/modernlogbert-gce.
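The two objectives differ only in the per-example loss. The card does not say how the class weights were chosen or which GCE exponent the sibling used, so the sketch below is illustrative only. It assumes inverse-frequency weights (the N / (K · n_c) formula behind scikit-learn's "balanced" class weights) and a hypothetical q = 0.7. It works on plain Python lists rather than reproducing the log-lens training code, and all helper names are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one example's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels, num_classes):
    # Assumed weighting w_c = N / (K * n_c); classes absent from `labels` get 0.
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(batch_logits, targets, weights):
    # Sum of w_y * -log p_y divided by the summed target weights
    # (the "mean" reduction of torch.nn.CrossEntropyLoss(weight=...)).
    num = den = 0.0
    for logits, y in zip(batch_logits, targets):
        p_y = softmax(logits)[y]
        num += weights[y] * -math.log(p_y)
        den += weights[y]
    return num / den


def weighted_gce(batch_logits, targets, weights, q=0.7):
    # Generalized cross-entropy (1 - p_y**q) / q with the same class weights.
    # q=0.7 and this normalization are hypothetical, not the sibling's settings.
    num = den = 0.0
    for logits, y in zip(batch_logits, targets):
        p_y = softmax(logits)[y]
        num += weights[y] * (1.0 - p_y ** q) / q
        den += weights[y]
    return num / den


weights = inverse_frequency_weights([0, 0, 0, 1], num_classes=2)  # ≈ [0.667, 2.0]
print(weighted_cross_entropy([[2.0, 0.0], [2.0, 0.0]], [0, 1], weights))
print(weighted_gce([[2.0, 0.0], [2.0, 0.0]], [0, 1], weights))
```

Because (1 - p_y^q)/q is bounded above by 1/q, a confidently wrong (possibly mislabeled) example cannot dominate the loss the way -log p_y can. This is the usual intuition behind calling GCE noise-tolerant.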

Built with the log-lens project, which also provides the Drain3 preprocessing pipeline these inputs require (see "How to use").

Intended use

  • Intended: triage and observability research, such as predicting or sanity-checking log severity and flagging entries whose predicted severity disagrees with the emitted level as candidate anomalies.
  • Out of scope: use as the sole source of truth for alerting or incident severity. Aggregate accuracy hides brittle behavior on unfamiliar log formats, so keep a human in the loop.
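The anomaly-flagging use above can be sketched as a comparison between each line's emitted level and the model's prediction. This minimal sketch is not part of the released model. It assumes that predicted labels use exactly the six names listed at the top of this card and that emitted levels have already been normalized to the same names (for example, WARNING mapped to WARN). It also treats the levels as ordered, so the size of a disagreement can be thresholded with a hypothetical `min_gap` parameter.

```python
# Severity order from this card; assumes predictions and emitted levels use these names.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def flag_disagreements(records, min_gap=1):
    """Return (line, emitted, predicted, gap) for each record whose predicted level
    differs from the emitted level by at least `min_gap` steps (hypothetical knob).
    A positive gap means the model rates the line as more severe than it was logged."""
    flagged = []
    for line, emitted, predicted in records:
        gap = RANK[predicted] - RANK[emitted]
        if abs(gap) >= min_gap:
            flagged.append((line, emitted, predicted, gap))
    return flagged


records = [
    ("Disk quota exceeded for <UUID>", "INFO", "ERROR"),
    ("Service started on port <NUM>", "INFO", "INFO"),
]
for hit in flag_disagreements(records):
    print(hit)
```

The flagged lines are candidates for a human to inspect, not confirmed anomalies.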

How to use

Inputs must be Drain3-masked the same way as in training (variables replaced by placeholders such as <NUM>, <IP>, <UUID>); raw text degrades predictions. The log-lens repo ships a ready-to-use Drain3 preprocessing pipeline that produces exactly this masked form.
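To illustrate what the masked form looks like, the snippet below is a rough regex stand-in for the masking step. It is not the log-lens pipeline or Drain3 itself. The patterns, their order, and the helper name `mask_line` are hypothetical; only the placeholder names <NUM>, <IP> and <UUID> come from this card. For real inputs, use the log-lens pipeline so that the masking matches training.

```python
import re

# Hypothetical patterns; order matters so UUIDs and IPs are not split into <NUM>s.
MASKS = [
    ("UUID", re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b")),
    ("IP", re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b")),  # IPv4 with optional :port
    ("NUM", re.compile(r"\b\d+(?:\.\d+)?\b")),  # standalone integers and decimals
]


def mask_line(line: str) -> str:
    """Replace variable tokens with placeholders such as <NUM> (illustrative only)."""
    for name, pattern in MASKS:
        line = pattern.sub(f"<{name}>", line)
    return line


print(mask_line("Connection refused after 3 retries"))  # Connection refused after <NUM> retries
print(mask_line("peer 10.0.0.12:8080 closed"))  # peer <IP> closed
```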

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "hazemkhaled-94/modernlogbert-wce"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

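text = "Connection refused after <NUM> retries"  # input must already be Drain3-masked
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)  # same max length as training
with torch.no_grad():  # inference only; skip gradient tracking
    pred = model(**inputs).logits.argmax(-1).item()  # index of the highest-scoring level
print(model.config.id2label[pred])  # map the index back to a level name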

Training data

  • In-distribution (train + held-out eval): a publicly available collection of system log corpora (loghub), preprocessed into a level-balanced, stratified sample.
  • Out-of-distribution (evaluation only): a single private industrial Kubernetes log deployment. Not released; used purely as an OOD generalization probe.

Training procedure

| Hyperparameter | Value |
|---|---|
| Backbone | ModernBERT-base |
| Loss | Weighted Cross-Entropy |
| Epochs | 8 |
| Per-device batch size | 32 |
| Gradient accumulation steps | 4 (effective batch 128) |
| Learning rate | 1e-5 (separate LRs for head vs. backbone) |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max sequence length | 512 tokens |
| Best-model selection metric | Macro F1 |
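The batch and warmup settings in the table combine through simple arithmetic. The sketch below uses a hypothetical helper, `schedule`, to derive the effective batch (32 × 4 = 128, consistent with a single device) and the optimizer-step counts. This card does not report the training-set size, so the 100,000 examples in the usage line are made up. A real trainer's exact rounding of steps per epoch may also differ slightly.

```python
import math


def schedule(num_examples, per_device_batch=32, grad_accum=4, num_devices=1,
             epochs=8, warmup_ratio=0.1):
    # Defaults mirror the table; num_examples and num_devices are inputs you supply.
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)  # optimizer steps per epoch
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(warmup_ratio * total_steps)  # length of the linear warmup
    return effective_batch, total_steps, warmup_steps


# Hypothetical training-set size; the card does not report it.
print(schedule(100_000))  # (128, 6256, 626)
```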

Evaluation

In-distribution (held-out stratified slice)

| Metric | Value |
|---|---|
| Accuracy | 88.18% |
| Macro precision | 0.7647 |
| Macro recall | 0.8271 |
| Macro F1 | 0.7813 |
| Weighted F1 | 0.8947 |
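Macro F1 averages the per-level F1 scores equally, while weighted F1 weights each level by its support (its number of true examples). Weighted F1 (0.8947) sits well above macro F1 (0.7813), which means that on this slice, levels with more examples tend to score higher than rarer ones. The pure-Python sketch below shows the effect using hypothetical helper names and toy data, not the card's evaluation data. It should agree with scikit-learn's `f1_score` with `average="macro"` and `average="weighted"` for the same labels.

```python
from collections import Counter


def f1_by_level(y_true, y_pred, labels):
    # One-vs-rest F1 for each severity level.
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred, labels):
    f1 = f1_by_level(y_true, y_pred, labels)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(labels)  # every level counts equally
    total = sum(support[c] for c in labels)
    weighted = sum(f1[c] * support[c] for c in labels) / total  # weighted by support
    return macro, weighted


# Toy data: a frequent level predicted well, a rare level half-missed.
y_true = ["INFO"] * 8 + ["FATAL"] * 2
y_pred = ["INFO"] * 8 + ["INFO", "FATAL"]
print(macro_and_weighted_f1(y_true, y_pred, ["INFO", "FATAL"]))  # ≈ (0.804, 0.886)
```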

On the curated in-distribution slice, this WCE checkpoint is both stronger and better calibrated than the WGCE sibling.
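The card does not say how calibration was measured. One common metric is expected calibration error (ECE). ECE bins predictions by their top-class probability, then takes the support-weighted average gap between accuracy and mean confidence in each bin. The sketch below assumes 10 equal-width bins; the function name and toy numbers are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: support-weighted mean of |accuracy - confidence| per bin.
    `confidences` are top-class probabilities; `correct` holds 1/0 (or True/False)."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also takes a confidence of exactly 0.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(accuracy - avg_conf)
    return ece


# Toy example: two confident correct predictions, two 55% predictions with one wrong.
print(expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0]))  # ≈ 0.05
```

A lower ECE means that confidence tracks accuracy more closely.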

Out-of-distribution

Evaluated on a private industrial Kubernetes domain, whose log distribution differs from the training data. Performance degraded modestly but remained usable, which is the expected cost of moving to unfamiliar formats. As with any OOD use, validate on your own log distribution before relying on the model.

Limitations and biases

  • OOD generalization: only modest degradation was observed, and only on a single private industrial domain; other distributions are unverified, so validate on your own logs.
  • Confidence ≠ correctness: treat scores as signals, not guarantees.
  • Preprocessing coupling: inputs must be Drain3-masked exactly as in training (use the log-lens preprocessing pipeline).
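One way to treat scores as signals while keeping a human in the loop is to route low-confidence predictions to review. The sketch below converts a single example's logits into softmax probabilities and applies a hypothetical `threshold`. The value 0.9 is arbitrary and should be chosen on labeled data from your own logs. Even a high probability does not guarantee a correct label, especially on unfamiliar formats.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one example's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, id2label, threshold=0.9):
    """Return (label, probability, decision). Predictions below the hypothetical
    `threshold` go to human review instead of being acted on automatically."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    decision = "auto" if probs[best] >= threshold else "review"
    return id2label[best], probs[best], decision


# Illustrative mapping; with the real model, use model.config.id2label.
id2label = dict(enumerate(["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]))
print(route([0.0, 0.0, 5.0, 0.0, 0.0, 0.0], id2label))  # INFO, ~0.97, auto
print(route([0.0, 0.0, 1.0, 1.5, 0.0, 0.0], id2label))  # WARN, ~0.40, review
```

If the model is loaded as in "How to use", you can pass `model(**inputs).logits[0].tolist()` and `model.config.id2label` in directly.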

Model size: 0.1B params (Safetensors, F32 tensors)