ModernLogBERT (WGCE)

A ModernBERT encoder fine-tuned to classify the severity level of a single log line into one of six levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL.

This checkpoint is trained with a Weighted Generalized Cross-Entropy (WGCE) objective (q = 0.7), a noise-tolerant loss designed to be more robust to mislabeled training data. A sibling checkpoint trained with plain Weighted Cross-Entropy (WCE) is at hazemkhaled-94/modernlogbert-wce.

Built with the log-lens project, which also provides the Drain3 preprocessing pipeline these inputs require (see "How to use").

Intended use

  • Intended: triage and observability research, such as predicting or sanity-checking log severity and flagging entries whose predicted severity disagrees with the emitted level as candidate anomalies.
  • Out of scope: use as the sole source of truth for alerting or incident severity. Aggregate accuracy hides brittle behavior on unfamiliar log formats, so keep a human in the loop.
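
The disagreement-flagging use above can be sketched in a few lines. This is a minimal illustration, not part of log-lens: the Prediction record, flag_candidates, and the 0.9 confidence cut-off are hypothetical choices, and the confidence is assumed to come from a softmax over the model's logits.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Prediction:
    # Hypothetical record: one masked log line, the level it was emitted with,
    # and the model's predicted level plus its softmax confidence.
    line: str
    emitted_level: str
    predicted_level: str
    confidence: float


def flag_candidates(preds: Iterable[Prediction], min_confidence: float = 0.9) -> List[Prediction]:
    """Return entries whose confident prediction disagrees with the emitted level."""
    return [
        p
        for p in preds
        if p.predicted_level != p.emitted_level and p.confidence >= min_confidence
    ]


preds = [
    Prediction("disk <NUM> full", "INFO", "ERROR", 0.97),
    Prediction("retrying request", "INFO", "WARN", 0.55),
    Prediction("service started", "INFO", "INFO", 0.99),
]
for p in flag_candidates(preds):
    print(f"review: {p.line!r} emitted {p.emitted_level}, predicted {p.predicted_level}")
```

Raising min_confidence trades recall for precision. Since confidence is not correctness, the flagged lines are candidates for human review, not verdicts.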

How to use

Inputs must be Drain3-masked the same way as in training (variables replaced by placeholders such as <NUM>, <IP>, <UUID>); raw text degrades predictions. The log-lens repo ships a ready-to-use Drain3 preprocessing pipeline that produces exactly this masked form.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "hazemkhaled-94/modernlogbert-gce"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

text = "Connection refused after <<NUM>> retries"  # Drain3-masked input
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()
print(model.config.id2label[pred])
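
For quick experiments without the log-lens pipeline, the masking requirement above can be approximated with plain regular expressions. This is a rough stand-in, not the log-lens or Drain3 rules: the patterns, their order, and mask_line are assumptions. The placeholder prefix and suffix are parameters because the exact spelling must match training, and the usage example above writes <<NUM>> while the description uses <NUM>, so check what log-lens actually emits before relying on this.

```python
import re

# Hypothetical masking rules (not the log-lens rules). Order matters:
# UUIDs and IPs are masked before the generic number rule can split them.
MASK_RULES = [
    ("UUID", re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("NUM", re.compile(r"\b\d+(?:\.\d+)?\b")),
]


def mask_line(line: str, prefix: str = "<", suffix: str = ">") -> str:
    """Replace variable tokens with placeholders such as <NUM> or <IP>."""
    for name, pattern in MASK_RULES:
        line = pattern.sub(f"{prefix}{name}{suffix}", line)
    return line


print(mask_line("Connection refused after 3 retries"))
print(mask_line("Accepted connection from 10.0.0.1 port 22"))
```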

Training data

  • In-distribution (train + held-out eval): a publicly available collection of system log corpora (loghub), preprocessed into a level-balanced, stratified sample.
  • Out-of-distribution (evaluation only): a single private industrial Kubernetes log deployment. Not released; used purely as an OOD generalization probe.
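
The card does not spell out how the level-balanced, stratified sample was built. The sketch below shows one common approach, assuming balance is reached by capping (downsampling) each level and that the held-out slice takes a fixed fraction of every level; balanced_stratified_split, cap_per_level, and eval_fraction are hypothetical names, not log-lens APIs.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (masked log line, severity level)


def balanced_stratified_split(
    records: Sequence[Record], cap_per_level: int, eval_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Cap every level at cap_per_level, then split each level into train/eval."""
    rng = random.Random(seed)  # local RNG keeps the split reproducible
    by_level = defaultdict(list)
    for text, level in records:
        by_level[level].append((text, level))
    train: List[Record] = []
    evaluation: List[Record] = []
    for level in sorted(by_level):
        items = by_level[level]
        rng.shuffle(items)
        items = items[:cap_per_level]  # downsample majority levels
        # Keep at least one eval example per level when the level has >1 item.
        n_eval = max(1, round(len(items) * eval_fraction)) if len(items) > 1 else 0
        evaluation.extend(items[:n_eval])
        train.extend(items[n_eval:])
    return train, evaluation
```

Stratifying per level keeps rare levels such as FATAL represented in the held-out slice even when they are scarce overall.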

Training procedure

Hyperparameter           Value
Backbone                 ModernBERT-base
Loss                     Weighted Generalized Cross-Entropy (q = 0.7)
Epochs                   8
Per-device batch size    32
Gradient accumulation    4 (effective batch 128)
Learning rate            1e-5 (separate LRs for head vs. backbone)
Weight decay             0.01
Warmup ratio             0.1
Max sequence length      512 tokens
Best-model metric        macro F1
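
For reference, the standard generalized cross-entropy of Zhang & Sabuncu (2018) scores a sample with true class y as (1 − p_y^q) / q, where p_y is the softmax probability of y. It tends to ordinary cross-entropy as q → 0 and becomes the MAE-style 1 − p_y at q = 1, so q = 0.7 sits toward the noise-robust end. The framework-free sketch below adds a per-class weight w_y. The card does not state how the class weights are derived or how a batch is reduced, so the weight-normalized mean used here (the convention PyTorch's weighted cross-entropy follows) is an assumption, as is the wgce_loss name.

```python
import math
from typing import Sequence


def wgce_loss(
    probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    class_weights: Sequence[float],
    q: float = 0.7,
) -> float:
    """Weighted generalized cross-entropy over a batch of softmax outputs.

    Per-sample loss: w_y * (1 - p_y ** q) / q, reduced by a weight-normalized mean.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, targets):
        w = class_weights[y]
        total += w * (1.0 - p[y] ** q) / q
        weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("empty batch or all-zero weights")
    return total / weight_sum


# GCE for a chance-level prediction over the six severity levels.
print(wgce_loss([[1 / 6] * 6], [0], [1.0] * 6))
# As q -> 0 the loss approaches ordinary cross-entropy, -log(p_y).
print(wgce_loss([[0.2, 0.8]], [1], [1.0, 1.0], q=1e-6), -math.log(0.8))
```

Because the per-sample loss is bounded by 1/q, a confidently mislabeled example contributes at most a fixed amount, which is the intuition behind GCE's noise tolerance.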

Evaluation

In-distribution (held-out stratified slice)

Metric                   Value
Accuracy                 87.37%
Macro precision          0.7368
Macro recall             0.7977
Macro F1                 0.7447
Weighted F1              0.8884
Mean confidence (all)    95.57%
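
Macro F1 averages per-level F1 equally across the levels, while weighted F1 weights each level by its support; since weighted F1 (0.8884) exceeds macro F1 (0.7447), the levels with more support score above the per-level average. Macro F1 is also the mean of per-level F1 scores, not the harmonic mean of macro precision and recall. A minimal stdlib sketch of both averages follows; f1_summary is a hypothetical helper, the toy labels are illustrative, and the definitions should agree with scikit-learn's f1_score using average="macro" and average="weighted".

```python
from collections import Counter
from typing import Sequence, Tuple


def f1_summary(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (macro F1, support-weighted F1) over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[label] = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[label] * support[label] for label in labels) / len(y_true)
    return macro, weighted


# Toy data: a frequent level predicted well, a rare level half-missed.
y_true = ["INFO"] * 8 + ["FATAL"] * 2
y_pred = ["INFO"] * 8 + ["INFO", "FATAL"]
print(f1_summary(y_true, y_pred))  # weighted F1 lands above macro F1
```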

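On the curated in-distribution slice, the WCE sibling scores slightly higher and is better calibrated. This WGCE checkpoint is the more confident of the two, and its mean confidence (95.57%) exceeds its accuracy (87.37%); this overconfidence is the calibration cost of a noise-tolerant objective.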
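
One common way to quantify a calibration gap is expected calibration error (ECE) with equal-width confidence bins, sketched below. The card does not report ECE; the function name and the 10-bin default are assumptions. With a single bin, ECE collapses to |mean confidence − accuracy|, which for the table above is |95.57% − 87.37%| = 8.20 percentage points. That single-bin figure is a lower bound on the binned ECE, because averaging over all samples lets errors in opposite directions cancel.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width binned ECE: support-weighted |mean confidence - accuracy| per bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0 goes in the first bin.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(bool(correct[i]) for i in idx) / len(idx)
        ece += len(idx) / n * abs(mean_conf - accuracy)
    return ece


# Four predictions at 95% confidence, three of them correct: overconfident.
print(expected_calibration_error([0.95] * 4, [True, True, True, False]))
```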

Out-of-distribution

Evaluated on a private industrial Kubernetes domain, a log distribution different from the training data. Performance degraded modestly but remained usable, the expected cost of moving to unfamiliar formats. Consistent with its noise-tolerant design, WGCE produced ~21% fewer under-predictions than WCE on this domain. As with any OOD use, validate on your own log distribution.
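
Reading "under-prediction" as a predicted severity below the true level (an assumption; the card does not define the term), a comparison like this can be reproduced on your own labeled logs along these lines. direction_counts and relative_reduction are hypothetical helpers; a "~21% fewer" figure would correspond to relative_reduction(wce_under, wgce_under) ≈ 0.21, where the two counts come from running both checkpoints on the same logs.

```python
from typing import Sequence, Tuple

# Severity order from the card, least to most severe.
SEVERITY = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]
RANK = {level: i for i, level in enumerate(SEVERITY)}


def direction_counts(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[int, int]:
    """Count (under, over) predictions: predicted severity below / above the true level."""
    under = over = 0
    for t, p in zip(y_true, y_pred):
        if RANK[p] < RANK[t]:
            under += 1
        elif RANK[p] > RANK[t]:
            over += 1
    return under, over


def relative_reduction(baseline: int, candidate: int) -> float:
    """Fractional drop from baseline to candidate, e.g. 0.21 for '21% fewer'."""
    if baseline == 0:
        raise ValueError("baseline count must be non-zero")
    return (baseline - candidate) / baseline


print(direction_counts(["ERROR", "INFO", "WARN"], ["WARN", "DEBUG", "FATAL"]))  # (2, 1)
```

For triage, under-predictions are usually the costlier direction, since a real ERROR read as INFO is easy to miss.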

Limitations and biases

  • OOD generalization: only modest degradation was observed, and only on a single private industrial domain; other distributions are unverified, so validate on your own logs.
  • Confidence ≠ correctness: this checkpoint is the more confident of the two; treat scores as signals, not guarantees.
  • Preprocessing coupling: inputs must be Drain3-masked exactly as in training (use the log-lens preprocessing pipeline).

Model files

Safetensors weights, ~0.1B parameters, F32 tensors.