ModernLogBERT (WCE)

A ModernBERT encoder fine-tuned to classify the severity level of a single log line into one of six levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL.

This checkpoint is trained with a Weighted Cross-Entropy (WCE) objective. A sibling checkpoint trained with Weighted Generalized Cross-Entropy (WGCE), a noise-tolerant loss, is available at hazemkhaled-94/modernlogbert-gce.
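As a rough illustration of how the two objectives differ on a single example, here is a minimal plain-Python sketch. It is not the training code. The class weights are hypothetical, because the card does not say how they were derived. The WGCE form assumes the standard generalized cross-entropy L_q = (1 - p_y^q) / q scaled by a per-class weight, with an assumed q = 0.7.

```python
import math
from typing import List, Sequence

LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]
# Hypothetical per-level weights (rarer levels weighted up); NOT the values used in training.
CLASS_WEIGHTS = [2.0, 1.5, 1.0, 1.0, 1.5, 3.0]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_ce(logits: Sequence[float], target: int, weights: Sequence[float]) -> float:
    # Weighted cross-entropy for one example: -w_y * log(p_y).
    p = softmax(logits)
    return -weights[target] * math.log(p[target])


def weighted_gce(
    logits: Sequence[float], target: int, weights: Sequence[float], q: float = 0.7
) -> float:
    # Weighted generalized cross-entropy for one example: w_y * (1 - p_y**q) / q.
    # Bounded above by w_y / q, so confidently "wrong" (possibly mislabeled) examples
    # cannot dominate the loss the way they can under CE.
    p = softmax(logits)
    return weights[target] * (1.0 - p[target] ** q) / q


logits = [4.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # model is confident the line is TRACE
target = LEVELS.index("FATAL")            # but the label says FATAL
print(weighted_ce(logits, target, CLASS_WEIGHTS))   # grows without bound as p_y -> 0
print(weighted_gce(logits, target, CLASS_WEIGHTS))  # capped at w_y / q
```

As q approaches 0 the GCE term approaches ordinary cross-entropy. Larger q caps the penalty on confidently contradicted labels, which is where the noise tolerance comes from.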

Built with the log-lens project, which also provides the Drain3 preprocessing pipeline these inputs require (see "How to use").

Intended use

Intended: triage and observability research — predicting or sanity-checking log severity, and flagging entries whose predicted severity disagrees with the emitted level as candidate anomalies.
Out of scope: a sole source of truth for alerting or incident severity. Aggregate accuracy hides brittle behavior on unfamiliar log formats — keep a human in the loop.
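One way to turn severity disagreement into candidate anomalies is to treat the six levels as an ordered scale and flag lines whose predicted level is far from the emitted one. This is a hypothetical sketch. The ordinal ranking, the `min_gap` threshold, and the `flag_candidates` helper are illustrative choices, not part of log-lens or this model.

```python
from typing import Iterable, List, Tuple

# Levels ordered from least to most severe, matching the label set above.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def severity_gap(emitted: str, predicted: str) -> int:
    # Positive when the model thinks the line is more severe than it was logged.
    return RANK[predicted.upper()] - RANK[emitted.upper()]


def flag_candidates(
    records: Iterable[Tuple[str, str, str]], min_gap: int = 2
) -> List[Tuple[str, str, str, int]]:
    # records: (masked_line, emitted_level, predicted_level) triples.
    flagged = []
    for line, emitted, predicted in records:
        gap = severity_gap(emitted, predicted)
        if abs(gap) >= min_gap:
            flagged.append((line, emitted, predicted, gap))
    return flagged


records = [
    ("Disk <NUM> failed, remounting read-only", "INFO", "FATAL"),
    ("User <NUM> logged in", "INFO", "DEBUG"),
]
print(flag_candidates(records))
# [('Disk <NUM> failed, remounting read-only', 'INFO', 'FATAL', 3)]
```

The signed gap separates under-logged lines (positive) from over-logged ones (negative). Flagged lines are candidates for human review, not alerts.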

How to use

Inputs must be Drain3-masked the same way as in training (variables replaced by placeholders such as <NUM>, <IP>, <UUID>); raw text degrades predictions. The log-lens repo ships a ready-to-use Drain3 preprocessing pipeline that produces exactly this masked form.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "hazemkhaled-94/modernlogbert-wce"
tokenizer = AutoTokenizer.from_pretrained(repo)
# eval() disables dropout so predictions are deterministic.
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

# Input must already be Drain3-masked, as in training.
text = "Connection refused after <NUM> retries"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # inference only, no gradients needed
    logits = model(**inputs).logits
pred = logits.argmax(-1).item()  # index of the highest-scoring level
print(model.config.id2label[pred])
```
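The snippet above expects text that is already masked. To see roughly what that looks like, here is a regex approximation that replaces UUIDs, IPv4 addresses, and numbers with the placeholders named above. It is a hypothetical stand-in, not the log-lens pipeline and not Drain3's own configured masking. Its patterns will not reproduce the training preprocessing exactly, so use the log-lens pipeline for real inputs.

```python
import re

# Hypothetical approximation of Drain3-style masking; NOT the log-lens pipeline.
# Order matters: UUIDs and IPs are masked before bare numbers so their digits stay together.
MASKS = [
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def mask_line(line: str) -> str:
    # Apply each pattern in turn, replacing every match with its placeholder.
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


print(mask_line("Connection refused after 3 retries"))  # Connection refused after <NUM> retries
print(mask_line("from 10.0.0.12 port 8080"))            # from <IP> port <NUM>
```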

Training data

In-distribution (train + held-out eval): a publicly available collection of system log corpora (loghub), preprocessed into a level-balanced, stratified sample.
Out-of-distribution (evaluation only): a single private industrial Kubernetes log deployment. Not released; used purely as an OOD generalization probe.
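The card does not spell out the sampling procedure. A minimal sketch of one way to build a level-balanced, stratified sample caps each level and then splits every level separately into train and eval. The `balanced_split` helper and its `per_level`, `eval_fraction`, and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (masked_line, level)


def balanced_split(
    records: Iterable[Record], per_level: int, eval_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for line, level in records:
        by_level[level].append((line, level))
    train, evaluation = [], []
    for level in sorted(by_level):
        items = by_level[level]
        rng.shuffle(items)
        items = items[:per_level]                  # cap each level -> level-balanced
        n_eval = int(len(items) * eval_fraction)   # split within each level -> stratified
        evaluation.extend(items[:n_eval])
        train.extend(items[n_eval:])
    return train, evaluation


records = [(f"info line {i}", "INFO") for i in range(50)] + [
    (f"error line {i}", "ERROR") for i in range(20)
]
train, evaluation = balanced_split(records, per_level=20, eval_fraction=0.25)
print(len(train), len(evaluation))  # 30 10
```

Levels with fewer than `per_level` examples stay under-represented, so a cap alone only approximates balance.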

Training procedure

Hyperparameter	Value
Backbone	ModernBERT-base
Loss	Weighted Cross-Entropy
Epochs	8
Per-device batch size	32
Gradient accumulation	4 (effective batch 128)
Learning rate	1e-5 (separate LRs for head vs backbone)
Weight decay	0.01
Warmup ratio	0.1
Max sequence length	512
Best-model metric	macro F1
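The table implies a few derived quantities: an effective batch of 32 × 4 = 128 (which matches a single device), a total step count, and a warmup length. The sketch below computes them and the learning rate at each step. It assumes a linear warmup-then-decay schedule, the default in Hugging Face `Trainer`, because the card does not name the scheduler. The training-set size is hypothetical, and real step counts can differ slightly depending on how the dataloader handles the last partial batch.

```python
import math

# From the hyperparameter table.
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 4
EPOCHS = 8
WARMUP_RATIO = 0.1
PEAK_LR = 1e-5
# Assumption: one device, consistent with the stated effective batch of 128.
NUM_DEVICES = 1


def schedule(num_train_examples: int):
    # Returns (effective_batch, total_optimizer_steps, warmup_steps).
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return effective_batch, total_steps, warmup_steps


def linear_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float = PEAK_LR) -> float:
    # Linear warmup from 0 to peak_lr, then linear decay back to 0.
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


eff, total, warmup = schedule(12_800)  # hypothetical training-set size
print(eff, total, warmup)  # 128 800 80
```

With separate learning rates for the head and the backbone, each parameter group would follow the same curve scaled to its own peak.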

Evaluation

In-distribution (held-out stratified slice)

Metric	Value
Accuracy	88.18%
Macro precision	0.7647
Macro recall	0.8271
Macro F1	0.7813
Weighted F1	0.8947
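Macro F1 averages the per-level F1 scores equally, while weighted F1 weights each level by its number of eval examples. A weighted score well above the macro score (0.8947 vs 0.7813) therefore indicates that, on the whole, the better-represented levels score higher than the rarer ones. The reported macro F1 is also below the harmonic mean of macro precision and recall (about 0.795). That is consistent with averaging per-level F1 scores, as scikit-learn's `f1_score(average="macro")` does, rather than combining the averaged precision and recall. A minimal sketch on made-up toy labels shows both averages:

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> Dict[str, float]:
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred, labels) -> Tuple[float, float]:
    f1 = per_class_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(labels)  # every level counts equally
    # Each level counts in proportion to its number of true examples.
    weighted = sum(f1[c] * support[c] for c in labels) / sum(support[c] for c in labels)
    return macro, weighted


# Toy data: a frequent level classified well, a rare level classified poorly.
y_true = ["INFO"] * 8 + ["FATAL"] * 2
y_pred = ["INFO"] * 8 + ["INFO", "FATAL"]
macro, weighted = macro_and_weighted_f1(y_true, y_pred, ["INFO", "FATAL"])
print(round(macro, 3), round(weighted, 3))  # 0.804 0.886
```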

On this held-out in-distribution slice, the WCE checkpoint is both stronger and better calibrated than its WGCE sibling.

Out-of-distribution

Evaluated on a private industrial Kubernetes deployment whose log distribution differs from the training data. Performance degraded modestly but remained usable, which is the expected cost of moving to unfamiliar log formats. As with any OOD use, validate on your own log distribution before relying on the model.
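Because the OOD results are reported only qualitatively, a quick check on your own labeled logs is the practical substitute. A minimal sketch, assuming you have masked lines paired with trusted levels and a `predict` callable wrapping the model (both hypothetical here), reports overall accuracy, per-level accuracy, and the most common confusions. A keyword stub stands in for the model so the example runs on its own.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def ood_report(
    samples: Iterable[Tuple[str, str]], predict: Callable[[str], str]
) -> Tuple[float, Dict[str, float], List[Tuple[Tuple[str, str], int]]]:
    # samples: (masked_line, true_level) pairs from your own logs.
    # predict: any callable mapping a masked line to a level name.
    correct = total = 0
    per_level = defaultdict(lambda: [0, 0])  # level -> [correct, seen]
    confusions = Counter()                   # (true_level, predicted_level) -> count
    for line, truth in samples:
        pred = predict(line)
        total += 1
        per_level[truth][1] += 1
        if pred == truth:
            correct += 1
            per_level[truth][0] += 1
        else:
            confusions[(truth, pred)] += 1
    accuracy = correct / total if total else 0.0
    level_accuracy = {lvl: c / n for lvl, (c, n) in per_level.items()}
    return accuracy, level_accuracy, confusions.most_common(3)


# Hypothetical keyword stub standing in for the real model.
def stub_predict(line: str) -> str:
    return "ERROR" if "failed" in line else "INFO"


samples = [
    ("Disk <NUM> failed", "ERROR"),
    ("User <NUM> logged in", "INFO"),
    ("Retrying in <NUM> ms", "WARN"),
    ("Probe failed for pod <NUM>", "WARN"),
]
print(ood_report(samples, stub_predict))
```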

Limitations and biases

OOD generalization — only modest degradation was observed on a single private industrial domain; other distributions are unverified, so validate on your own logs.
Confidence ≠ correctness — treat scores as signals, not guarantees.
Preprocessing coupling — inputs must be Drain3-masked exactly as in training (use the log-lens preprocessing pipeline).
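Since a high score is a signal rather than a guarantee, one common pattern is to accept a prediction only when the top softmax probability clears a threshold and route everything else to a human. The `route` helper and the 0.8 threshold below are hypothetical and should be tuned on labeled data. With the real model, `labels` would be `[model.config.id2label[i] for i in range(model.config.num_labels)]` and `logits` would be `model(**inputs).logits[0].tolist()`. The label order used here is illustrative only.

```python
import math
from typing import Sequence, Tuple


def route(logits: Sequence[float], labels: Sequence[str], threshold: float = 0.8) -> Tuple[str, float, bool]:
    # Softmax over the level logits (max subtracted for numerical stability).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    # Returns (label, confidence, needs_review); below-threshold predictions go to a human.
    return labels[best], confidence, confidence < threshold


LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]  # illustrative order only
print(route([0.1, 0.3, 4.2, 1.0, 0.2, -1.0], LEVELS))
```

A threshold only trades coverage for precision. Even a better-calibrated checkpoint can be confidently wrong on unfamiliar formats.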


Model details

Base model: answerdotai/ModernBERT-base
Weights: Safetensors, F32
Model size: 0.1B params
