ModernLogBERT (WGCE)
A ModernBERT encoder
fine-tuned to classify the severity level of a single log line into one of
six levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL.
This checkpoint is trained with a Weighted Generalized Cross-Entropy (WGCE)
objective (q = 0.7), a noise-tolerant loss designed to be more robust to
mislabeled training data. A sibling checkpoint trained with plain Weighted
Cross-Entropy is at
hazemkhaled-94/modernlogbert-wce.
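For intuition about why the objective is noise-tolerant, here is a minimal pure-Python sketch of a per-example weighted GCE next to weighted cross-entropy. It assumes the usual GCE form `(1 - p_y**q) / q` scaled by a per-class weight; how log-lens actually combines the class weights with the loss is not stated in this card, so treat `weighted_gce`, `weighted_ce`, and the uniform weights as illustrative assumptions.

```python
import math


def weighted_gce(probs, label, class_weights, q=0.7):
    """Per-example weighted GCE (assumed form): w_y * (1 - p_y**q) / q."""
    p_true = probs[label]
    return class_weights[label] * (1.0 - p_true ** q) / q


def weighted_ce(probs, label, class_weights):
    """Per-example weighted cross-entropy: -w_y * log(p_y)."""
    return -class_weights[label] * math.log(probs[label])


# Hypothetical 6-class example with uniform class weights (assumption).
weights = [1.0] * 6

# The model is confidently wrong: almost all mass on class 2, label is class 4.
confident_wrong = [0.001, 0.001, 0.995, 0.001, 0.001, 0.001]

gce_value = weighted_gce(confident_wrong, 4, weights)
ce_value = weighted_ce(confident_wrong, 4, weights)

# GCE is bounded above by w_y / q, while cross-entropy grows without bound
# as p_y approaches 0, so one mislabeled example pulls less on the model.
gce_upper_bound = weights[4] / 0.7
```

With this form, a confidently "wrong" example (which may simply be mislabeled) contributes at most `w_y / q` to the loss, whereas its cross-entropy contribution keeps growing as the predicted probability shrinks.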
Built with the log-lens project, which also provides the Drain3 preprocessing pipeline these inputs require (see "How to use").
Intended use
- Intended: triage and observability research, such as predicting or sanity-checking log severity, and flagging entries whose predicted severity disagrees with the emitted level as candidate anomalies.
- Out of scope: a sole source of truth for alerting or incident severity. Aggregate accuracy hides brittle behavior on unfamiliar log formats, so keep a human in the loop.
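The disagreement check described above can be sketched as a simple filter. The record keys (`emitted`, `predicted`, `confidence`) and the `min_confidence` threshold are hypothetical choices for illustration, not part of log-lens; the output is a review queue for a human, not an alert.

```python
def flag_candidates(records, min_confidence=0.9):
    """Return records whose predicted level differs from the emitted level.

    Each record is a dict with hypothetical keys:
    "line", "emitted", "predicted", "confidence".
    """
    return [
        r for r in records
        if r["predicted"] != r["emitted"] and r["confidence"] >= min_confidence
    ]


# Made-up records for illustration only.
records = [
    {"line": "disk usage at <NUM>%", "emitted": "INFO", "predicted": "WARN", "confidence": 0.97},
    {"line": "request served", "emitted": "INFO", "predicted": "INFO", "confidence": 0.99},
    {"line": "retrying connection", "emitted": "DEBUG", "predicted": "INFO", "confidence": 0.55},
]

flagged = flag_candidates(records)
```

Only the first record is returned here: the second agrees with its emitted level, and the third disagrees but falls below the assumed confidence threshold.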
How to use
Inputs must be Drain3-masked the same way as in training (variables
replaced by placeholders such as <NUM>, <IP>, <UUID>); raw text degrades
predictions. The log-lens
repo ships a ready-to-use Drain3 preprocessing pipeline that produces exactly
this masked form.
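To illustrate what masking means, the sketch below replaces variables with placeholders using plain regular expressions. It is a stand-in only: it is not Drain3 and not the log-lens pipeline, the rules and the `mask_line` helper are assumptions, and the exact placeholder strings should be checked against log-lens before feeding anything to the model.

```python
import re

# Order matters: mask the most specific patterns first so the digits inside
# UUIDs and IP addresses are not consumed by the generic number rule.
MASKING_RULES = [
    (
        re.compile(
            r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
            r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
        ),
        "<UUID>",
    ),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask_line(line):
    """Apply each illustrative masking rule in turn to one log line."""
    for pattern, placeholder in MASKING_RULES:
        line = pattern.sub(placeholder, line)
    return line


masked_retry = mask_line("Connection refused after 3 retries")
masked_peer = mask_line(
    "peer 10.0.0.12 closed session 550e8400-e29b-41d4-a716-446655440000"
)
```

For real inputs, prefer the log-lens preprocessing pipeline so that the masked form matches what the model saw during training.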
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "hazemkhaled-94/modernlogbert-gce"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo).eval()

text = "Connection refused after <<NUM>> retries"  # Drain3-masked input
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference only: no gradients needed
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()

print(model.config.id2label[pred])
```
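If you also want a confidence score next to the label, a softmax over the logits gives one. The helper below is a minimal pure-Python sketch; `top_prediction` is a hypothetical name, and the `example_id2label` ordering is made up for illustration, so read the real mapping from `model.config.id2label`.

```python
import math


def top_prediction(logits, id2label):
    """Return (label, softmax probability) for the highest-scoring class."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical label order; the real one lives in model.config.id2label.
example_id2label = {0: "TRACE", 1: "DEBUG", 2: "INFO", 3: "WARN", 4: "ERROR", 5: "FATAL"}

# Made-up logits for one log line.
label, confidence = top_prediction([0.0, 0.0, 0.0, 0.0, 5.0, 0.0], example_id2label)
```

With the real model, the logits for a single input can be passed in as `model(**inputs).logits[0].tolist()`.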
Training data
- In-distribution (train + held-out eval): a publicly available collection of system log corpora (loghub), preprocessed into a level-balanced, stratified sample.
- Out-of-distribution (evaluation only): a single private industrial Kubernetes log deployment. Not released; used purely as an OOD generalization probe.
Training procedure
| Hyperparameter | Value |
|---|---|
| Backbone | ModernBERT-base |
| Loss | Weighted Generalized Cross-Entropy (q = 0.7) |
| Epochs | 8 |
| Per-device batch size | 32 |
| Gradient accumulation | 4 (effective batch 128) |
| Learning rate | 1e-5 (separate LRs for head vs backbone) |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max sequence length | 512 |
| Best-model metric | macro F1 |
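As a quick check of how these values interact, the sketch below derives the effective batch size and an approximate step count. The dataset size (`100_000`) and the single-device setup are assumptions, since the card does not state them, and the rounding may differ slightly from what a given trainer does.

```python
import math


def schedule_summary(
    num_examples,
    per_device_batch=32,
    grad_accum=4,
    epochs=8,
    warmup_ratio=0.1,
    num_devices=1,  # assumption: single device
):
    """Approximate optimizer-step counts implied by the hyperparameters."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Hypothetical training-set size, for illustration only.
summary = schedule_summary(num_examples=100_000)
```

Under these assumptions the effective batch is 32 × 4 = 128, matching the table, and roughly the first tenth of the optimizer steps are warmup.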
Evaluation
In-distribution (held-out stratified slice)
| Metric | Value |
|---|---|
| Accuracy | 87.37% |
| Macro precision | 0.7368 |
| Macro recall | 0.7977 |
| Macro F1 | 0.7447 |
| Weighted F1 | 0.8884 |
| Mean confidence (all) | 95.57% |
On the curated in-distribution slice, the WCE sibling is slightly stronger and better calibrated. This WGCE checkpoint is more confident (mean confidence of 95.57% against 87.37% accuracy), which is the calibration cost of a noise-tolerant objective.
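When reading the table, note that the macro F1 (0.7447) is lower than the harmonic mean of macro precision and macro recall (about 0.766). That is what you would expect if macro F1 is the unweighted mean of per-class F1 scores, which is the common convention and is assumed here. The sketch below shows that convention, and the support-weighted variant, on made-up labels.

```python
def per_class_f1(y_true, y_pred, labels):
    """Compute F1 for each label from true/predicted label lists."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 weighted by how often each class occurs."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores[l] * y_true.count(l) for l in labels) / len(y_true)


# Toy data, not the model's predictions.
labels = ["INFO", "WARN", "ERROR"]
y_true = ["INFO", "INFO", "INFO", "INFO", "ERROR", "ERROR", "WARN"]
y_pred = ["INFO", "INFO", "INFO", "ERROR", "ERROR", "WARN", "WARN"]

toy_macro = macro_f1(y_true, y_pred, labels)
toy_weighted = weighted_f1(y_true, y_pred, labels)
```

In the toy data the frequent class scores best, so weighted F1 exceeds macro F1. The same gap in the table above (0.8884 against 0.7447) suggests the rarer severity levels are the harder ones.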
Out-of-distribution
Evaluated on a private industrial Kubernetes domain, a different log distribution than training. Performance degraded modestly but stayed usable, the expected cost of moving to unfamiliar formats. Consistent with its noise-tolerant design, WGCE produced ~21% fewer under-predictions than WCE on this domain. As always for OOD use, validate on your own log distribution.
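To run the same kind of comparison on your own logs, one way to count under-predictions is sketched below. It assumes an under-prediction means the predicted level ranks below the reference level in the order TRACE < DEBUG < INFO < WARN < ERROR < FATAL; the card does not define the term precisely. The sample data is made up and does not reproduce the reported figure.

```python
SEVERITY_ORDER = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]
RANK = {level: i for i, level in enumerate(SEVERITY_ORDER)}


def count_under_predictions(y_true, y_pred):
    """Count predictions that rank below the reference severity."""
    return sum(1 for t, p in zip(y_true, y_pred) if RANK[p] < RANK[t])


def relative_reduction(baseline_count, new_count):
    """Fractional drop in a count relative to a baseline."""
    return (baseline_count - new_count) / baseline_count


# Made-up labels and predictions for two hypothetical models.
y_true = ["ERROR", "WARN", "INFO", "FATAL"]
pred_baseline = ["WARN", "INFO", "INFO", "ERROR"]    # 3 under-predictions
pred_candidate = ["ERROR", "INFO", "WARN", "ERROR"]  # 2 under, 1 over

under_baseline = count_under_predictions(y_true, pred_baseline)
under_candidate = count_under_predictions(y_true, pred_candidate)
reduction = relative_reduction(under_baseline, under_candidate)
```

Under-predictions are usually the costlier error for triage, since a real ERROR labeled as INFO is easy to overlook, which is why it can be worth tracking them separately from overall accuracy.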
Limitations and biases
- OOD generalization: only modest degradation was observed on a single private industrial domain; other distributions are unverified, so validate on your own logs.
- Confidence ≠ correctness: this checkpoint is the more confident of the two; treat scores as signals, not guarantees.
- Preprocessing coupling: inputs must be Drain3-masked exactly as in training (use the log-lens preprocessing pipeline).
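One way to check how far confidence can be trusted on your own data is an expected calibration error (ECE) estimate over a labeled sample. The sketch below uses equal-width confidence bins; the bin count and the helper name are assumptions, and this is not necessarily how calibration was measured for this card.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Made-up sample: four predictions at 0.95 confidence, three of them correct.
toy_ece = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, True, False]
)
```

Here the model claims 95% confidence but is right 75% of the time, giving an ECE of 0.2. A large gap like this on your logs is a sign to treat the scores as rankings rather than probabilities.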
Model tree for hazemkhaled-94/modernlogbert-gce
Base model: answerdotai/ModernBERT-base
Evaluation results (self-reported)
- Accuracy (in-distribution): 0.874
- Macro F1 (in-distribution): 0.745