MENTHOS-logparse

English

Model Description

MENTHOS-logparse is a token-classification model fine-tuned from answerdotai/ModernBERT-base to extract structured fields from raw log lines. It uses a maximum sequence length of 256 tokens.

Intended Use

Intended for BIO-style token labeling of log lines.
Useful for extracting fields such as the request URL, status, error level, and timestamp.
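A minimal sketch of grouping per-token BIO predictions into fields, assuming the predictions are already available as (token, label) pairs. The helper `decode_bio` is hypothetical and not part of the model or its training code. It joins tokens with spaces, which only suits whitespace-level tokens; ModernBERT subword tokens would need character offsets instead. The transformers token-classification pipeline offers comparable grouping through its `aggregation_strategy` argument.

```python
def decode_bio(tagged: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (token, BIO label) pairs into {field: [span, ...]} (hypothetical helper)."""
    fields: dict[str, list[str]] = {}
    field, tokens = None, []
    # A trailing "O" sentinel flushes the last open span.
    for token, label in tagged + [("", "O")]:
        tag, _, name = label.partition("-")  # "B-status" -> ("B", "-", "status")
        if tag == "I" and name == field:
            tokens.append(token)  # continue the open span
            continue
        if field is not None:
            fields.setdefault(field, []).append(" ".join(tokens))  # close the open span
        # B- starts a new span, a stray I- is leniently treated as a start, O closes.
        field, tokens = (name, [token]) if tag in ("B", "I") else (None, [])
    return fields


tagged = [
    ("10.0.0.1", "B-remote_host"),
    ("-", "O"),
    ("[01/Jan/2022:00:00:00", "B-time_received"),
    ("+0000]", "I-time_received"),
    ("200", "B-status"),
]
print(decode_bio(tagged))
# {'remote_host': ['10.0.0.1'], 'time_received': ['[01/Jan/2022:00:00:00 +0000]'], 'status': ['200']}
```

Treating a stray `I-` tag as the start of a new span is one design choice; a stricter decoder could drop such tokens instead.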

Label Space

The training code defines BIO labels (plus special handling for ignored and padding positions), including:

B-request_url, I-request_url
B-status
B-error_level, I-error_level
B-error_message, I-error_message
B-time_received, I-time_received
B-remote_host, I-remote_host
additional labels for request-header fields

The full token-label mapping is defined in the training code.
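The card does not reproduce the mapping, so the following is only a sketch of a common Hugging Face convention, not the actual training code. It assumes an `O` label for tokens outside any field, uses only the labels listed above (the request-header labels and the real id order are unknown), and marks special and padding positions with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The helper `encode_labels` is hypothetical; its `word_ids` input mimics what `word_ids()` returns on a fast tokenizer's encoding (one word index per subword, `None` for special or padding positions). Labeling only the first subword of each word is a further assumption.

```python
from typing import Optional

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss skips targets equal to this value

# Labels listed on the card plus an assumed "O"; header labels and real ids are unknown.
LABELS = [
    "O",
    "B-request_url", "I-request_url",
    "B-status",
    "B-error_level", "I-error_level",
    "B-error_message", "I-error_message",
    "B-time_received", "I-time_received",
    "B-remote_host", "I-remote_host",
]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}


def encode_labels(word_labels: list[str], word_ids: list[Optional[int]]) -> list[int]:
    """Map word-level BIO labels onto subword positions (hypothetical helper)."""
    ids: list[int] = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            ids.append(IGNORE_INDEX)  # special or padding position
        elif word_id != previous:
            ids.append(label2id[word_labels[word_id]])  # first subword of a word
        else:
            ids.append(IGNORE_INDEX)  # continuation subword, ignored in the loss
        previous = word_id
    return ids


# special token, "10.0.0.1" split into two subwords, "200", special token, padding
print(encode_labels(["B-remote_host", "B-status"], [None, 0, 0, 1, None, None]))
# [-100, 10, -100, 3, -100, -100]
```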

Training Data

Trained on the MENTHOS log-parsing dataset.

Benchmark Results

Benchmark results for the MENTHOS evaluation set.

model	samples	accuracy	precision	recall	f1	p50 latency (ms)	throughput (samples/s)
MENTHOS-logparse	744	0.988710	0.949088	0.936223	0.941009	24.2599	43.75

Reference baseline (Morpheus ONNX):

baseline model	accuracy	f1	p50 latency (ms)	throughput (samples/s)
log-parsing-20220418.onnx	0.984583	0.932764	119.8934	8.08
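A small worked comparison of the two tables, using only the figures reported above. It assumes both models were measured under the same conditions (hardware, batch size, evaluation set), which the card does not state, so the ratios are indicative only. The name `compare` is illustrative.

```python
# Figures copied from the two benchmark tables above.
MENTHOS = {"accuracy": 0.988710, "f1": 0.941009, "p50_ms": 24.2599, "throughput": 43.75}
BASELINE = {"accuracy": 0.984583, "f1": 0.932764, "p50_ms": 119.8934, "throughput": 8.08}


def compare(model: dict[str, float], base: dict[str, float]) -> dict[str, float]:
    """Absolute metric gains and speed ratios of `model` relative to `base`."""
    return {
        "accuracy_gain": model["accuracy"] - base["accuracy"],
        "f1_gain": model["f1"] - base["f1"],
        "latency_speedup": base["p50_ms"] / model["p50_ms"],  # > 1 means lower p50 latency
        "throughput_ratio": model["throughput"] / base["throughput"],  # > 1 means more samples/s
    }


for name, value in compare(MENTHOS, BASELINE).items():
    print(f"{name}: {value:.4f}")
```

With the reported numbers this gives about 4.94x lower p50 latency and 5.41x higher throughput, alongside gains of roughly 0.004 in accuracy and 0.008 in F1.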


Limitations

Token labels are derived by aligning tokenized substrings of the structured-column values with the raw log text, so alignment errors carry over into the training labels.
Domain shift in log formats can reduce extraction quality.
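A minimal sketch of substring-based label alignment, to illustrate why it is fragile. It is not the training code: it assumes whitespace tokens with character offsets (the real pipeline would work on ModernBERT subword tokens, for example via a fast tokenizer's offset mapping), and `align_bio` is a hypothetical helper. Because `str.find` returns only the first occurrence, a field value that also appears earlier in the line would be labeled in the wrong place, and a value formatted differently in the structured column than in the raw line is not found at all.

```python
import re


def align_bio(line: str, fields: dict[str, str]) -> list[tuple[str, str]]:
    """Assign BIO labels to whitespace tokens by locating each field value in `line`."""
    # (start, end) character offsets of every whitespace-delimited token
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", line)]
    labels = ["O"] * len(spans)
    for field, value in fields.items():
        start = line.find(value) if value else -1  # first occurrence only
        if start == -1:
            continue  # value missing or formatted differently: tokens stay "O"
        end = start + len(value)
        inside = False
        for i, (tok_start, tok_end) in enumerate(spans):
            # label tokens whose characters overlap [start, end); keep earlier labels
            if tok_start < end and tok_end > start and labels[i] == "O":
                labels[i] = ("I-" if inside else "B-") + field
                inside = True
    return [(line[s:e], label) for (s, e), label in zip(spans, labels)]


line = '10.0.0.1 - - [01/Jan/2022:00:00:00 +0000] "GET /index.html HTTP/1.1" 200 512'
fields = {
    "remote_host": "10.0.0.1",
    "time_received": "[01/Jan/2022:00:00:00 +0000]",
    "request_url": "/index.html",
    "status": "200",
}
for token, label in align_bio(line, fields):
    print(f"{token}\t{label}")
```

For a line such as `200 OK 200` with `status` set to `"200"`, only the first token is labeled, which shows how an ambiguous match can silently place a label on the wrong occurrence.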

Citation

@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}



Collection

This model is part of the MENTHOS (ModernBERT Embedded Network THreat Operational Suite) collection, a set of ModernBERT-based models for detection tasks in cybersecurity.