MENTHOS-logparse
English
Model Description
MENTHOS-LogParsing is a token-classification model fine-tuned from answerdotai/ModernBERT-base for structured field extraction from raw logs.
The maximum input sequence length is 256 tokens.
Intended Use
- BIO-style token labeling of log lines.
- Extracting fields such as request URL, status, error level, and timestamp.
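A minimal inference sketch, assuming the checkpoint is published under the repository ID given in the citation URL and loads with the standard `transformers` auto classes. The function name `tag_log_line` is hypothetical, and the sketch assumes the hosted config carries a populated `id2label` mapping. Inputs are truncated to the 256-token limit stated in the model description.

```python
MODEL_ID = "LHRS-UM-FERI/MENTHOS-logparse"  # taken from the citation URL
MAX_LENGTH = 256                            # maximum sequence length from the model description


def tag_log_line(line, model_id=MODEL_ID, max_length=MAX_LENGTH):
    """Return (subword token, predicted label) pairs for one raw log line.

    Hypothetical helper: assumes the checkpoint loads with the standard
    auto classes and that its config defines id2label.
    """
    # Imported lazily so that defining the helper does not require the libraries.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    model.eval()

    encoded = tokenizer(
        line,
        truncation=True,        # cut inputs longer than max_length
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**encoded).logits

    predicted_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist())
    return [(token, model.config.id2label[i]) for token, i in zip(tokens, predicted_ids)]
```

Calling `tag_log_line("<raw log line>")` downloads the checkpoint on first use. The returned pairs include special tokens, which callers would normally skip before grouping labels into fields.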
Label Space
The training code defines BIO labels (plus special handling for ignored and padding tokens), including:
- B-request_url, I-request_url
- B-status
- B-error_level, I-error_level
- B-error_message, I-error_message
- B-time_received, I-time_received
- B-remote_host, I-remote_host
- and additional request-header related labels
The full token-label mapping is defined in the training code.
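To illustrate how BIO labels turn into extracted fields, the sketch below groups a labeled token sequence into (field, text) spans. It is a generic BIO decoder, not the project's own post-processing. The function name `bio_to_spans`, the `O` tag for tokens outside any field, and the sample tokens are assumptions made for the example.

```python
def bio_to_spans(tokens, labels):
    """Group tokens into (field, text) spans according to their BIO labels."""
    spans = []
    current_field, current_tokens = None, []

    def close_span():
        # Emit the span that is currently open, if any.
        if current_field is not None:
            spans.append((current_field, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B":
            # A B- tag always starts a new span.
            close_span()
            current_field, current_tokens = field, [token]
        elif prefix == "I" and field == current_field:
            # An I- tag continues the open span of the same field.
            current_tokens.append(token)
        else:
            # "O" (assumed outside tag) or a stray I- tag ends the open span.
            close_span()
            current_field, current_tokens = None, []
    close_span()
    return spans


# Hypothetical labeled tokens, for illustration only.
tokens = ["[error]", "File", "does", "not", "exist"]
labels = ["B-error_level", "B-error_message", "I-error_message",
          "I-error_message", "I-error_message"]
spans = bio_to_spans(tokens, labels)
print(spans)
```

Tokens are joined with single spaces here for simplicity. With a subword tokenizer, character offsets would be a more faithful way to recover the original text.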
Training Data
Trained on the MENTHOS log-parsing dataset.
Benchmark Results
Benchmark results for the MENTHOS evaluation set.
| model | samples | accuracy | precision | recall | f1 | p50 latency (ms) | throughput (samples/s) |
|---|---|---|---|---|---|---|---|
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |
Reference baseline (Morpheus ONNX):
| baseline model | accuracy | f1 | p50 latency (ms) | throughput (samples/s) |
|---|---|---|---|---|
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |
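The relative differences between the two tables can be worked out directly from the reported figures. The short sketch below does only that arithmetic. It assumes both models were measured under the same conditions, which the tables themselves do not state.

```python
# Figures copied from the two benchmark tables above.
menthos = {"accuracy": 0.988710, "f1": 0.941009, "p50_ms": 24.2599, "throughput": 43.75}
baseline = {"accuracy": 0.984583, "f1": 0.932764, "p50_ms": 119.8934, "throughput": 8.08}

# Ratios greater than 1 favor MENTHOS-logparse.
latency_speedup = baseline["p50_ms"] / menthos["p50_ms"]
throughput_gain = menthos["throughput"] / baseline["throughput"]

# Absolute differences in quality metrics.
accuracy_delta = menthos["accuracy"] - baseline["accuracy"]
f1_delta = menthos["f1"] - baseline["f1"]

print(f"p50 latency ratio: {latency_speedup:.2f}x")
print(f"throughput ratio:  {throughput_gain:.2f}x")
print(f"accuracy delta:    {accuracy_delta:+.6f}")
print(f"f1 delta:          {f1_delta:+.6f}")
```

On these numbers, p50 latency is roughly 4.9 times lower and throughput roughly 5.4 times higher than the baseline, with slightly higher accuracy and F1.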
Limitations
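- Labels are matched by aligning tokenized substrings taken from the structured columns against the raw log line; values that repeat or are reformatted in the raw line may be aligned incorrectly.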
- Domain shift in log formats can reduce extraction quality.
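The first limitation can be made concrete with a simplified sketch of substring alignment. This is not the project's training code: it assumes whitespace tokens rather than subword tokens, uses only the first occurrence of each value, and the names `align_labels`, `line`, and `fields` are hypothetical.

```python
import re


def align_labels(line, fields):
    """Assign BIO labels to whitespace tokens by locating each field value in the line."""
    # Each token keeps its character offsets so it can be compared with value spans.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", line)]
    labels = ["O"] * len(tokens)

    for name, value in fields.items():
        if not value:
            continue
        start = line.find(value)  # first occurrence only (simplifying assumption)
        if start < 0:
            continue  # value does not appear verbatim in the raw line
        end = start + len(value)

        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            overlaps = tok_start < end and tok_end > start
            if overlaps and labels[i] == "O":
                labels[i] = ("B-" if first else "I-") + name
                first = False

    return [text for text, _, _ in tokens], labels


# Hypothetical log line and structured columns, for illustration only.
line = "10.0.0.5 [error] File does not exist"
fields = {
    "remote_host": "10.0.0.5",
    "error_level": "error",
    "error_message": "File does not exist",
}
tokens, labels = align_labels(line, fields)
print(list(zip(tokens, labels)))
```

Because the match is purely textual, a value that also appears elsewhere in the line, or that is formatted differently in the structured column than in the raw log, would be labeled in the wrong place or not at all.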
Citation
@misc{borovic_li-dobnik_kranjec_ferme_2026,
title = {MENTHOS-logparse},
author = {Borovic, Li Dobnik, Kranjec, Ferme},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
Slovenščina
Opis modela
MENTHOS-LogParsing je model za token klasifikacijo, naučen iz answerdotai/ModernBERT-base, za ekstrakcijo strukturiranih polj iz surovih log zapisov.
Uporablja maksimalno dolžino zaporedja 256.
Namen uporabe
- BIO označevanje tokenov v log vrsticah.
- Uporabno za polja kot so URL zahteve, status, error level, časovni žig ipd.
Prostor oznak
Skripta učenja definira BIO oznake (ter posebne oznake za ignoriranje/padding), npr.:
- B-request_url, I-request_url
- B-status
- B-error_level, I-error_level
- B-error_message, I-error_message
- B-time_received, I-time_received
- B-remote_host, I-remote_host
Celotno mapiranje je definirano v učni kodi.
Učni podatki
Učenje je potekalo na MENTHOS log-parsing datasetu.
Rezultati benchmarka
| model | vzorcev | accuracy | precision | recall | f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
|---|---|---|---|---|---|---|---|
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |
Referenčni baseline (Morpheus ONNX):
| baseline model | accuracy | f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
|---|---|---|---|---|
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |
Omejitve
- Ujemanje oznak temelji na poravnavi tokeniziranih podnizov.
- Pri drugačnih log formatih se lahko kakovost ekstrakcije zmanjša.
Citiranje
@misc{borovic_li-dobnik_kranjec_ferme_2026,
title = {MENTHOS-logparse},
author = {Borovic, Li Dobnik, Kranjec, Ferme},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}


