MENTHOS-logparse

English

Model Description

MENTHOS-logparse is a token-classification model fine-tuned from answerdotai/ModernBERT-base to extract structured fields from raw log lines. It uses a maximum sequence length of 256 tokens.
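A minimal loading sketch with the Hugging Face transformers library follows. It assumes the repository id from the citation URL, that the repository ships a standard token-classification config with an `id2label` mapping, and that the installed transformers version supports ModernBERT. The helper names `load_logparse` and `predict_labels` are hypothetical and are not part of the released code.

```python
MODEL_ID = "LHRS-UM-FERI/MENTHOS-logparse"  # repo id taken from the citation URL
MAX_LENGTH = 256  # maximum sequence length stated in this card


def load_logparse(model_id: str = MODEL_ID):
    """Load tokenizer and model (hypothetical helper; imports are local on purpose)."""
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def predict_labels(line: str, tokenizer, model):
    """Return (subword token, predicted label) pairs for one log line."""
    import torch

    enc = tokenizer(line, truncation=True, max_length=MAX_LENGTH, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits  # shape: (1, seq_len, num_labels)
    label_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    # Assumption: the checkpoint's config carries the id2label mapping.
    return [(tok, model.config.id2label[i]) for tok, i in zip(tokens, label_ids)]
```

Calling `predict_labels(line, *load_logparse())` yields one label per subword token, including special tokens, so downstream code would still need to merge subwords into field values.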

Intended Use

  • BIO-style token labeling on log lines.
  • Useful for extracting fields like request URL, status, error level, timestamp, etc.
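To illustrate how BIO labels become extracted fields, here is a small decoding sketch. It assumes an `O` label for tokens outside any field (standard in BIO schemes, though not listed in this card) and whitespace-level tokens joined with a single space. The function `decode_bio` is hypothetical and is not taken from the model's code.

```python
def decode_bio(tokens, labels):
    """Group parallel token/label lists into (field, text) spans."""
    spans = []
    field, parts = None, []
    for token, label in zip(tokens, labels):
        prefix, _, name = label.partition("-")
        if prefix == "I" and name == field:
            parts.append(token)  # continue the currently open span
            continue
        if field is not None:
            spans.append((field, " ".join(parts)))  # close the open span
        if prefix in ("B", "I") and name:
            field, parts = name, [token]  # a stray I- tag also opens a span
        else:
            field, parts = None, []  # "O" or unknown label: no open span
    if field is not None:
        spans.append((field, " ".join(parts)))
    return spans


tokens = ["10.0.0.1", "2024-03-12", "10:00:01", "GET", "/index.html", "200"]
labels = ["B-remote_host", "B-time_received", "I-time_received", "O",
          "B-request_url", "B-status"]
print(decode_bio(tokens, labels))
```

With subword tokens, joining by spaces would not reproduce the original text; reconstructing values from character offsets is the safer choice in that case.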

Label Space

The training code defines BIO labels (plus special handling for ignored and padding positions), including:

  • B-request_url, I-request_url
  • B-status
  • B-error_level, I-error_level
  • B-error_message, I-error_message
  • B-time_received, I-time_received
  • B-remote_host, I-remote_host
  • and additional request-header-related labels

The full token-label mapping is defined in the training code.
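As an illustration only, the labels listed above can be turned into id mappings as sketched below. The id order, the `O` label, and the ignore index of -100 (the default `ignore_index` of PyTorch's cross-entropy loss) are assumptions; the authoritative mapping lives in the training code and also contains the request-header labels omitted here.

```python
# Labels exactly as listed in this card, plus an assumed "O" (outside) label.
LISTED_LABELS = [
    "B-request_url", "I-request_url",
    "B-status",
    "B-error_level", "I-error_level",
    "B-error_message", "I-error_message",
    "B-time_received", "I-time_received",
    "B-remote_host", "I-remote_host",
]
IGNORE_INDEX = -100  # assumption: positions with this id are skipped by the loss

label2id = {label: i for i, label in enumerate(["O"] + LISTED_LABELS)}
id2label = {i: label for label, i in label2id.items()}


def align_to_subwords(word_labels, word_ids, label2id):
    """Label the first subword of each word; ignore specials and continuations."""
    out, prev = [], None
    for wid in word_ids:
        if wid is None or wid == prev:
            out.append(IGNORE_INDEX)  # special token or non-first subword
        else:
            out.append(label2id[word_labels[wid]])
        prev = wid
    return out


# word_ids as a fast tokenizer's word_ids() would report them for
# [special, word 0 (two subwords), word 1, special].
aligned = align_to_subwords(["B-status", "O"], [None, 0, 0, 1, None], label2id)
print(aligned)
```

Labeling only the first subword of each word is one common convention; the training code may handle continuation subwords differently.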

Training Data

Trained on the MENTHOS log-parsing dataset.

Benchmark Results

Benchmark results for the MENTHOS evaluation set.

| model | samples | accuracy | precision | recall | f1 | p50 latency (ms) | throughput (samples/s) |
|---|---|---|---|---|---|---|---|
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |

Reference baseline (Morpheus ONNX):

| baseline model | accuracy | f1 | p50 latency (ms) | throughput (samples/s) |
|---|---|---|---|---|
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |
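The relative differences implied by the two result rows can be worked out directly. The short calculation below uses only the reported numbers. It assumes both models were measured under comparable conditions, which this card does not state (the baseline row reports no sample count).

```python
# Figures copied from the benchmark rows in this card.
menthos = {"accuracy": 0.988710, "f1": 0.941009, "p50_ms": 24.2599, "throughput": 43.75}
morpheus = {"accuracy": 0.984583, "f1": 0.932764, "p50_ms": 119.8934, "throughput": 8.08}

latency_speedup = morpheus["p50_ms"] / menthos["p50_ms"]          # lower latency is better
throughput_gain = menthos["throughput"] / morpheus["throughput"]  # higher throughput is better
f1_delta = menthos["f1"] - morpheus["f1"]
accuracy_delta = menthos["accuracy"] - morpheus["accuracy"]

print(f"p50 latency speedup: {latency_speedup:.2f}x")
print(f"throughput gain:     {throughput_gain:.2f}x")
print(f"F1 difference:       {f1_delta:+.6f}")
print(f"accuracy difference: {accuracy_delta:+.6f}")
```

On these figures, MENTHOS-logparse has roughly 4.9 times lower median latency and 5.4 times higher throughput than the baseline, with an F1 score about 0.008 higher.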

Benchmark Plots

[Figure: LogParsing F1, MENTHOS vs Morpheus]

[Figure: LogParsing Throughput, MENTHOS vs Morpheus]

[Figure: LogParsing Latency Percentiles]

Limitations

  • Training labels are derived by aligning tokenized substrings of the structured column values with the raw log line, so the labels are only as reliable as that alignment.
  • Domain shift in log formats can reduce extraction quality.
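The first limitation can be made concrete with a rough sketch of substring-based labeling. This is not the authors' implementation: it assumes whitespace tokenization, uses the first occurrence of each value in the line, and labels any token that overlaps the matched span. The function `label_line` and the sample log lines are hypothetical.

```python
def label_line(line, fields):
    """Assign BIO labels to whitespace tokens by locating field values in the line."""
    # Record the character span of every whitespace-delimited token.
    tokens, pos = [], 0
    for tok in line.split():
        start = line.index(tok, pos)
        tokens.append((tok, start, start + len(tok)))
        pos = start + len(tok)

    labels = ["O"] * len(tokens)
    for name, value in fields.items():
        if not value:
            continue
        start = line.find(value)  # first occurrence only (a source of errors)
        if start < 0:
            continue  # value does not appear verbatim: nothing can be labeled
        end = start + len(value)
        first = True
        for i, (_, s, e) in enumerate(tokens):
            if s < end and e > start:  # token overlaps the value span
                labels[i] = ("B-" if first else "I-") + name
                first = False
    return [(tok, lab) for (tok, _, _), lab in zip(tokens, labels)]


clean = label_line(
    "10.0.0.1 2024-03-12 10:00:01 GET /index.html 200",
    {"remote_host": "10.0.0.1", "time_received": "2024-03-12 10:00:01",
     "request_url": "/index.html", "status": "200"},
)
ambiguous = label_line("GET /item/200 200", {"status": "200"})
print(clean)
print(ambiguous)
```

In the second call the status value also occurs inside the URL, so the first-occurrence match labels the URL token instead of the status token. This is the kind of label noise that substring alignment can introduce.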

Citation

@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}

Slovenščina

Opis modela

MENTHOS-LogParsing je model za token klasifikacijo, naučen iz answerdotai/ModernBERT-base, za ekstrakcijo strukturiranih polj iz surovih log zapisov. Uporablja maksimalno dolžino zaporedja 256.

Namen uporabe

  • BIO označevanje tokenov v log vrsticah.
  • Uporabno za polja kot so URL zahteve, status, error level, časovni žig ipd.

Prostor oznak

Skripta učenja definira BIO oznake (ter posebne oznake za ignoriranje/padding), npr.:

  • B-request_url, I-request_url
  • B-status
  • B-error_level, I-error_level
  • B-error_message, I-error_message
  • B-time_received, I-time_received
  • B-remote_host, I-remote_host

Celotno mapiranje je definirano v učni kodi.

Učni podatki

Učenje je potekalo na MENTHOS log-parsing datasetu.

Rezultati benchmarka

| model | vzorcev | accuracy | precision | recall | f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
|---|---|---|---|---|---|---|---|
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |

Referenčni baseline (Morpheus ONNX):

| baseline model | accuracy | f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
|---|---|---|---|---|
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |

Grafi benchmarka

[Slika: LogParsing F1, MENTHOS vs Morpheus]

[Slika: LogParsing Throughput, MENTHOS vs Morpheus]

[Slika: LogParsing Latency Percentiles]

Omejitve

  • Ujemanje oznak temelji na poravnavi tokeniziranih podnizov.
  • Pri drugačnih log formatih se lahko kakovost ekstrakcije zmanjša.

Citiranje

@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}