---
language:
- en
- sl
library_name: transformers
pipeline_tag: token-classification
tags:
- menthos
- modernbert
- log-parsing
- ner
- cybersecurity
---

# MENTHOS-logparse

## English

### Model Description

MENTHOS-LogParsing is a token-classification model fine-tuned from `answerdotai/ModernBERT-base` for structured field extraction from raw logs. It uses a maximum sequence length of 256 tokens.

### Intended Use

- BIO-style token labeling on log lines.
- Useful for extracting fields such as request URL, status, error level, and timestamp.

### Label Space

The training code defines BIO labels (plus special handling for ignored and padding positions), including:

- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`
- and additional request-header-related labels

The full token-label mapping is defined in the training code.

### Training Data

Trained on the MENTHOS log-parsing dataset.

### Benchmark Results

Benchmark results for the MENTHOS evaluation set.

| model            | samples | accuracy | precision |   recall |       f1 | p50 latency (ms) | throughput (samples/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ---------------------: |
| MENTHOS-logparse |     744 | 0.988710 |  0.949088 | 0.936223 | 0.941009 |          24.2599 |                  43.75 |

Reference baseline (Morpheus ONNX):

| baseline model            | accuracy |       f1 | p50 latency (ms) | throughput (samples/s) |
| ------------------------- | -------: | -------: | ---------------: | ---------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 |         119.8934 |                   8.08 |

### Benchmark Plots

![LogParsing F1: MENTHOS vs Morpheus](./log-parsing_f1_menthos_vs_morpheus.png)
![LogParsing Throughput: MENTHOS vs Morpheus](./log-parsing_throughput_samples_per_sec_menthos_vs_morpheus.png)
![LogParsing Latency Percentiles](./plots/latency_percentiles_log-parsing.png)

### Limitations

- Label matching is based on tokenized substring alignment from structured columns.
- Domain shift in log formats can reduce extraction quality.

### Citation

```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```

---

## Slovenščina

### Opis modela

MENTHOS-LogParsing je model za token klasifikacijo, naučen iz `answerdotai/ModernBERT-base`, za ekstrakcijo strukturiranih polj iz surovih log zapisov. Uporablja maksimalno dolžino zaporedja 256.

### Namen uporabe

- BIO označevanje tokenov v log vrsticah.
- Uporabno za polja, kot so URL zahteve, status, error level, časovni žig ipd.

### Prostor oznak

Skripta učenja definira BIO oznake (ter posebne oznake za ignoriranje/padding), npr.:

- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`

Celotno mapiranje je definirano v učni kodi.

### Učni podatki

Učenje je potekalo na MENTHOS log-parsing datasetu.

### Rezultati benchmarka

| model            | vzorcev | accuracy | precision |   recall |       f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ----------------------: |
| MENTHOS-logparse |     744 | 0.988710 |  0.949088 | 0.936223 | 0.941009 |          24.2599 |                   43.75 |

Referenčni baseline (Morpheus ONNX):

| baseline model            | accuracy |       f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ------------------------- | -------: | -------: | ---------------: | ----------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 |         119.8934 |                    8.08 |

### Grafi benchmarka

![LogParsing F1: MENTHOS vs Morpheus](./log-parsing_f1_menthos_vs_morpheus.png)
![LogParsing Throughput: MENTHOS vs Morpheus](./log-parsing_throughput_samples_per_sec_menthos_vs_morpheus.png)
![LogParsing Latency Percentiles](./plots/latency_percentiles_log-parsing.png)

### Omejitve

- Ujemanje oznak temelji na poravnavi tokeniziranih podnizov.
- Pri drugačnih log formatih se lahko kakovost ekstrakcije zmanjša.

### Citiranje

```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```
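
## Example: grouping BIO tags into fields

The label space described above uses `B-` and `I-` prefixes to mark the start and continuation of a field. The sketch below shows one way such per-token tags could be merged into named fields. It is illustrative only: the helper `group_bio`, the sample log tokens, and the outside tag `O` are assumptions, not part of the released training code, and joining tokens with spaces is a simplification (subword tokens from a real tokenizer would need offset-based reconstruction). How the tags are produced, for example by a `transformers` token-classification pipeline, is outside the scope of this sketch.

```python
def group_bio(tokens, tags):
    """Merge BIO-tagged tokens into {field_name: [span_text, ...]}.

    Hypothetical helper: `tokens` and `tags` are parallel lists, and any
    tag without a `B-`/`I-` prefix (e.g. "O") is treated as outside a field.
    """
    fields = {}
    name, span = None, []
    # A trailing sentinel tagged "O" flushes the last open span.
    for token, tag in list(zip(tokens, tags)) + [("", "O")]:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == name:
            # Continuation of the currently open field.
            span.append(token)
            continue
        if name is not None:
            # Close the open field before starting anything new.
            fields.setdefault(name, []).append(" ".join(span))
        # Only a B- tag opens a field; a stray I- tag is ignored.
        name, span = (label, [token]) if prefix == "B" else (None, [])
    return fields


# Hand-written example tags for a made-up access-log line.
tokens = ["192.168.0.1", "17/May/2015:10:05:03", "+0000", "GET", "/index.html", "200"]
tags = ["B-remote_host", "B-time_received", "I-time_received", "O", "B-request_url", "B-status"]

fields = group_bio(tokens, tags)
print(fields)
```

Each field maps to a list so that a label occurring more than once in a line keeps all of its spans.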