---
language:
- en
- sl
library_name: transformers
pipeline_tag: token-classification
tags:
- menthos
- modernbert
- log-parsing
- ner
- cybersecurity
---
# MENTHOS-logparse
## English
### Model Description
MENTHOS-logparse is a token-classification model fine-tuned from `answerdotai/ModernBERT-base` for extracting structured fields from raw log lines.
It uses a maximum sequence length of 256 tokens.
### Intended Use
- BIO-style token labeling on log lines.
- Useful for extracting fields like request URL, status, error level, timestamp, etc.
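A minimal inference sketch for the use case above, assuming the model loads through the standard `transformers` token-classification pipeline and that the repository ID matches the one in the citation URL. The helper names `load_tagger` and `fields_from_entities` are illustrative and not part of the released code.

```python
from collections import defaultdict

# Repository ID taken from the citation URL in this card (assumed to be loadable as-is).
MODEL_ID = "LHRS-UM-FERI/MENTHOS-logparse"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges B-/I- pieces into one span per field.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def fields_from_entities(entities):
    """Group aggregated pipeline output into {field_name: [text, ...]}."""
    fields = defaultdict(list)
    for ent in entities:
        fields[ent["entity_group"]].append(ent["word"].strip())
    return dict(fields)


# Usage (requires network access and the transformers package):
#   tagger = load_tagger()
#   line = '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512'
#   print(fields_from_entities(tagger(line)))
```

The grouping helper keeps a list per field because a single log line may contain more than one span with the same label.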
### Label Space
The training code defines BIO labels, plus special handling for ignored and padding positions. The labels include:
- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`
- additional labels for request-header fields
The full token-label mapping is defined in the training code.
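To illustrate how BIO labels like those above turn into fields, here is a small decoding sketch. It assumes tokens outside any field carry an `O` label (not listed explicitly in this card) and joins tokens with spaces, which is a simplification compared with real subword tokenization. The function name `decode_bio` is hypothetical.

```python
def decode_bio(tokens, labels):
    """Merge per-token BIO labels into (field, text) spans."""
    spans = []
    current_field, current_tokens = None, []

    def flush():
        # Close the open span, if any, and reset the accumulator.
        nonlocal current_field, current_tokens
        if current_field is not None:
            spans.append((current_field, " ".join(current_tokens)))
        current_field, current_tokens = None, []

    for token, label in zip(tokens, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B":
            # A B- tag always starts a new span.
            flush()
            current_field, current_tokens = field, [token]
        elif prefix == "I" and field == current_field:
            # An I- tag continues the span only if the field matches.
            current_tokens.append(token)
        else:
            # "O", an unknown label, or an I- tag with no matching open span.
            flush()
    flush()
    return spans


tokens = ["GET", "/index.html", "200", "[error]", "file", "not", "found"]
labels = ["O", "B-request_url", "B-status", "B-error_level",
          "B-error_message", "I-error_message", "I-error_message"]
print(decode_bio(tokens, labels))
```

A field such as `status`, which has only a `B-` label in the list above, always decodes to a single-token span under this scheme.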
### Training Data
The model was trained on the MENTHOS log-parsing dataset.
### Benchmark Results
The table below reports results on the MENTHOS evaluation set.
| model | samples | accuracy | precision | recall | f1 | p50 latency (ms) | throughput (samples/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ---------------------: |
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |
Reference baseline (Morpheus ONNX):
| baseline model | accuracy | f1 | p50 latency (ms) | throughput (samples/s) |
| ------------------------- | -------: | -------: | ---------------: | ---------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |
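The relative differences between the two tables can be worked out directly from the reported numbers. The sketch below only does that arithmetic. It assumes both models were measured under comparable conditions, which this card does not state.

```python
# Values copied from the two benchmark tables above.
menthos = {"accuracy": 0.988710, "f1": 0.941009,
           "p50_latency_ms": 24.2599, "throughput": 43.75}
morpheus = {"accuracy": 0.984583, "f1": 0.932764,
            "p50_latency_ms": 119.8934, "throughput": 8.08}

# Ratios above 1 favour MENTHOS-logparse.
throughput_ratio = menthos["throughput"] / morpheus["throughput"]
latency_ratio = morpheus["p50_latency_ms"] / menthos["p50_latency_ms"]

# Absolute differences in quality metrics.
f1_gain = menthos["f1"] - morpheus["f1"]
accuracy_gain = menthos["accuracy"] - morpheus["accuracy"]

print(f"throughput: {throughput_ratio:.2f}x")     # 5.41x
print(f"p50 latency: {latency_ratio:.2f}x lower")  # 4.94x lower
print(f"f1 gain: {f1_gain:+.4f}")                  # +0.0082
print(f"accuracy gain: {accuracy_gain:+.4f}")      # +0.0041
```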
### Benchmark Plots
![LogParsing F1: MENTHOS vs Morpheus](./log-parsing_f1_menthos_vs_morpheus.png)
![LogParsing Throughput: MENTHOS vs Morpheus](./log-parsing_throughput_samples_per_sec_menthos_vs_morpheus.png)
![LogParsing Latency Percentiles](./plots/latency_percentiles_log-parsing.png)
### Limitations
- Label matching relies on aligning tokenized substrings from the structured columns with the raw log text.
- Domain shift in log formats can reduce extraction quality.
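The first limitation can be made concrete with a rough sketch of substring-based label alignment. This is not the project's training code: it uses whitespace tokens instead of the model tokenizer, and the function name `align_field_labels` and the example fields are hypothetical.

```python
import re


def align_field_labels(log_line, fields):
    """Assign BIO labels to whitespace tokens by locating each field value in the line."""
    # Token spans as (start, end, text); a real pipeline would use tokenizer offsets.
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", log_line)]
    labels = ["O"] * len(tokens)

    for name, value in fields.items():
        if not value:
            continue
        start = log_line.find(value)  # first verbatim occurrence only
        if start == -1:
            continue  # value is not a substring of the line, so it stays unlabeled
        end = start + len(value)
        first = True
        for i, (tok_start, tok_end, _) in enumerate(tokens):
            overlaps = tok_start < end and tok_end > start
            if overlaps and labels[i] == "O":
                labels[i] = ("B-" if first else "I-") + name
                first = False
    return [text for _, _, text in tokens], labels


line = "192.0.2.10 GET /index.html 404 File not found"
fields = {
    "remote_host": "192.0.2.10",
    "request_url": "/index.html",
    "status": "404",
    "error_message": "File not found",
    "missing": "xyz",  # not present in the line, so no token gets this label
}
toks, labs = align_field_labels(line, fields)
for tok, lab in zip(toks, labs):
    print(f"{lab:18} {tok}")
```

Because the sketch matches the first verbatim occurrence, a short value such as a status code could be matched inside another field, and a value that was reformatted before being stored in a structured column would not be labeled at all.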
### Citation
```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
title = {MENTHOS-logparse},
author = {Borovic and {Li Dobnik} and Kranjec and Ferme},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```
---
## Slovenščina
### Opis modela
MENTHOS-logparse je model za token klasifikacijo, naučen iz `answerdotai/ModernBERT-base`, za ekstrakcijo strukturiranih polj iz surovih log zapisov.
Uporablja maksimalno dolžino zaporedja 256.
### Namen uporabe
- BIO označevanje tokenov v log vrsticah.
- Uporabno za polja kot so URL zahteve, status, error level, časovni žig ipd.
### Prostor oznak
Skripta učenja definira BIO oznake (ter posebne oznake za ignoriranje/padding), npr.:
- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`
Celotno mapiranje je definirano v učni kodi.
### Učni podatki
Učenje je potekalo na MENTHOS log-parsing datasetu.
### Rezultati benchmarka
| model | vzorcev | accuracy | precision | recall | f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ----------------------: |
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |
Referenčni baseline (Morpheus ONNX):
| baseline model | accuracy | f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ------------------------- | -------: | -------: | ---------------: | ----------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |
### Grafi benchmarka
![LogParsing F1: MENTHOS vs Morpheus](./log-parsing_f1_menthos_vs_morpheus.png)
![LogParsing Throughput: MENTHOS vs Morpheus](./log-parsing_throughput_samples_per_sec_menthos_vs_morpheus.png)
![LogParsing Latency Percentiles](./plots/latency_percentiles_log-parsing.png)
### Omejitve
- Ujemanje oznak temelji na poravnavi tokeniziranih podnizov.
- Pri drugačnih log formatih se lahko kakovost ekstrakcije zmanjša.
### Citiranje
```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
title = {MENTHOS-logparse},
author = {Borovic and {Li Dobnik} and Kranjec and Ferme},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```