---
language:
    - en
    - sl
library_name: transformers
pipeline_tag: token-classification
tags:
    - menthos
    - modernbert
    - log-parsing
    - ner
    - cybersecurity
---

# MENTHOS-logparse

## English

### Model Description

MENTHOS-logparse is a token-classification model fine-tuned from `answerdotai/ModernBERT-base` to extract structured fields from raw log lines.
It uses a maximum sequence length of 256 tokens.

### Intended Use

- BIO-style token labeling of log lines.
- Extracting fields such as request URL, status, error level, and timestamp.
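
A minimal inference sketch, assuming the repository named in the citation (`LHRS-UM-FERI/MENTHOS-logparse`) ships a tokenizer and label mapping that work with the `transformers` token-classification pipeline, and that the installed `transformers` release supports ModernBERT. The helper names `load_parser` and `fields_from_entities` are illustrative and not part of the model's code.

```python
from collections import defaultdict

# Repository id taken from the citation URL; assumed to be loadable as-is.
MODEL_ID = "LHRS-UM-FERI/MENTHOS-logparse"


def load_parser(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the grouping helper below has no heavy dependencies.
    from transformers import pipeline

    # aggregation_strategy="simple" merges adjacent B-/I- pieces of the same
    # field into one entity with an "entity_group" key.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def fields_from_entities(entities):
    """Collect aggregated pipeline entities into {field_name: [text, ...]}."""
    fields = defaultdict(list)
    for ent in entities:
        fields[ent["entity_group"]].append(ent["word"].strip())
    return dict(fields)
```

With the weights available, `fields_from_entities(load_parser()(log_line))` would return a dictionary keyed by field name. Which fields appear depends on the label mapping shipped with the model.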

### Label Space

The training code defines BIO labels (plus special handling for ignored and padding tokens), including:

- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`
- additional request-header-related labels

The full token-label mapping is defined in the training code.
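
As a rough illustration of how BIO tags turn into fields, the sketch below merges a `B-` tag and the `I-` tags that follow it into one span. It assumes an `O` tag for tokens outside any field, which this card does not confirm, and the function name `bio_to_spans` is hypothetical.

```python
def bio_to_spans(tokens, labels):
    """Merge parallel token/label lists into (field, [tokens]) spans."""
    spans = []
    current_field, current_tokens = None, []
    for token, label in zip(tokens, labels):
        # "B-request_url" -> ("B", "-", "request_url"); "O" -> ("O", "", "")
        prefix, _, field = label.partition("-")
        if prefix == "B":
            # A B- tag always closes any open span and starts a new one.
            if current_field is not None:
                spans.append((current_field, current_tokens))
            current_field, current_tokens = field, [token]
        elif prefix == "I" and field == current_field:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_field is not None:
                spans.append((current_field, current_tokens))
            current_field, current_tokens = None, []
    if current_field is not None:
        spans.append((current_field, current_tokens))
    return spans


tokens = ["GET", "/index", ".html", "200", "ok"]
labels = ["O", "B-request_url", "I-request_url", "B-status", "O"]
print(bio_to_spans(tokens, labels))
# [('request_url', ['/index', '.html']), ('status', ['200'])]
```

Labels listed above with only a `B-` form, such as `B-status`, always produce single-token spans under this scheme.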

### Training Data

Trained on the MENTHOS log-parsing dataset.

### Benchmark Results

Results on the MENTHOS evaluation set:

| model            | samples | accuracy | precision |   recall |       f1 | p50 latency (ms) | throughput (samples/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ---------------------: |
| MENTHOS-logparse |     744 | 0.988710 |  0.949088 | 0.936223 | 0.941009 |          24.2599 |                  43.75 |

Reference baseline (Morpheus ONNX):

| baseline model            | accuracy |       f1 | p50 latency (ms) | throughput (samples/s) |
| ------------------------- | -------: | -------: | ---------------: | ---------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 |         119.8934 |                   8.08 |
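
The relative differences between the two tables can be worked out directly from the reported figures. The short calculation below uses only numbers from the tables. The card does not state the hardware or batch settings, so the ratios are indicative only.

```python
# Figures copied from the two tables above.
menthos = {"accuracy": 0.988710, "f1": 0.941009, "p50_latency_ms": 24.2599, "throughput": 43.75}
morpheus = {"accuracy": 0.984583, "f1": 0.932764, "p50_latency_ms": 119.8934, "throughput": 8.08}

latency_speedup = morpheus["p50_latency_ms"] / menthos["p50_latency_ms"]
throughput_gain = menthos["throughput"] / morpheus["throughput"]
f1_delta = menthos["f1"] - morpheus["f1"]
accuracy_delta = menthos["accuracy"] - morpheus["accuracy"]

print(f"p50 latency: {latency_speedup:.2f}x lower")    # 4.94x lower
print(f"throughput:  {throughput_gain:.2f}x higher")   # 5.41x higher
print(f"F1 delta:    {f1_delta:+.4f}")                 # +0.0082
print(f"acc delta:   {accuracy_delta:+.4f}")           # +0.0041
```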

### Benchmark Plots

![LogParsing F1: MENTHOS vs Morpheus](./log-parsing_f1_menthos_vs_morpheus.png)

![LogParsing Throughput: MENTHOS vs Morpheus](./log-parsing_throughput_samples_per_sec_menthos_vs_morpheus.png)

![LogParsing Latency Percentiles](./plots/latency_percentiles_log-parsing.png)

### Limitations

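- Labels are matched by aligning tokenized substrings of the structured column values with the raw log line, so values that repeat or do not appear verbatim in the line may be mislabeled.
- Log formats that differ from the training data (domain shift) can reduce extraction quality.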
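
The first limitation can be illustrated with a minimal sketch of substring-based labeling. This is not the project's training code. It assumes whitespace tokens in place of the model's subword tokenizer, an `O` tag for unlabeled tokens, and a hypothetical helper name `align_labels`.

```python
import re


def align_labels(line, fields):
    """Assign BIO labels to whitespace tokens by locating each field value in the line."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", line)]
    labels = ["O"] * len(tokens)
    for name, value in fields.items():
        start = line.find(value)  # first occurrence only; repeated values are ambiguous
        if start < 0:
            continue  # value absent or normalised differently: nothing gets labeled
        end = start + len(value)
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # Label only tokens that lie entirely inside the matched character span.
            if tok_start >= start and tok_end <= end:
                labels[i] = ("B-" if first else "I-") + name
                first = False
    return [tok for tok, _, _ in tokens], labels


line = "10.0.0.1 GET /item/404 404"
fields = {"remote_host": "10.0.0.1", "request_url": "/item/404", "status": "404"}
toks, labs = align_labels(line, fields)
print(list(zip(toks, labs)))
# [('10.0.0.1', 'B-remote_host'), ('GET', 'O'), ('/item/404', 'B-request_url'), ('404', 'O')]
```

In this example the status value `404` first matches inside the URL, so the real status token stays unlabeled. Any alignment that relies on first-occurrence substring matching shares this failure mode.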

### Citation

```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```

---

## Slovenščina

### Opis modela

MENTHOS-logparse je model za token klasifikacijo, naučen iz `answerdotai/ModernBERT-base`, za ekstrakcijo strukturiranih polj iz surovih log zapisov.
Uporablja maksimalno dolžino zaporedja 256.

### Namen uporabe

- BIO označevanje tokenov v log vrsticah.
- Uporabno za polja, kot so URL zahteve, status, error level, časovni žig ipd.

### Prostor oznak

Skripta učenja definira BIO oznake (ter posebne oznake za ignoriranje/padding), npr.:

- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`

Celotno mapiranje je definirano v učni kodi.

### Učni podatki

Učenje je potekalo na MENTHOS log-parsing datasetu.

### Rezultati benchmarka

| model            | vzorcev | accuracy | precision |   recall |       f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ----------------------: |
| MENTHOS-logparse |     744 | 0.988710 |  0.949088 | 0.936223 | 0.941009 |          24.2599 |                   43.75 |

Referenčni baseline (Morpheus ONNX):

| baseline model            | accuracy |       f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ------------------------- | -------: | -------: | ---------------: | ----------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 |         119.8934 |                    8.08 |

### Grafi benchmarka

![LogParsing F1: MENTHOS vs Morpheus](./log-parsing_f1_menthos_vs_morpheus.png)

![LogParsing Throughput: MENTHOS vs Morpheus](./log-parsing_throughput_samples_per_sec_menthos_vs_morpheus.png)

![LogParsing Latency Percentiles](./plots/latency_percentiles_log-parsing.png)

### Omejitve

- Ujemanje oznak temelji na poravnavi tokeniziranih podnizov.
- Pri drugačnih log formatih se lahko kakovost ekstrakcije zmanjša.

### Citiranje

```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```