---
language:
    - en
    - sl
library_name: transformers
pipeline_tag: token-classification
tags:
    - menthos
    - modernbert
    - log-parsing
    - ner
    - cybersecurity
---

# MENTHOS-logparse

## English

### Model Description

MENTHOS-logparse is a token-classification model fine-tuned from `answerdotai/ModernBERT-base` to extract structured fields from raw log lines.
Inputs are limited to a maximum sequence length of 256 tokens.

### Intended Use

- BIO-style token labeling on log lines.
- Extraction of fields such as request URL, status, error level, and timestamp.
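
A minimal usage sketch, assuming the checkpoint is published under the repository ID given in the citation (`LHRS-UM-FERI/MENTHOS-logparse`), ships a label mapping in its config, and loads with the standard `transformers` token-classification pipeline. The helper names `group_fields` and `extract_fields` and the sample log line are illustrative and not part of the model release.

```python
from collections import defaultdict


def group_fields(entities, text):
    """Collect pipeline entities into {field_name: [substring, ...]}.

    `entities` is expected in the shape returned by the transformers
    token-classification pipeline with aggregation_strategy="simple":
    dicts carrying "entity_group", "start" and "end" keys.
    """
    fields = defaultdict(list)
    for ent in entities:
        fields[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(fields)


def extract_fields(log_line, model_id="LHRS-UM-FERI/MENTHOS-logparse"):
    """Tag one log line (downloads the checkpoint on first use)."""
    # Imported lazily so group_fields works without transformers installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge B-/I- pieces into one span
    )
    return group_fields(tagger(log_line), log_line)


# Example call (needs network access and the transformers package):
# extract_fields('10.0.0.1 [18/Apr/2022:10:00:00 +0000] "GET /index.html" 200')
```

Building the pipeline inside `extract_fields` keeps the sketch short; for batch use it would be cheaper to create the pipeline once and reuse it.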

### Label Space

The training code defines BIO labels (plus special handling for ignored and padding positions), including:

- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`
- additional labels for request-header fields

The full token-label mapping is defined in the training code.
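
To illustrate how BIO tags of this kind turn into fields, here is a small decoding sketch. It assumes whitespace-level tokens and an `O` tag for tokens outside any field; the model itself works on subword tokens, and the function name `bio_to_spans` and the sample tokens are hypothetical.

```python
def bio_to_spans(tokens, labels):
    """Merge BIO-tagged tokens into (field, text) pairs."""
    spans = []
    current_field, current_tokens = None, []

    def flush():
        # Emit the span that is currently open, if any.
        if current_field is not None:
            spans.append((current_field, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B":
            flush()
            current_field, current_tokens = field, [token]
        elif prefix == "I" and field == current_field:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            flush()
            current_field, current_tokens = None, []
    flush()
    return spans


tokens = ["10.0.0.1", "[18/Apr/2022:10:00:00", "+0000]", "GET", "/index.html", "200"]
labels = ["B-remote_host", "B-time_received", "I-time_received", "O",
          "B-request_url", "B-status"]
print(bio_to_spans(tokens, labels))
```

A `B-` tag always opens a new span, so single-token fields such as `B-status` need no matching `I-` label.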

### Training Data

Trained on the MENTHOS log-parsing dataset.

### Benchmark Results

Results on the MENTHOS evaluation set:

| model            | samples | accuracy | precision |   recall |       f1 | p50 latency (ms) | throughput (samples/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ---------------------: |
| MENTHOS-logparse |     744 | 0.988710 |  0.949088 | 0.936223 | 0.941009 |          24.2599 |                  43.75 |

Reference baseline (Morpheus ONNX):

| baseline model            | accuracy |       f1 | p50 latency (ms) | throughput (samples/s) |
| ------------------------- | -------: | -------: | ---------------: | ---------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 |         119.8934 |                   8.08 |
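
The relative differences implied by the two tables can be worked out directly. The sketch below only restates the reported numbers; it assumes both rows were measured under comparable conditions, which the card does not state.

```python
# Values copied from the benchmark tables above.
menthos = {"f1": 0.941009, "p50_latency_ms": 24.2599, "throughput": 43.75}
morpheus = {"f1": 0.932764, "p50_latency_ms": 119.8934, "throughput": 8.08}

# Ratio > 1 means MENTHOS-logparse is faster than the baseline.
latency_speedup = morpheus["p50_latency_ms"] / menthos["p50_latency_ms"]
throughput_gain = menthos["throughput"] / morpheus["throughput"]
f1_delta = menthos["f1"] - morpheus["f1"]

print(f"p50 latency speedup: {latency_speedup:.2f}x")
print(f"throughput gain:     {throughput_gain:.2f}x")
print(f"F1 difference:       {f1_delta:+.6f}")
```

On these figures the median latency is roughly 4.9 times lower and throughput roughly 5.4 times higher than the baseline, with an F1 gain of about 0.008.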

### Benchmark Plots

![LogParsing F1: MENTHOS vs Morpheus](./log-parsing_f1_menthos_vs_morpheus.png)

![LogParsing Throughput: MENTHOS vs Morpheus](./log-parsing_throughput_samples_per_sec_menthos_vs_morpheus.png)

![LogParsing Latency Percentiles](./plots/latency_percentiles_log-parsing.png)

### Limitations

- Labels are assigned by aligning tokenized substrings of the structured column values with the raw log line.
- Domain shift in log formats can reduce extraction quality.
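
The first limitation can be made concrete with a simplified alignment sketch. This is not the authors' training code: it assumes whitespace tokens, verbatim first-occurrence matching, and an `O` tag for unlabeled tokens, and the name `align_labels` and the sample data are hypothetical.

```python
def align_labels(log_line, columns):
    """Assign BIO labels to the whitespace tokens of `log_line`.

    `columns` maps field names to the value stored for that field in the
    structured data. A value that is not found verbatim stays unlabeled.
    """
    # Record each whitespace token together with its character offsets.
    tokens, offsets, pos = [], [], 0
    for tok in log_line.split():
        start = log_line.index(tok, pos)
        pos = start + len(tok)
        tokens.append(tok)
        offsets.append((start, pos))

    labels = ["O"] * len(tokens)
    for field, value in columns.items():
        start = log_line.find(value)  # first occurrence only
        if not value or start == -1:
            continue  # no verbatim match: this field gets no labels
        end = start + len(value)
        first = True
        for i, (tok_start, tok_end) in enumerate(offsets):
            # Label only tokens that lie fully inside the matched span.
            if tok_start >= start and tok_end <= end and labels[i] == "O":
                labels[i] = ("B-" if first else "I-") + field
                first = False
    return tokens, labels


line = "10.0.0.1 [18/Apr/2022:10:00:00 +0000] GET /index.html 200"
columns = {
    "remote_host": "10.0.0.1",
    "time_received": "[18/Apr/2022:10:00:00 +0000]",
    "request_url": "/index.html",
    "status": "200",
}
print(align_labels(line, columns))
```

In a scheme like this, a column value that is reformatted in the raw line, or that first occurs inside another token, ends up with missing labels, which is one way unfamiliar log formats can degrade extraction quality.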

### Citation

```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```

---

## Slovenščina

### Opis modela

MENTHOS-LogParsing je model za token klasifikacijo, naučen iz `answerdotai/ModernBERT-base`, za ekstrakcijo strukturiranih polj iz surovih log zapisov.
Uporablja maksimalno dolžino zaporedja 256.

### Namen uporabe

- BIO označevanje tokenov v log vrsticah.
- Uporabno za polja kot so URL zahteve, status, error level, časovni žig ipd.

### Prostor oznak

Skripta učenja definira BIO oznake (ter posebne oznake za ignoriranje/padding), npr.:

- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`

Celotno mapiranje je definirano v učni kodi.

### Učni podatki

Učenje je potekalo na MENTHOS log-parsing datasetu.

### Rezultati benchmarka

| model            | vzorcev | accuracy | precision |   recall |       f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ----------------------: |
| MENTHOS-logparse |     744 | 0.988710 |  0.949088 | 0.936223 | 0.941009 |          24.2599 |                   43.75 |

Referenčni baseline (Morpheus ONNX):

| baseline model            | accuracy |       f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ------------------------- | -------: | -------: | ---------------: | ----------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 |         119.8934 |                    8.08 |

### Grafi benchmarka

![LogParsing F1: MENTHOS vs Morpheus](./log-parsing_f1_menthos_vs_morpheus.png)

![LogParsing Throughput: MENTHOS vs Morpheus](./log-parsing_throughput_samples_per_sec_menthos_vs_morpheus.png)

![LogParsing Latency Percentiles](./plots/latency_percentiles_log-parsing.png)

### Omejitve

- Ujemanje oznak temelji na poravnavi tokeniziranih podnizov.
- Pri drugačnih log formatih se lahko kakovost ekstrakcije zmanjša.

### Citiranje

```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```