---
language:
- en
- sl
library_name: transformers
pipeline_tag: token-classification
tags:
- menthos
- modernbert
- log-parsing
- ner
- cybersecurity
---
# MENTHOS-logparse
## English
### Model Description
MENTHOS-logparse is a token-classification model fine-tuned from `answerdotai/ModernBERT-base` to extract structured fields from raw log lines.
Its maximum input sequence length is 256 tokens.
### Intended Use
- BIO-style token labeling of log lines.
- Extracting fields such as request URL, status, error level, and timestamp.
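
A minimal sketch of how the model might be called for field extraction through the standard Transformers `pipeline` API. The helper names `load_tagger` and `extract_fields` are illustrative and not part of the model repository. The sketch also assumes the checkpoint's configuration stores `B-`/`I-` label names, so that `aggregation_strategy="simple"` can merge them into one group per field.

```python
from collections import defaultdict

MODEL_ID = "LHRS-UM-FERI/MENTHOS-logparse"


def load_tagger(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so extract_fields can be used and tested without transformers.
    from transformers import pipeline

    # "simple" aggregation merges B-/I- pieces into one entity per field.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def extract_fields(log_line, tagger):
    """Return {field_name: [text, ...]} for a single log line.

    `tagger` is any callable that returns pipeline-style entity dicts
    with "entity_group", "start" and "end" keys.
    """
    fields = defaultdict(list)
    for entity in tagger(log_line):
        # Slice the original line so the extracted text keeps its exact spelling.
        text = log_line[entity["start"]:entity["end"]]
        fields[entity["entity_group"]].append(text)
    return dict(fields)
```

Calling `extract_fields(line, load_tagger())` returns a dictionary keyed by field name. The exact keys depend on the labels stored in the model's configuration.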
### Label Space
The training code defines BIO labels (plus special handling for ignored and padding positions), including:
- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`
- and additional request-header related labels
The full token-label mapping is defined in the training code.
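
To illustrate how BIO tags of this kind turn into extracted fields, here is a hedged sketch of a decoder. It assumes an `O` tag for tokens outside any field, which is conventional in BIO schemes but not listed explicitly above, and it joins tokens with a single space as a simplification. The function name `bio_to_spans` is hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Group tokens into (field, text) pairs from per-token BIO tags."""
    spans = []
    current_field, current_tokens = None, []

    def close_span():
        # Store the span collected so far, if there is one.
        if current_field is not None:
            spans.append((current_field, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, field = tag.partition("-")
        if prefix == "B":
            # A B- tag always starts a new span.
            close_span()
            current_field, current_tokens = field, [token]
        elif prefix == "I" and field == current_field:
            # An I- tag extends the span only if it continues the same field.
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag ends whatever span was open.
            close_span()
            current_field, current_tokens = None, []
    close_span()
    return spans


tokens = ["12/Mar/2024", "10:00:01", "GET", "/index.html", "200"]
tags = ["B-time_received", "I-time_received", "O", "B-request_url", "B-status"]
print(bio_to_spans(tokens, tags))
```

A label such as `B-status` that has no `I-` counterpart simply yields single-token spans.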
### Training Data
Trained on the MENTHOS log-parsing dataset.
### Benchmark Results
Benchmark results for the MENTHOS evaluation set.
| model | samples | accuracy | precision | recall | f1 | p50 latency (ms) | throughput (samples/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ---------------------: |
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |
Reference baseline (Morpheus ONNX):
| baseline model | accuracy | f1 | p50 latency (ms) | throughput (samples/s) |
| ------------------------- | -------: | -------: | ---------------: | ---------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |
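
The relative differences between the two tables can be worked out directly from the reported figures. The short calculation below is only arithmetic on those numbers; it assumes both rows were measured under comparable conditions, which the tables do not state explicitly.

```python
# Figures copied from the two benchmark tables above.
menthos = {"accuracy": 0.988710, "f1": 0.941009, "p50_ms": 24.2599, "throughput": 43.75}
baseline = {"accuracy": 0.984583, "f1": 0.932764, "p50_ms": 119.8934, "throughput": 8.08}

latency_speedup = baseline["p50_ms"] / menthos["p50_ms"]
throughput_gain = menthos["throughput"] / baseline["throughput"]
accuracy_delta = menthos["accuracy"] - baseline["accuracy"]
f1_delta = menthos["f1"] - baseline["f1"]

print(f"p50 latency: {latency_speedup:.2f}x lower")   # 4.94x
print(f"throughput:  {throughput_gain:.2f}x higher")  # 5.41x
print(f"accuracy:    {accuracy_delta:+.4f}")          # +0.0041
print(f"F1:          {f1_delta:+.4f}")                # +0.0082
```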
### Benchmark Plots



### Limitations
- Labels are matched by aligning tokenized values from structured columns against substrings of the raw log line.
- Domain shift in log formats can reduce extraction quality.
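
One plausible reading of the first limitation above is that each structured column value is searched for as a token subsequence in the raw line and tagged where it matches. The sketch below illustrates that general technique only; it is not the project's training code. Whitespace tokenization, first-match-only tagging, and the name `align_labels` are all assumptions made for the example.

```python
def align_labels(line_tokens, fields):
    """Assign BIO tags by exact token-subsequence matching.

    `fields` maps a field name to its value string from a structured column.
    Values that do not occur verbatim in the line stay untagged ("O").
    """
    tags = ["O"] * len(line_tokens)
    for name, value in fields.items():
        value_tokens = value.split()
        n = len(value_tokens)
        if n == 0:
            continue
        for start in range(len(line_tokens) - n + 1):
            window = line_tokens[start:start + n]
            free = all(tag == "O" for tag in tags[start:start + n])
            # Tag only the first exact match that does not overlap another field.
            if window == value_tokens and free:
                tags[start] = f"B-{name}"
                for i in range(start + 1, start + n):
                    tags[i] = f"I-{name}"
                break
    return tags


line = "10.0.0.1 GET /index.html 404".split()
print(align_labels(line, {"remote_host": "10.0.0.1", "request_url": "/index.html", "status": "404"}))
# A value that is formatted differently in the raw line finds no match.
print(align_labels(line, {"status": "Not Found"}))
```

Under this kind of scheme, a column value that is normalized or reformatted relative to the raw text produces no labels, which is one way alignment-based labeling can miss fields.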
### Citation
```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
title = {MENTHOS-logparse},
author = {Borovic and {Li Dobnik} and Kranjec and Ferme},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```
---
## Slovenščina
### Opis modela
MENTHOS-logparse je model za klasifikacijo tokenov, naučen iz `answerdotai/ModernBERT-base`, za ekstrakcijo strukturiranih polj iz surovih log zapisov.
Uporablja maksimalno dolžino zaporedja 256 tokenov.
### Namen uporabe
- BIO označevanje tokenov v log vrsticah.
- Uporabno za polja, kot so URL zahteve, status, error level, časovni žig ipd.
### Prostor oznak
Skripta učenja definira BIO oznake (ter posebne oznake za ignoriranje/padding), npr.:
- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`
Celotno mapiranje je definirano v učni kodi.
### Učni podatki
Učenje je potekalo na MENTHOS log-parsing datasetu.
### Rezultati benchmarka
| model | vzorcev | accuracy | precision | recall | f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ----------------------: |
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |
Referenčni baseline (Morpheus ONNX):
| baseline model | accuracy | f1 | p50 latenca (ms) | prepustnost (vzorcev/s) |
| ------------------------- | -------: | -------: | ---------------: | ----------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |
### Grafi benchmarka



### Omejitve
- Ujemanje oznak temelji na poravnavi tokeniziranih podnizov.
- Pri drugačnih log formatih se lahko kakovost ekstrakcije zmanjša.
### Citiranje
```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
title = {MENTHOS-logparse},
author = {Borovic and {Li Dobnik} and Kranjec and Ferme},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```