---
language:
- en
- sl
library_name: transformers
pipeline_tag: token-classification
tags:
- menthos
- modernbert
- log-parsing
- ner
- cybersecurity
---

# MENTHOS-logparse

## English

### Model Description

MENTHOS-logparse is a token-classification model fine-tuned from `answerdotai/ModernBERT-base` to extract structured fields from raw log lines.
It uses a maximum sequence length of 256 tokens.

### Intended Use

- BIO-style token labeling of log lines.
- Extracting fields such as the request URL, status, error level, and timestamp.
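
A minimal usage sketch for the use case above. It assumes the checkpoint loads through the standard Transformers `token-classification` pipeline, that `aggregation_strategy="simple"` merges `B-`/`I-` pieces into one span per field, and that the tokenizer reports character offsets. The helper names `fields_from_entities` and `parse_log_line` are illustrative and not part of the model's code.

```python
from collections import defaultdict


def fields_from_entities(line, entities):
    """Map each predicted field name to the substrings it covers in `line`."""
    fields = defaultdict(list)
    for ent in entities:
        # Slice by character offsets instead of relying on the detokenized `word`.
        fields[ent["entity_group"]].append(line[ent["start"]:ent["end"]])
    return dict(fields)


def parse_log_line(line, model_id="LHRS-UM-FERI/MENTHOS-logparse"):
    """Tag one log line with the model (downloads the checkpoint on first use)."""
    # Imported lazily so the helper above works without `transformers` installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )
    return fields_from_entities(line, tagger(line))
```

Calling `parse_log_line(raw_line)` returns a dictionary keyed by field name (for example `request_url` or `status`); which fields appear depends on the model's predictions for that line.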

### Label Space

The training code defines BIO labels, with special handling for ignored and padding positions. The labels include:

- `B-request_url`, `I-request_url`
- `B-status`
- `B-error_level`, `I-error_level`
- `B-error_message`, `I-error_message`
- `B-time_received`, `I-time_received`
- `B-remote_host`, `I-remote_host`
- additional labels for request headers

The full token-label mapping is defined in the training code.
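
To illustrate how BIO labels such as those listed above turn into fields, here is a small decoding sketch. It assumes tokens outside any field are tagged `O` and that an `I-` tag without a preceding matching `B-` tag is discarded; the function name `bio_to_spans` is hypothetical, and the actual post-processing used by the authors may differ.

```python
def bio_to_spans(tokens, tags):
    """Group (token, BIO tag) pairs into (field, [tokens]) spans."""
    spans = []
    current_field, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, field = tag.partition("-")
        if prefix == "B":
            # A B- tag always starts a new span, closing any open one.
            if current_field is not None:
                spans.append((current_field, current_tokens))
            current_field, current_tokens = field, [token]
        elif prefix == "I" and field == current_field:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_field is not None:
                spans.append((current_field, current_tokens))
            current_field, current_tokens = None, []
    if current_field is not None:
        spans.append((current_field, current_tokens))
    return spans


tokens = ["192.168.0.1", "GET", "/a", "/b", "404"]
tags = ["B-remote_host", "O", "B-request_url", "I-request_url", "B-status"]
print(bio_to_spans(tokens, tags))
# [('remote_host', ['192.168.0.1']), ('request_url', ['/a', '/b']), ('status', ['404'])]
```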

### Training Data

Trained on the MENTHOS log-parsing dataset.

### Benchmark Results

Results on the MENTHOS evaluation set:

| model | samples | accuracy | precision | recall | f1 | p50 latency (ms) | throughput (samples/s) |
| ---------------- | ------: | -------: | --------: | -------: | -------: | ---------------: | ---------------------: |
| MENTHOS-logparse | 744 | 0.988710 | 0.949088 | 0.936223 | 0.941009 | 24.2599 | 43.75 |

Reference baseline (Morpheus ONNX):

| baseline model | accuracy | f1 | p50 latency (ms) | throughput (samples/s) |
| ------------------------- | -------: | -------: | ---------------: | ---------------------: |
| log-parsing-20220418.onnx | 0.984583 | 0.932764 | 119.8934 | 8.08 |
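
The two tables can be compared directly with a little arithmetic. The sketch below copies the reported figures and derives relative latency, throughput, and quality differences; it assumes both rows were measured on the same evaluation set and hardware, which the card implies but does not state.

```python
# Figures copied from the two benchmark tables above.
menthos = {"accuracy": 0.988710, "f1": 0.941009, "p50_ms": 24.2599, "throughput": 43.75}
baseline = {"accuracy": 0.984583, "f1": 0.932764, "p50_ms": 119.8934, "throughput": 8.08}

latency_ratio = baseline["p50_ms"] / menthos["p50_ms"]
throughput_ratio = menthos["throughput"] / baseline["throughput"]
f1_delta = menthos["f1"] - baseline["f1"]
accuracy_delta = menthos["accuracy"] - baseline["accuracy"]

print(f"p50 latency ratio (baseline / MENTHOS): {latency_ratio:.2f}x")
print(f"throughput ratio (MENTHOS / baseline): {throughput_ratio:.2f}x")
print(f"F1 difference: {f1_delta:+.4f}")
print(f"accuracy difference: {accuracy_delta:+.4f}")
```

On these numbers the median latency is roughly 4.9 times lower and throughput roughly 5.4 times higher than the baseline, with F1 about 0.008 higher and accuracy about 0.004 higher.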

### Benchmark Plots





### Limitations

- Training labels are derived by aligning tokenized substrings of the structured columns with the raw log text, so imperfect alignments can introduce label noise.
- Domain shift: log formats that differ from the training data can reduce extraction quality.
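
To make the first limitation concrete, here is a rough sketch of substring-alignment labeling. It is not the authors' training code: whitespace tokens stand in for the model's subword tokens, the function name `label_by_alignment` and the sample log line are hypothetical, and only the first verbatim occurrence of each column value is matched.

```python
def label_by_alignment(line, columns):
    """Assign BIO tags to whitespace tokens of `line` from known column values.

    `columns` maps a field name to the text expected to appear in the line.
    Tokens not covered by any matched value keep the tag "O".
    """
    # Record the character span of every whitespace-delimited token.
    tokens, spans, pos = [], [], 0
    for tok in line.split():
        start = line.index(tok, pos)
        pos = start + len(tok)
        tokens.append(tok)
        spans.append((start, pos))

    tags = ["O"] * len(tokens)
    for field, value in columns.items():
        start = line.find(value)
        if start < 0:
            continue  # value not present verbatim: the field stays unlabeled
        end = start + len(value)
        first = True
        for i, (tok_start, tok_end) in enumerate(spans):
            if tok_start >= start and tok_end <= end:
                tags[i] = ("B-" if first else "I-") + field
                first = False
    return tokens, tags


line = "10.0.0.5 12/Mar/2024 10:15:32 /index.html 500"
columns = {
    "remote_host": "10.0.0.5",
    "time_received": "12/Mar/2024 10:15:32",
    "request_url": "/index.html",
    "status": "500",
}
print(label_by_alignment(line, columns)[1])
# ['B-remote_host', 'B-time_received', 'I-time_received', 'B-request_url', 'B-status']
```

If a column stores a value in a different form than the raw line (say, a reformatted timestamp), `line.find` fails and the field is silently left as `O`, which is the kind of alignment gap the limitation describes.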

### Citation

```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
  title        = {MENTHOS-logparse},
  author       = {Borovic and {Li Dobnik} and Kranjec and Ferme},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LHRS-UM-FERI/MENTHOS-logparse}}
}
```