Papers
arxiv:2506.17512

Semantic-Aware Parsing for Security Logs

Published on Nov 14, 2025
Authors:
,
,
,
,
,
,

Abstract

Matryoshka is an end-to-end system that uses large language models to automatically generate semantically-aware structured log parsers from unlabeled samples, eliminating the need for manual parser construction and enabling scalable security analytics.

Security logs are foundational to threat detection and post-incident investigation, yet analysts often struggle to fully leverage them due to their heterogeneity and unstructured nature. The standard practice of manually writing parsers to normalize the data in security event management systems is time-consuming and costly due to the long tail of log formats. Meanwhile, querying raw logs without explicit parsing using large language models (LLMs) is impractical at scale. In this paper, we introduce Matryoshka, an end-to-end system leveraging LLMs to automatically generate semantically-aware structured log parsers without labeled examples or human intervention. Matryoshka achieves this by directly inferring log syntax, variable naming, and normalization to common security-specific schemas (e.g., OCSF [1]) from unlabeled log line samples, then generating deterministic parsers and mapping rules that can be efficiently applied during data ingest. This approach provides analysts with semantically-rich data representations at scale, facilitating rapid and precise log search without the traditional burden of manual parser construction. We evaluate Matryoshka's capabilities through both established template generation datasets and new datasets curated to establish end-to-end performance on a realistic distribution of log types. Our experiments show that Matryoshka outperforms prior work on syntax parsing while matching human-generated parsers in both side-by-side comparisons and retrieval for security-relevant queries. These results demonstrate that Matryoshka significantly reduces manual effort by automatically extracting and organizing valuable security data, moving us closer to fully automated, AI-driven analytics.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2506.17512
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2506.17512 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2506.17512 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2506.17512 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.