cyber_threat_log_classifier

Overview

This model is a RoBERTa-base classifier fine-tuned to detect malicious patterns in raw HTTP server logs and system audit trails. It assigns each log entry to one of 5 threat categories, covering common web-based attacks such as SQL Injection and Cross-Site Scripting (XSS), so that its output can feed real-time security orchestration.

Model Architecture

The model uses the RoBERTa-base Transformer encoder with a classification head on top.

  • Encoder: 12-layer Transformer with 768 hidden units and 12 attention heads.
  • Input: Tokenized raw log strings (up to 512 tokens).
  • Classification Head: Linear layer applied to the final hidden state of the <s> token (RoBERTa's equivalent of BERT's [CLS]), mapping it to 5 threat categories.
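
The architecture above can be exercised with a short inference sketch. This is a minimal example, not the author's reference code: it assumes the checkpoint loads through the Hugging Face transformers Auto classes, that `model_path` (a hypothetical parameter) points at a local copy or hub id of this model, and that the label names are stored in the model config. The helper names `softmax`, `top_prediction`, and `classify_log` are illustrative.

```python
import math


def softmax(logits):
    """Convert raw classification-head logits into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring category."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify_log(line, model_path):
    """Classify one raw log line.

    Assumption: `model_path` is a local directory or hub id for this
    checkpoint; it is not specified in the model card.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    model.eval()

    # Inputs longer than 512 tokens are truncated, as noted under Input.
    inputs = tokenizer(line, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_prediction(logits, model.config.id2label)
```

Loading the tokenizer and model inside `classify_log` keeps the sketch short; a real service would load them once and reuse them across requests.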

Intended Use

  • SIEM Integration: Automated labeling of incoming logs in Security Information and Event Management systems.
  • Incident Response: Prioritizing security alerts based on the classified threat type.
  • Log Cleaning: Filtering out high-volume benign noise from security dashboards.
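
The incident-response and log-cleaning uses above might be combined roughly as follows. This is a sketch under stated assumptions: the label strings (`BENIGN`, `SQL_INJECTION`, `XSS`), the priority values, and the 0.9 threshold are hypothetical, since the model card does not list the 5 category names or any recommended cut-off.

```python
from dataclasses import dataclass

# Hypothetical label names and priorities (lower number = more urgent).
PRIORITY = {"SQL_INJECTION": 1, "XSS": 2, "BENIGN": 9}
DEFAULT_PRIORITY = 3  # used for any category not listed above
BENIGN_LABEL = "BENIGN"


@dataclass
class Alert:
    log_line: str
    label: str    # category predicted by the classifier
    score: float  # probability assigned to that category


def triage(alerts, benign_threshold=0.9):
    """Drop confidently benign entries, then order the rest by urgency."""
    kept = [
        a for a in alerts
        if not (a.label == BENIGN_LABEL and a.score >= benign_threshold)
    ]
    # Sort by category priority first, then by descending confidence.
    return sorted(
        kept,
        key=lambda a: (PRIORITY.get(a.label, DEFAULT_PRIORITY), -a.score),
    )
```

Low-confidence benign predictions are kept at the bottom of the queue rather than discarded, so an analyst can still review entries the model was unsure about.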

Limitations

  • Obfuscated Payloads: Highly encoded or polymorphic attack payloads may bypass detection if not represented in the training distribution.
  • Context Window: Extremely long request bodies or multi-line log events exceeding 512 tokens will be truncated, so any payload beyond that limit is not seen by the model.
  • Adversarial Examples: Sophisticated attackers may craft "log-injection" payloads specifically designed to mislead the classifier.
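
Two of the limitations above can be partly mitigated with preprocessing. The sketch below is an illustration, not part of the released model: it assumes percent-encoding is the obfuscation of interest and that long inputs are split into overlapping windows that are each classified separately. The function names, the 510-token window (512 minus the <s> and </s> special tokens), and the 64-token overlap are assumptions.

```python
from urllib.parse import unquote


def normalize_payload(text, max_rounds=3):
    """Undo repeated percent-encoding (e.g. %253C -> %3C -> <).

    Only handles URL encoding; other obfuscation schemes are untouched.
    """
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:  # nothing left to decode
            break
        text = decoded
    return text


def sliding_windows(token_ids, window=510, overlap=64):
    """Split a token sequence into overlapping windows.

    Each window can be classified on its own so that content past the
    first 512 tokens is not silently dropped.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
        start += step
    return windows
```

How per-window predictions are combined is a design choice; one cautious option is to report the most severe category found in any window.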