cyber_threat_log_classifier

Overview

This model is a fine-tuned RoBERTa-base classifier that analyzes raw HTTP server logs and system audit trails for malicious patterns. It identifies common web-based attacks such as SQL Injection and Cross-Site Scripting (XSS) with high precision, so its labels can drive real-time security orchestration.

Model Architecture

The model uses RoBERTa, a Transformer-based encoder architecture.

Encoder: 12-layer Transformer with 768 hidden units and 12 attention heads.
Input: Tokenized raw log strings (up to 512 tokens).
Classification Head: Linear layer over the hidden state of the <s> token (RoBERTa's equivalent of BERT's [CLS]), mapping it to 5 threat categories.
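
As a rough illustration of the classification head described above, the following pure-Python sketch applies a single affine map to a 768-dimensional vector and normalizes the 5 resulting logits with a softmax. Only the 768-to-5 shape comes from this card. The label names, the random weights, and the random hidden state are placeholders chosen for the example, not the model's actual values.

```python
import math
import random

HIDDEN_SIZE = 768   # hidden units, as stated in the card
NUM_LABELS = 5      # threat categories, as stated in the card

# Hypothetical label names: the card names only SQL Injection and XSS,
# so the remaining entries are placeholders.
LABELS = ["benign", "sql_injection", "xss", "category_3", "category_4"]


def linear_head(hidden, weight, bias):
    """Affine map from one hidden vector to one logit per label."""
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weight, bias)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Random stand-ins for the pooled token state and the head parameters.
rng = random.Random(0)
hidden_state = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
weight = [
    [rng.uniform(-0.02, 0.02) for _ in range(HIDDEN_SIZE)]
    for _ in range(NUM_LABELS)
]
bias = [0.0] * NUM_LABELS

probs = softmax(linear_head(hidden_state, weight, bias))
predicted = LABELS[max(range(NUM_LABELS), key=probs.__getitem__)]

# A single linear layer of this shape has 768 * 5 weights plus 5 biases.
head_params = HIDDEN_SIZE * NUM_LABELS + NUM_LABELS
```

With random weights the predicted label is meaningless; the sketch only shows the shape of the computation and that the head itself is small (3,845 parameters) relative to the encoder.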

Intended Use

SIEM Integration: Automated labeling of incoming logs in Security Information and Event Management systems.
Incident Response: Prioritizing security alerts based on the classified threat type.
Log Cleaning: Filtering out high-volume benign noise from security dashboards.
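
The incident-response and log-cleaning uses above could be combined in a small triage step like the sketch below. It assumes the classifier output has already been reduced to a label and a confidence score per log line. The `ClassifiedLog` type, the label names, the priority values, and the 0.9 threshold are all hypothetical and would need tuning against real traffic.

```python
from dataclasses import dataclass

# Hypothetical mapping from label to priority (lower means more urgent).
PRIORITY = {"sql_injection": 1, "xss": 2, "benign": 9}
DEFAULT_PRIORITY = 5  # used for any label not listed above


@dataclass
class ClassifiedLog:
    raw: str      # original log line
    label: str    # predicted category
    score: float  # classifier confidence for that category


def triage(events, benign_label="benign", benign_threshold=0.9):
    """Drop confidently benign events, then order the rest by urgency.

    Low-confidence benign events are kept so an analyst can review them.
    """
    kept = [
        e for e in events
        if not (e.label == benign_label and e.score >= benign_threshold)
    ]
    return sorted(
        kept,
        key=lambda e: (PRIORITY.get(e.label, DEFAULT_PRIORITY), -e.score),
    )


events = [
    ClassifiedLog("GET /index.html", "benign", 0.99),
    ClassifiedLog("GET /search?q=%3Cscript%3E", "xss", 0.91),
    ClassifiedLog("GET /item?id=1' OR '1'='1", "sql_injection", 0.97),
    ClassifiedLog("GET /health", "benign", 0.55),
]
queue = triage(events)
```

Only benign events above the threshold are filtered, so a misclassified attack with a weak benign score still reaches the queue, at the cost of some extra noise.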

Limitations

Obfuscated Payloads: Highly encoded or polymorphic attack payloads may bypass detection if not represented in the training distribution.
Context Window: Extremely long request bodies or multi-line log events exceeding 512 tokens will be truncated.
Adversarial Examples: Sophisticated attackers may craft "log-injection" payloads specifically designed to mislead the classifier.
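
Two of the limitations above can be partly mitigated before inference. The sketch below is an assumption about how that might be done, not part of the released model: `normalize_payload` undoes a few rounds of URL encoding, and `sliding_windows` splits an over-long token sequence into overlapping windows instead of truncating it. The function names, the round limit, and the stride of 256 are illustrative choices.

```python
from urllib.parse import unquote

MAX_TOKENS = 512  # context limit stated in the card


def normalize_payload(text, max_rounds=3):
    """URL-decode repeatedly until the text stops changing.

    Handles simple double or triple percent-encoding only; other
    obfuscation schemes are out of scope for this sketch.
    """
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text


def sliding_windows(tokens, size=MAX_TOKENS, stride=256):
    """Split a token list into overlapping windows of at most `size`.

    In practice `size` should leave room for the tokenizer's start and
    end special tokens.
    """
    if len(tokens) <= size:
        return [tokens]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return windows


decoded = normalize_payload("id%253D1%2527%2520OR%25201")
windows = sliding_windows(list(range(1000)))
```

Each window would then be classified separately. One plausible way to aggregate is to report the most severe label found in any window, though that rule is also an assumption. Neither step addresses deliberately adversarial log-injection payloads.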
