Create README.md
README.md
ADDED
@@ -0,0 +1,33 @@
---
language: en
license: apache-2.0
tags:
- cybersecurity
- log-analysis
- threat-detection
- roberta
---

# cyber_threat_log_classifier

## Overview
This model is a fine-tuned RoBERTa-base classifier designed to analyze raw HTTP server logs and system audit trails for malicious patterns. It identifies common web-based attacks such as SQL Injection and Cross-Site Scripting (XSS) with high precision, enabling real-time security orchestration.

## Model Architecture
The model uses a Transformer-based encoder architecture (RoBERTa).

- **Encoder:** 12-layer Transformer with 768 hidden units and 12 attention heads.
- **Input:** Tokenized raw log strings (up to 512 tokens).
- **Classification Head:** Linear layer on top of the pooled `<s>` token representation (RoBERTa's equivalent of `[CLS]`), mapping hidden states to 5 threat categories.
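
To make the classification head concrete, the following is a minimal pure-Python sketch of a linear layer followed by a softmax over 5 classes. It is an illustration under stated assumptions, not the model's actual implementation: the weights and the pooled vector are random placeholders, and the names `linear_head`, `softmax`, `pooled`, `weights`, and `bias` are hypothetical. Only the hidden size (768) and the number of categories (5) come from the description above.

```python
import math
import random

HIDDEN_SIZE = 768  # encoder hidden size stated in the model card
NUM_LABELS = 5     # number of threat categories stated in the model card


def linear_head(hidden, weights, bias):
    """Map one pooled hidden vector to one logit per class."""
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
# Placeholder values only: a real head would use the fine-tuned weights,
# and `pooled` would be the encoder's hidden state for the <s> token.
pooled = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN_SIZE)]
weights = [
    [rng.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
    for _ in range(NUM_LABELS)
]
bias = [0.0] * NUM_LABELS

probs = softmax(linear_head(pooled, weights, bias))
predicted_class = max(range(NUM_LABELS), key=probs.__getitem__)
```

Here `predicted_class` is only an index; mapping it to a category name depends on the label order used during fine-tuning, which this card does not list.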

## Intended Use
- **SIEM Integration:** Automated labeling of incoming logs in Security Information and Event Management systems.
- **Incident Response:** Prioritizing security alerts based on the classified threat type.
- **Log Cleaning:** Filtering out high-volume benign noise from security dashboards.
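
The prioritization and log-cleaning uses above could be combined in a small triage step, sketched below. Everything specific here is an assumption for illustration: the label names (`SQL_INJECTION`, `XSS`, `BENIGN`), the priority order, the `0.9` threshold, and the `Prediction` and `triage` names are hypothetical, since this card does not list the five category names or recommend a threshold.

```python
from dataclasses import dataclass

# Hypothetical label names and priorities (lower number = more urgent).
PRIORITY = {"SQL_INJECTION": 1, "XSS": 2}
BENIGN_LABEL = "BENIGN"
DEFAULT_PRIORITY = 3  # any label without an explicit priority


@dataclass
class Prediction:
    log_line: str
    label: str
    score: float  # classifier confidence in [0, 1]


def triage(predictions, benign_threshold=0.9):
    """Drop confidently benign logs, then order the rest by urgency."""
    kept = [
        p for p in predictions
        if not (p.label == BENIGN_LABEL and p.score >= benign_threshold)
    ]
    # Sort by priority first, then by descending confidence.
    return sorted(
        kept,
        key=lambda p: (PRIORITY.get(p.label, DEFAULT_PRIORITY), -p.score),
    )


alerts = triage([
    Prediction("GET /index.html", "BENIGN", 0.99),
    Prediction("GET /search?q=<script>alert(1)</script>", "XSS", 0.93),
    Prediction("GET /item?id=1' OR '1'='1", "SQL_INJECTION", 0.88),
    Prediction("GET /health", "BENIGN", 0.55),
])
```

In this sketch, low-confidence benign predictions are kept at the lowest priority rather than discarded, so an analyst can still review borderline cases.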

## Limitations
- **Obfuscated Payloads:** Highly encoded or polymorphic attack payloads may bypass detection if not represented in the training distribution.
- **Context Window:** Extremely long request bodies or multi-line log events exceeding 512 tokens will be truncated.
- **Adversarial Examples:** Sophisticated attackers may craft "log-injection" payloads specifically designed to mislead the classifier.
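
One possible mitigation for the context-window limitation is to split a long token sequence into overlapping windows and classify each window separately, instead of truncating. The sketch below assumes this workaround; it is not something the model card prescribes. The `sliding_windows` name and the `stride` value are hypothetical, and the default of 510 assumes two of the 512 positions are reserved for the `<s>` and `</s>` special tokens.

```python
def sliding_windows(token_ids, max_len=510, stride=255):
    """Split token_ids into overlapping windows of at most max_len tokens."""
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("need max_len > 0 and 0 < stride <= max_len")
    if len(token_ids) <= max_len:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows


# Stand-in for the token ids of a 1000-token log event.
windows = sliding_windows(list(range(1000)))
```

How the per-window predictions are merged is a separate design choice; flagging the event if any window is classified as a threat is one conservative option.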