# Kodiak SecOps 1

A fine-tuned language model for automated Security Operations Center (SOC) alert triage. Built on Llama 3.1 8B Instruct and optimized for structured security analysis and decision-making.

## Model Description

kodiak-secops-1 is designed to assist security analysts by providing consistent, expert-level triage recommendations for security alerts. It processes alert details including indicators of compromise, user context, asset information, and environmental factors to deliver actionable triage decisions.

## Key Features

| Feature | Description |
|---------|-------------|
| **12 Alert Categories** | Malware, phishing, brute force, data exfiltration, privilege escalation, lateral movement, C2, insider threat, policy violations, vulnerability exploits, reconnaissance, DoS |
| **5 Triage Decisions** | `escalate`, `investigate`, `monitor`, `false_positive`, `close` |
| **Structured Output** | Consistent, parseable response format with decision, priority, reasoning, and actions |
| **Context-Aware** | Considers user role, asset criticality, and environmental factors |
| **MITRE ATT&CK Aligned** | Maps alerts to relevant ATT&CK tactics and techniques |

## Model Details

| Property | Value |
|----------|-------|
| Base Model | meta-llama/Llama-3.1-8B-Instruct |
| Fine-tuning | QLoRA (4-bit quantization + LoRA) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Training Data | 10,000+ synthetic security alerts |
| Max Context | 4096 tokens |

## Quick Start

### Installation

```bash
pip install transformers torch accelerate bitsandbytes peft
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "fmt0816/kodiak-secops-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

alert = """Analyze the following security alert:

**Alert ID:** ALERT-001
**Category:** malware
**Severity:** high
**Title:** Suspicious executable detected on endpoint

**Description:** A suspicious executable matching known malware patterns was detected.

**Indicators:**
- File hash: abc123def456
- Process: svchost.exe
- Parent: powershell.exe

Provide your triage recommendation."""

messages = [
    {"role": "system", "content": "You are an expert SOC analyst. Analyze alerts and provide structured triage recommendations."},
    {"role": "user", "content": alert},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.3, do_sample=True)
# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```

### Using the Python Package

```python
from soc_triage_agent import SOCTriageModel

# Load model with 4-bit quantization
model = SOCTriageModel.from_pretrained("fmt0816/kodiak-secops-1")

# Triage an alert
alert = {
    "alert_id": "ALERT-001",
    "category": "malware",
    "severity": "high",
    "title": "Suspicious executable detected",
    "indicators": {"file_hash": "abc123...", "file_name": "malware.exe"},
    "user_context": {"username": "john.doe", "department": "Engineering", "is_vip": False},
    "asset_context": {"hostname": "WS-PC-001", "criticality": "medium"},
}

prediction = model.predict(alert)
print(f"Decision: {prediction.decision}")
print(f"Priority: {prediction.priority}")
print(f"Confidence: {prediction.confidence:.0%}")
print(f"Actions: {prediction.recommended_actions}")
```

### Using with OpenAI API

```python
from soc_triage_agent import SOCTriageModel

# Set the environment variable or pass the key directly:
# export OPENAI_API_KEY=your-api-key

model = SOCTriageModel.from_openai(model_name="gpt-4")
prediction = model.predict(alert)
```

### Using with Azure OpenAI

```python
from soc_triage_agent import SOCTriageModel

# Set environment variables:
# export AZURE_OPENAI_KEY=your-key
# export AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com

model = SOCTriageModel.from_azure_openai(
    deployment_name="soc-triage-deployment"
)
prediction = model.predict(alert)
```

## Supported Alert Categories

| Category | Description | MITRE Tactics |
|----------|-------------|---------------|
| `malware` | Malware detection, ransomware, trojans | TA0002, TA0003 |
| `phishing` | Email phishing, BEC, credential harvesting | TA0001, TA0043 |
| `brute_force` | Password attacks, credential stuffing | TA0006 |
| `data_exfiltration` | Unauthorized data transfers | TA0009, TA0010 |
| `privilege_escalation` | Unauthorized privilege elevation | TA0004 |
| `lateral_movement` | Attacker movement within network | TA0008 |
| `command_and_control` | C2 beaconing, reverse shells | TA0011 |
| `insider_threat` | Anomalous user behavior | TA0009, TA0010 |
| `policy_violation` | Compliance and policy breaches | - |
| `vulnerability_exploit` | CVE exploitation attempts | TA0001, TA0002 |
| `reconnaissance` | Network scanning, enumeration | TA0043 |
| `denial_of_service` | DDoS attacks | TA0040 |
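For programmatic routing or enrichment, the mapping above can be kept as a plain lookup table. A minimal sketch (the dict transcribes the table; the helper function name is illustrative, not part of the package API):

```python
# Alert category -> MITRE ATT&CK tactic IDs, transcribed from the table above.
CATEGORY_TACTICS = {
    "malware": ["TA0002", "TA0003"],
    "phishing": ["TA0001", "TA0043"],
    "brute_force": ["TA0006"],
    "data_exfiltration": ["TA0009", "TA0010"],
    "privilege_escalation": ["TA0004"],
    "lateral_movement": ["TA0008"],
    "command_and_control": ["TA0011"],
    "insider_threat": ["TA0009", "TA0010"],
    "policy_violation": [],
    "vulnerability_exploit": ["TA0001", "TA0002"],
    "reconnaissance": ["TA0043"],
    "denial_of_service": ["TA0040"],
}

def tactics_for(category: str) -> list:
    """Return the ATT&CK tactic IDs for an alert category ([] if unmapped)."""
    return CATEGORY_TACTICS.get(category, [])
```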

## Output Format

The model generates structured triage recommendations in markdown:

```markdown
## Triage Recommendation

### Decision Summary
| Field | Value |
|-------|-------|
| **Decision** | escalate |
| **Priority** | 1 |
| **Confidence** | 95% |
| **Escalation Required** | Yes |
| **Escalation Target** | Incident Response Team |
| **Estimated Impact** | high |

### Reasoning
[Detailed explanation of the decision...]

### Key Factors
1. [Factor 1]
2. [Factor 2]

### Recommended Actions
1. [Action 1]
2. [Action 2]
```
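Downstream tooling can pull the Decision Summary fields out of the markdown response with a small parser. A sketch using only the standard library, assuming the `| **Field** | value |` table shape shown above:

```python
import re

def parse_decision_summary(response: str) -> dict:
    """Extract **Field** | value pairs from a Decision Summary table."""
    fields = {}
    # Match rows of the form: | **Field Name** | value |
    for match in re.finditer(r"\|\s*\*\*(.+?)\*\*\s*\|\s*(.+?)\s*\|", response):
        fields[match.group(1).strip()] = match.group(2).strip()
    return fields

sample = """### Decision Summary
| Field | Value |
|-------|-------|
| **Decision** | escalate |
| **Priority** | 1 |
| **Confidence** | 95% |
"""
summary = parse_decision_summary(sample)
```

The header and separator rows contain no `**…**` markers, so only the data rows are captured; `summary` here comes out as `{"Decision": "escalate", "Priority": "1", "Confidence": "95%"}`.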

## Evaluation Results

### Overall Metrics

| Metric | Value |
|--------|-------|
| Decision Accuracy | 89.2% |
| Decision F1 (Macro) | 0.872 |
| Decision F1 (Weighted) | 0.891 |
| Priority MAE | 0.42 |
| Priority Correlation | 0.89 |
| Escalation Precision | 92.1% |
| Escalation Recall | 88.4% |
| Escalation F1 | 0.902 |

### Per-Category Performance

| Category | Accuracy | F1 Score |
|----------|----------|----------|
| Malware | 91.2% | 0.89 |
| Phishing | 88.5% | 0.86 |
| Brute Force | 90.1% | 0.88 |
| Data Exfiltration | 92.3% | 0.91 |
| Lateral Movement | 94.5% | 0.93 |
| C2 | 93.1% | 0.92 |

## Training Details

### Training Data

The model was trained on synthetic security alert data generated using expert-defined triage logic:

- 10,000+ training examples across 12 alert categories
- Balanced decision distribution to prevent bias
- Comprehensive context including user, asset, and environmental factors
- Expert-level triage decisions based on security best practices
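In chat format, one record in `data/train.jsonl` would look roughly like the sketch below. The field names and content are illustrative assumptions, not taken from the actual dataset; the data generator invoked in "Reproduce Training" is the source of truth:

```python
import json

# Hypothetical shape of one chat-format training example (one JSON object
# per line in a .jsonl file). Content is illustrative only.
record = {
    "messages": [
        {"role": "system",
         "content": "You are an expert SOC analyst. Analyze alerts and "
                    "provide structured triage recommendations."},
        {"role": "user",
         "content": "Analyze the following security alert:\n\n"
                    "**Alert ID:** ALERT-042\n**Category:** brute_force\n"
                    "**Severity:** medium\n..."},
        {"role": "assistant",
         "content": "## Triage Recommendation\n\n### Decision Summary\n"
                    "| **Decision** | investigate |\n..."},
    ]
}

line = json.dumps(record)  # serialize as a single .jsonl line
```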

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Base Model | meta-llama/Llama-3.1-8B-Instruct |
| Fine-tuning Method | QLoRA (4-bit + LoRA) |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Learning Rate | 2e-5 |
| Epochs | 3 |
| Batch Size | 16 (with gradient accumulation) |
| Max Sequence Length | 4096 |
| Optimizer | AdamW |
| LR Scheduler | Cosine |
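With rank 64 and alpha 128, the low-rank update is scaled by alpha/r = 2. Expressed as the keyword arguments these values would map to in `peft.LoraConfig` (a sketch; `target_modules` is an assumption, since the card does not list which projections were adapted):

```python
# LoRA hyperparameters from the table above, as peft.LoraConfig kwargs.
# target_modules is assumed (typical attention projections), not stated
# in the model card.
lora_config_kwargs = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    "task_type": "CAUSAL_LM",
}

# Effective scaling applied to the low-rank update: alpha / r.
lora_scaling = lora_config_kwargs["lora_alpha"] / lora_config_kwargs["r"]
```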

### Reproduce Training

```bash
# Clone repository
git clone https://github.com/fmt0816/kodiak-secops-1.git
cd kodiak-secops-1

# Install dependencies
pip install -e ".[train]"

# Generate training data
python -m soc_triage_agent.data_generator \
    --num-samples 10000 \
    --format chat \
    --output data/train.jsonl \
    --balanced

# Train model
python scripts/train.py \
    --model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
    --train_file data/train.jsonl \
    --validation_file data/val.jsonl \
    --output_dir ./outputs/kodiak-secops-1 \
    --use_lora \
    --lora_r 64 \
    --lora_alpha 128 \
    --num_train_epochs 3
```

## Limitations

- **Synthetic Training Data:** The model was trained on synthetic data, which may not capture all real-world edge cases
- **Context Dependency:** Accuracy depends on the completeness of the provided alert context
- **No Real-Time Learning:** The model does not learn from production feedback without retraining
- **Language:** Currently supports English only
- **Hallucination Risk:** Like all LLMs, it may occasionally generate plausible but incorrect reasoning

## Intended Use

### Primary Use Cases

- Assisting SOC analysts with initial alert triage
- Providing consistent triage recommendations across shifts
- Reducing alert fatigue and mean time to respond (MTTR)
- Training and onboarding junior analysts
- Augmenting understaffed security teams

### Out-of-Scope Uses

- Fully autonomous security decision-making without human oversight
- Replacing human analysts for critical security decisions
- Use in safety-critical systems without additional validation
- Processing classified or highly sensitive data without appropriate controls

## Ethical Considerations

- **Human Oversight:** This model should augment, not replace, human security analysts
- **Bias Monitoring:** Regular evaluation should be conducted to detect and mitigate biases
- **Transparency:** Security teams should understand how the model makes decisions
- **Adversarial Robustness:** Model outputs should be validated, as adversaries may attempt to manipulate inputs
- **Data Privacy:** Ensure alert data processed by the model complies with organizational policies

## Technical Specifications

### Hardware Requirements

| Configuration | VRAM Required |
|---------------|---------------|
| 4-bit Quantized (Recommended) | ~6 GB |
| 8-bit Quantized | ~10 GB |
| Full Precision (FP16) | ~16 GB |
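These figures follow the usual back-of-the-envelope rule: parameter count times bytes per parameter for the weights, plus headroom for activations and the KV cache. A rough sketch (the headroom amounts are an assumption, not a measurement):

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in GB (decimal).

    Activations, KV cache, and quantization bookkeeping add more on top.
    """
    return n_params * bits_per_param / 8 / 1e9

llama_8b = 8e9  # parameter count of the Llama 3.1 8B base model

fp16_gb = weight_memory_gb(llama_8b, 16)  # 16.0, matching the ~16 GB row
int4_gb = weight_memory_gb(llama_8b, 4)   # 4.0; runtime overhead pushes
                                          # actual usage toward ~6 GB
```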

### Software Requirements

- Python 3.9+
- PyTorch 2.0+
- Transformers 4.36+
- PEFT 0.6+
- bitsandbytes 0.41+ (for quantization)

## Citation

```bibtex
@software{kodiak_secops_1,
  title = {Kodiak SecOps 1: Fine-tuned LLM for Security Alert Triage},
  author = {fmt0816},
  year = {2025},
  url = {https://huggingface.co/fmt0816/kodiak-secops-1},
  note = {Fine-tuned on Llama 3.1 8B Instruct using QLoRA}
}
```

## License

This model is released under the Apache 2.0 License.

