🛡️ CyberHybridNet: Hybrid Transformer for Cybersecurity Anomaly Detection

Architecture

CyberHybridNet is a hybrid CNN-transformer architecture for network intrusion and anomaly detection. It combines the following components:

Key Components:

  1. Multi-Scale 1D CNN Feature Extractor - Captures local patterns at 3 different granularities (kernel sizes 1, 3, 5)
  2. Rotary Position Embeddings (RoPE) - Temporal awareness for network flow sequences
  3. Multi-Head Self-Attention - Global dependency modeling across flow features
  4. Gated Cross-Attention - Cross-feature interaction between CNN and transformer pathways with learned gating
  5. SwiGLU Feed-Forward Networks - Gated feed-forward layers with the SwiGLU activation, as used in PaLM and LLaMA
  6. Mixture-of-Experts (MoE) Classifier - 4-expert ensemble with load balancing for robust classification
  7. Focal Loss - Handles severe class imbalance common in cybersecurity datasets
  8. Attention Pooling - Learnable query-based pooling instead of naive mean pooling
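
To make components 1 and 5 more concrete, here is a minimal PyTorch sketch of a multi-scale 1D convolution block (kernel sizes 1, 3, 5) followed by a SwiGLU feed-forward layer. The class names, the concatenate-then-project fusion step, the GELU activation, and the (batch, seq_len, hidden_dim) input layout are assumptions made for illustration; the model card does not specify how CyberHybridNet implements these parts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleConv1d(nn.Module):
    """Parallel 1D convolutions with kernel sizes 1, 3 and 5 (hypothetical layout)."""

    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # padding = k // 2 keeps the sequence length unchanged for odd kernels
        self.branches = nn.ModuleList(
            [nn.Conv1d(channels, channels, k, padding=k // 2) for k in kernel_sizes]
        )
        # Assumed fusion step: concatenate the branches, then project back
        self.fuse = nn.Linear(channels * len(kernel_sizes), channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, channels); Conv1d expects (batch, channels, seq_len)
        h = x.transpose(1, 2)
        outs = [F.gelu(branch(h)) for branch in self.branches]
        h = torch.cat(outs, dim=1).transpose(1, 2)
        return self.fuse(h)


class SwiGLU(nn.Module):
    """Feed-forward block computing W_down(SiLU(W_gate x) * W_up x)."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


x = torch.randn(2, 16, 128)  # (batch, seq_len, hidden_dim)
y = SwiGLU(128, 256)(MultiScaleConv1d(128)(x))
print(y.shape)  # torch.Size([2, 16, 128])
```

Both blocks preserve the input shape, which is what lets them be stacked or wrapped in residual connections.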

Architecture Diagram:

Input Features
      │
  ┌───▼────┐
  │ Input  │
  │Project │
  └───┬────┘
      │
  ┌───▼────────────┐     ┌───────────────────┐
  │ Multi-Scale    │────▶│ CNN Context       │
  │ CNN Extractor  │     │ (3 scales: 1,3,5) │
  └───┬────────────┘     └──────┬────────────┘
      │                         │
      │    ┌────────────────────┘
      │    │
  ┌───▼────▼───────────┐
  │ Hybrid Attention   │ × N layers
  │ ┌─────────────────┐│
  │ │Self-Attn + RoPE ││
  │ ├─────────────────┤│
  │ │Gated Cross-Attn ││
  │ ├─────────────────┤│
  │ │SwiGLU FFN       ││
  │ └─────────────────┘│
  └────────┬───────────┘
           │
  ┌────────▼───────────┐
  │ Attention Pooling  │
  └────────┬───────────┘
           │
  ┌────────▼───────────┐
  │ MoE Classifier     │
  │ (4 experts + gate) │
  └────────┬───────────┘
           │
      Predictions
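
The last two stages of the diagram (attention pooling, then the mixture-of-experts classifier) can be sketched as follows. This is an illustrative reading, not the released implementation: the single learned query, the dense (soft) gating in which every expert is evaluated, the linear experts, and the `load_balance_loss` helper are all assumptions. The only detail taken from the card is that the model returns `logits` together with `gate_probs`.

```python
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Pool a sequence into one vector using a single learned query."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim) / dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -> scores: (batch, seq_len)
        scores = x @ self.query / x.shape[-1] ** 0.5
        weights = scores.softmax(dim=-1)
        return (weights.unsqueeze(-1) * x).sum(dim=1)


class MoEClassifier(nn.Module):
    """Dense mixture of experts: every expert runs, the gate mixes their logits."""

    def __init__(self, dim: int, num_classes: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor):
        gate_probs = self.gate(x).softmax(dim=-1)                         # (batch, E)
        expert_logits = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, C)
        logits = (gate_probs.unsqueeze(-1) * expert_logits).sum(dim=1)    # (batch, C)
        return logits, gate_probs


def load_balance_loss(gate_probs: torch.Tensor) -> torch.Tensor:
    # One simple balancing penalty (an assumption, not necessarily the author's):
    # squared deviation of mean expert usage from the uniform distribution.
    usage = gate_probs.mean(dim=0)
    return ((usage - 1.0 / usage.numel()) ** 2).sum()


tokens = torch.randn(8, 16, 128)               # (batch, seq_len, hidden_dim)
pooled = AttentionPooling(128)(tokens)         # (batch, hidden_dim)
logits, gate_probs = MoEClassifier(128, num_classes=3)(pooled)
aux = load_balance_loss(gate_probs)            # added to the main loss during training
```

Compared with mean pooling, the learned query lets the model weight some sequence positions more heavily than others when building the pooled representation.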

Performance

CICIDS2017 (Temporal Split)

Metric        Score
Accuracy      77.52%
F1-Macro      74.61%
F1-Weighted   76.14%
Precision     80.75%
Recall        73.83%
AUC-ROC       88.39%

UNSW-NB15

Metric        Score
Accuracy      98.77%
F1-Macro      97.03%
F1-Weighted   98.80%
Precision     95.02%
Recall        99.31%
AUC-ROC       99.94%
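
For readers comparing the F1-Macro and F1-Weighted rows, the sketch below shows how the two averages differ on an imbalanced toy example. It uses the standard definitions in plain Python; the card does not say which library or averaging settings produced the reported numbers, and the `f1_scores` helper and toy labels are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro F1, weighted F1) for two equal-length label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    # Macro: every class counts equally, however rare it is
    macro = sum(per_class.values()) / len(classes)
    # Weighted: each class counts in proportion to its number of true samples
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return macro, weighted


# Toy imbalanced data: 8 benign flows (0), 2 attacks (1), one attack missed
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]
macro, weighted = f1_scores(y_true, y_pred)
print(f"macro={macro:.3f} weighted={weighted:.3f}")  # macro=0.804 weighted=0.886
```

Because the macro average gives the rare class equal weight, it drops further than the weighted average when minority-class samples are misclassified.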

Training Details

  • Optimizer: AdamW (lr=3e-4, weight_decay=1e-4)
  • Scheduler: Cosine with linear warmup (2 epochs)
  • Loss: Focal Loss (γ=2.0) with class-weighted sampling
  • Regularization: Dropout (0.15), gradient clipping (max_norm=1.0), MoE load balancing
  • Early Stopping: Patience=7 on validation F1-Macro
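
The settings above could be wired together roughly as follows. This is a sketch under stated assumptions: the `focal_loss` and `warmup_cosine` helpers are hypothetical, the linear layer stands in for CyberHybridNet, and the step counts (100 steps per epoch, 30 epochs) are made up for illustration. Only the optimizer hyperparameters, γ=2.0, the 2-epoch warmup, and max_norm=1.0 come from the list.

```python
import math

import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=2.0, weight=None):
    """Multi-class focal loss: mean of (1 - p_t)^gamma * cross-entropy."""
    ce = F.cross_entropy(logits, targets, weight=weight, reduction="none")
    # p_t is the predicted probability of the true class
    log_pt = F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return ((1.0 - pt) ** gamma * ce).mean()


def warmup_cosine(step, warmup_steps, total_steps):
    """LR multiplier: linear warmup to 1.0, then cosine decay to 0.0."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


model = torch.nn.Linear(78, 3)  # stand-in for CyberHybridNet
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
# Assumed budget: 2 warmup epochs out of 30, at 100 steps per epoch
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda s: warmup_cosine(s, warmup_steps=200, total_steps=3000)
)

# One training step on random data
x, y = torch.randn(32, 78), torch.randint(0, 3, (32,))
loss = focal_loss(model(x), y, gamma=2.0)
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()
```

With γ=0 the focal term equals 1 and the loss reduces to ordinary cross-entropy; larger γ down-weights examples the model already classifies confidently, which is why it is often used for imbalanced data.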

Usage

import torch
from model import CyberHybridNet

# Load model
model = CyberHybridNet(
    input_dim=78,  # CICIDS2017 features
    num_classes=3,  # BENIGN, ATTACK, UNKNOWN
    hidden_dim=128,
    num_layers=4,
    num_heads=8,
    num_experts=4,
)
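# map_location="cpu" lets GPU-trained weights load on a CPU-only machine
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()  # disable dropout for inference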

# Predict
with torch.no_grad():
    features = torch.randn(1, 78)  # Your preprocessed features
    logits, gate_probs = model(features)
    prediction = logits.argmax(dim=-1)
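
As a possible follow-up, the raw outputs can be turned into class probabilities and readable labels. The sketch below uses stand-in tensors with the shapes the model is described as returning. The label order in `CLASS_NAMES` is an assumption taken from the `num_classes` comment and should be checked against the label encoding used during training.

```python
import torch

# Assumed label order (hypothetical); confirm against the training label encoding
CLASS_NAMES = ["BENIGN", "ATTACK", "UNKNOWN"]

# Stand-in values with the shapes the model returns for a batch of 2 flows
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]])
gate_probs = torch.tensor([[0.7, 0.1, 0.1, 0.1], [0.1, 0.2, 0.6, 0.1]])

probs = logits.softmax(dim=-1)          # class probabilities per flow
confidence, pred = probs.max(dim=-1)    # top probability and its class index
labels = [CLASS_NAMES[i] for i in pred.tolist()]
top_expert = gate_probs.argmax(dim=-1)  # most heavily weighted expert per flow

for label, conf, expert in zip(labels, confidence.tolist(), top_expert.tolist()):
    print(f"{label} (p={conf:.3f}, expert {expert})")
```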

Datasets

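  • CICIDS2017 - Intrusion detection evaluation dataset (2017) from the Canadian Institute for Cybersecurity
  • UNSW-NB15 - Network intrusion dataset from the Australian Centre for Cyber Security (ACCS), UNSW Canberra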