🛡️ CyberHybridNet: Hybrid Transformer for Cybersecurity Anomaly Detection

Architecture

CyberHybridNet is a hybrid CNN-transformer architecture for network intrusion and anomaly detection. It combines the following components:

Key Components:

Multi-Scale 1D CNN Feature Extractor - Captures local patterns at three granularities (kernel sizes 1, 3, 5)
Rotary Position Embeddings (RoPE) - Encode relative position within network flow sequences
Multi-Head Self-Attention - Models global dependencies across flow features
Gated Cross-Attention - Lets the transformer pathway attend to the CNN pathway, with a learned gate controlling how much is mixed in
SwiGLU Feed-Forward Networks - Gated feed-forward blocks with a Swish (SiLU) gate, as used in PaLM and LLaMA
Mixture-of-Experts (MoE) Classifier - 4-expert ensemble with load balancing for robust classification
Focal Loss - Down-weights easy examples to handle the severe class imbalance common in cybersecurity datasets
Attention Pooling - Learnable query-based pooling instead of naive mean pooling
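To make the RoPE entry above concrete, here is a minimal sketch of the rotation in plain Python. It rotates consecutive pairs of dimensions by a position-dependent angle, so the dot product between a rotated query and key depends only on their relative offset. The interleaved pairing and the base of 10000 are assumptions taken from the common formulation; this card does not state which variant the model uses.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate pairs (x0, x1), (x2, x3), ... by angles that grow with position.

    Assumption: interleaved pairing and base=10000, as in the common RoPE
    formulation; the model's own variant may differ.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # one frequency per pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative vectors only
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (3 positions apart) at two different absolute positions
near = dot(rope(q, 5), rope(k, 2))
far = dot(rope(q, 105), rope(k, 102))
print(abs(near - far) < 1e-9)
```

Because the attention score depends only on the offset, the same pattern scores the same wherever it occurs in the sequence.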

Architecture Diagram:

Input Features
      │
  ┌───▼───┐
  │ Input │
  │Project│
  └───┬───┘
      │
  ┌───▼───────────┐     ┌───────────────────┐
  │ Multi-Scale   │────▶│ CNN Context       │
  │ CNN Extractor │     │ (3 scales: 1,3,5) │
  └───┬───────────┘     └──────┬────────────┘
      │                        │
      │    ┌───────────────────┘
      │    │
  ┌───▼────▼───────────┐
  │ Hybrid Attention   │ × N layers
  │ ┌─────────────────┐│
  │ │Self-Attn + RoPE ││
  │ ├─────────────────┤│
  │ │Gated Cross-Attn ││
  │ ├─────────────────┤│
  │ │SwiGLU FFN       ││
  │ └─────────────────┘│
  └────────┬───────────┘
           │
  ┌────────▼───────────┐
  │ Attention Pooling  │
  └────────┬───────────┘
           │
  ┌────────▼───────────┐
  │ MoE Classifier     │
  │ (4 experts + gate) │
  └────────┬───────────┘
           │
       Predictions
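The last box of the diagram, the MoE classifier, can be sketched as a softmax gate that mixes the class logits produced by each expert. The sketch below is plain Python and rests on assumptions: dense gating (every expert contributes) and a simple load-balancing penalty that equals 1.0 when experts are used evenly and grows toward the number of experts as the gate collapses onto one. The card states only that there are 4 experts with load balancing, so the actual formulation may differ.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_combine(gate_logits, expert_logits):
    """Mix per-expert class logits with softmax gate weights (dense gating)."""
    gate = softmax(gate_logits)
    num_classes = len(expert_logits[0])
    mixed = [
        sum(g * expert[c] for g, expert in zip(gate, expert_logits))
        for c in range(num_classes)
    ]
    return mixed, gate

def load_balance_penalty(batch_gates):
    """Hypothetical penalty: num_experts * sum(mean_usage^2) over a batch."""
    num_experts = len(batch_gates[0])
    mean_usage = [
        sum(g[i] for g in batch_gates) / len(batch_gates)
        for i in range(num_experts)
    ]
    return num_experts * sum(u * u for u in mean_usage)

# Made-up values: 4 experts, 3 classes
experts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
mixed, gate = moe_combine([0.0, 0.0, 0.0, 0.0], experts)
print(mixed, gate)
print(load_balance_penalty([gate, gate]))
```

Adding a small multiple of such a penalty to the classification loss is one common way to keep all experts in use during training.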

Performance

CICIDS2017 (Temporal Split)

Metric	Score
Accuracy	77.52%
F1-Macro	74.61%
F1-Weighted	76.14%
Precision	80.75%
Recall	73.83%
AUC-ROC	88.39%

UNSW-NB15

Metric	Score
Accuracy	98.77%
F1-Macro	97.03%
F1-Weighted	98.80%
Precision	95.02%
Recall	99.31%
AUC-ROC	99.94%
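The tables above report both F1-Macro and F1-Weighted. The sketch below shows how the two differ: macro averaging gives every class equal weight, while weighted averaging scales each class by its support, so a weak minority class lowers the macro score far more. The counts are made up for illustration and are not taken from either dataset. The card does not say whether the reported Precision and Recall are macro-averaged.

```python
def f1(tp, fp, fn):
    """F1 for one class from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_and_weighted_f1(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per class."""
    scores = [f1(*counts) for counts in per_class]
    supports = [tp + fn for tp, _, fn in per_class]  # true samples per class
    macro = sum(scores) / len(scores)
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    return macro, weighted

# Made-up counts: a large, well-classified class and a small, poorly classified one
counts = [(950, 30, 50), (20, 50, 30)]
macro, weighted = macro_and_weighted_f1(counts)
print(f"macro={macro:.4f} weighted={weighted:.4f}")
```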

Training Details

Optimizer: AdamW (lr=3e-4, weight_decay=1e-4)
Scheduler: Cosine with linear warmup (2 epochs)
Loss: Focal Loss (γ=2.0) with class-weighted sampling
Regularization: Dropout (0.15), gradient clipping (max_norm=1.0), MoE load balancing
Early Stopping: Patience=7 on validation F1-Macro
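The scheduler and loss settings above can be sketched numerically as follows. This is plain Python under stated assumptions: the total epoch budget is not given in this card, so `TOTAL_EPOCHS` is a placeholder, and the schedule is evaluated per (possibly fractional) epoch, although the original may step per batch.

```python
import math

BASE_LR = 3e-4      # from the training details
WARMUP_EPOCHS = 2   # from the training details
TOTAL_EPOCHS = 30   # assumption: the epoch budget is not stated
GAMMA = 2.0         # focal-loss focusing parameter from the training details

def lr_at(epoch, base_lr=BASE_LR, warmup=WARMUP_EPOCHS, total=TOTAL_EPOCHS):
    """Learning rate at a (possibly fractional) epoch."""
    if epoch < warmup:
        return base_lr * epoch / warmup  # linear ramp from 0 to base_lr
    progress = min(1.0, (epoch - warmup) / (total - warmup))
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0

def focal_loss(p_true, gamma=GAMMA):
    """Focal loss for one sample, given the probability of its true class."""
    p = min(max(p_true, 1e-12), 1.0)  # guard against log(0)
    return -((1.0 - p) ** gamma) * math.log(p)

for epoch in (0, 1, 2, 16, 30):
    print(epoch, lr_at(epoch))

# A confident correct prediction contributes far less than an uncertain one
print(focal_loss(0.95), focal_loss(0.30))
```

In PyTorch, `lr_at(epoch) / BASE_LR` is the kind of multiplicative factor that `torch.optim.lr_scheduler.LambdaLR` expects. With `gamma=0` the focal loss reduces to ordinary cross-entropy.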

Usage

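import torch
from model import CyberHybridNet

# Build the model; hyperparameters must match the saved checkpoint
model = CyberHybridNet(
    input_dim=78,    # CICIDS2017 features
    num_classes=3,   # BENIGN, ATTACK, UNKNOWN
    hidden_dim=128,
    num_layers=4,
    num_heads=8,
    num_experts=4,
)
# map_location lets a GPU-trained checkpoint load on a CPU-only machine
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()  # disable dropout for inference

# Predict
with torch.no_grad():
    features = torch.randn(1, 78)  # placeholder; use your preprocessed features, shape (batch, 78)
    logits, gate_probs = model(features)  # class logits and MoE gate probabilities
    prediction = logits.argmax(dim=-1)    # predicted class index per sample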
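To turn the raw logits from the snippet above into labels with confidence scores, something like the following could be used. It works on plain Python lists (for example `logits.tolist()`), so it does not depend on PyTorch. The class order in `CLASS_NAMES` is an assumption read off the `num_classes` comment and should be checked against the label encoding used in training.

```python
import math

# Assumption: index order follows the comment in the usage snippet
CLASS_NAMES = ["BENIGN", "ATTACK", "UNKNOWN"]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logit_rows, class_names=CLASS_NAMES):
    """Return a (label, confidence) pair for each row of raw logits."""
    results = []
    for row in logit_rows:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((class_names[best], probs[best]))
    return results

# Made-up logits; with the model this would be decode(logits.tolist())
print(decode([[2.0, 0.5, -1.0], [-0.3, 1.7, 0.1]]))
```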

Datasets

CICIDS2017 - Intrusion detection evaluation dataset from the Canadian Institute for Cybersecurity (2017)
UNSW-NB15 - Network intrusion dataset from the Australian Centre for Cyber Security
