🛡️ CyberHybridNet: Hybrid Transformer for Cybersecurity Anomaly Detection
Architecture
CyberHybridNet is a hybrid CNN-transformer architecture designed for network intrusion and anomaly detection. It combines the following components:
Key Components:
- Multi-Scale 1D CNN Feature Extractor - Captures local patterns at 3 different granularities (kernel sizes 1, 3, 5)
- Rotary Position Embeddings (RoPE) - Temporal awareness for network flow sequences
- Multi-Head Self-Attention - Global dependency modeling across flow features
- Gated Cross-Attention - Cross-feature interaction between CNN and transformer pathways with learned gating
- SwiGLU Feed-Forward Networks - Gated feed-forward layers with the SwiGLU activation used in PaLM and LLaMA
- Mixture-of-Experts (MoE) Classifier - 4-expert ensemble with load balancing for robust classification
- Focal Loss - Handles severe class imbalance common in cybersecurity datasets
- Attention Pooling - Learnable query-based pooling instead of naive mean pooling
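Three of the components listed above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the model's actual code: the class and function names (`MultiScaleConv`, `apply_rope`, `AttentionPooling`) are hypothetical, the input is assumed to have shape `(batch, seq_len, hidden_dim)`, and the ReLU, the merge projection and the "rotate-half" RoPE variant are assumptions.

```python
import torch
import torch.nn as nn


class MultiScaleConv(nn.Module):
    """Parallel 1D convolutions (kernel sizes 1, 3, 5), concatenated and projected."""

    def __init__(self, hidden_dim: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # padding = k // 2 keeps the sequence length unchanged for odd kernels.
        self.branches = nn.ModuleList(
            [nn.Conv1d(hidden_dim, hidden_dim, k, padding=k // 2) for k in kernel_sizes]
        )
        self.merge = nn.Linear(hidden_dim * len(kernel_sizes), hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim); Conv1d expects (batch, channels, seq_len).
        h = x.transpose(1, 2)
        multi = torch.cat([torch.relu(branch(h)) for branch in self.branches], dim=1)
        return self.merge(multi.transpose(1, 2))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate feature pairs of x (batch, seq_len, even dim) by position-dependent angles."""
    _, seq_len, dim = x.shape
    half = dim // 2
    idx = torch.arange(half, dtype=x.dtype, device=x.device)
    inv_freq = base ** (-idx / half)
    pos = torch.arange(seq_len, dtype=x.dtype, device=x.device)
    angles = pos[:, None] * inv_freq[None, :]  # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class AttentionPooling(nn.Module):
    """Pools a sequence into one vector using a single learnable query."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(hidden_dim) / hidden_dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = (x @ self.query) / x.size(-1) ** 0.5   # (batch, seq_len)
        weights = scores.softmax(dim=-1).unsqueeze(-1)  # (batch, seq_len, 1)
        return (weights * x).sum(dim=1)                 # (batch, hidden_dim)
```

In a real attention layer, RoPE is normally applied to the query and key projections of each head, not to the layer input. The function above only shows the rotation itself.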
Architecture Diagram:
Input Features
    │
┌───▼───┐
│ Input │
│Project│
└───┬───┘
    │
┌───▼───────────┐     ┌───────────────────┐
│  Multi-Scale  │────▶│    CNN Context    │
│ CNN Extractor │     │ (3 scales: 1,3,5) │
└───┬───────────┘     └──────┬────────────┘
    │                        │
    │    ┌───────────────────┘
    │    │
┌───▼────▼───────────┐
│  Hybrid Attention  │ × N layers
│ ┌────────────────┐ │
│ │Self-Attn + RoPE│ │
│ ├────────────────┤ │
│ │Gated Cross-Attn│ │
│ ├────────────────┤ │
│ │SwiGLU FFN      │ │
│ └────────────────┘ │
└────────┬───────────┘
         │
┌────────▼───────────┐
│ Attention Pooling  │
└────────┬───────────┘
         │
┌────────▼───────────┐
│   MoE Classifier   │
│ (4 experts + gate) │
└────────┬───────────┘
         │
    Predictions
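The hybrid layer and the MoE head in the diagram could be wired together roughly as follows. This is a sketch under stated assumptions, not the released implementation: `SwiGLU`, `HybridLayer`, `MoEClassifier` and `load_balance_loss` are hypothetical names, and the pre-norm residual layout, the 4x feed-forward width, the sigmoid gate over concatenated features and the dense (non-sparse) expert mixture are all assumptions. RoPE is omitted from the self-attention step to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Feed-forward block computing down(SiLU(gate(x)) * up(x))."""

    def __init__(self, dim: int, ffn_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, ffn_dim)
        self.up = nn.Linear(dim, ffn_dim)
        self.down = nn.Linear(ffn_dim, dim)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))


class HybridLayer(nn.Module):
    """Self-attention, gated cross-attention to the CNN context, then a SwiGLU FFN."""

    def __init__(self, dim: int = 128, num_heads: int = 8, dropout: float = 0.15):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)
        self.ffn = SwiGLU(dim, 4 * dim)

    def forward(self, x, cnn_context):
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        cross = self.cross_attn(h, cnn_context, cnn_context, need_weights=False)[0]
        # The learned gate decides, per feature, how much CNN context is mixed in.
        g = torch.sigmoid(self.gate(torch.cat([h, cross], dim=-1)))
        x = x + g * cross
        return x + self.ffn(self.norm3(x))


class MoEClassifier(nn.Module):
    """Softmax-gated mixture of linear experts; returns logits and gate probabilities."""

    def __init__(self, dim: int = 128, num_classes: int = 3, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(num_experts)])

    def forward(self, pooled):
        gate_probs = self.router(pooled).softmax(dim=-1)                       # (batch, experts)
        expert_logits = torch.stack([e(pooled) for e in self.experts], dim=1)  # (batch, experts, classes)
        logits = (gate_probs.unsqueeze(-1) * expert_logits).sum(dim=1)
        return logits, gate_probs


def load_balance_loss(gate_probs):
    """Auxiliary penalty that reaches its minimum of 1.0 when mean expert usage is uniform."""
    usage = gate_probs.mean(dim=0)
    return gate_probs.size(-1) * (usage ** 2).sum()
```

Returning `gate_probs` alongside the logits matches the two-value output shown in the Usage section and makes it possible to add a load-balancing term to the training loss.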
Performance
CICIDS2017 (Temporal Split)
| Metric | Score |
|---|---|
| Accuracy | 77.52% |
| F1-Macro | 74.61% |
| F1-Weighted | 76.14% |
| Precision | 80.75% |
| Recall | 73.83% |
| AUC-ROC | 88.39% |
UNSW-NB15
| Metric | Score |
|---|---|
| Accuracy | 98.77% |
| F1-Macro | 97.03% |
| F1-Weighted | 98.80% |
| Precision | 95.02% |
| Recall | 99.31% |
| AUC-ROC | 99.94% |
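Both tables report F1-Macro and F1-Weighted, which diverge on imbalanced data because macro averaging gives every class the same weight regardless of its size. The sketch below shows the difference on toy labels. The helper `f1_scores` is hypothetical and the data is invented for illustration; it is not the evaluation code used to produce the tables.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro F1, support-weighted F1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted


# Toy example: 8 majority-class samples, 2 minority-class samples, one minority miss.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]
macro, weighted = f1_scores(y_true, y_pred)
print(f"macro={macro:.3f} weighted={weighted:.3f}")
```

A single missed minority sample lowers the macro score more than the weighted one, which is why macro F1 is the stricter metric on imbalanced intrusion data.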
Training Details
- Optimizer: AdamW (lr=3e-4, weight_decay=1e-4)
- Scheduler: Cosine with linear warmup (2 epochs)
- Loss: Focal Loss (γ=2.0) with class-weighted sampling
- Regularization: Dropout (0.15), gradient clipping (max_norm=1.0), MoE load balancing
- Early Stopping: Patience=7 on validation F1-Macro
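The settings above map onto standard PyTorch calls. The following is a minimal sketch, assuming a multi-class focal loss without per-class alpha weights and a per-epoch schedule. `focal_loss` and `warmup_cosine` are hypothetical helpers, `total_epochs=50` is an assumed value that the list above does not specify, and a plain linear layer stands in for the model. Class-weighted sampling, the MoE load-balancing term and early stopping are not shown.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def focal_loss(logits, targets, gamma: float = 2.0):
    """Cross-entropy scaled by (1 - p_t)^gamma so that easy examples contribute less."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-((1 - pt) ** gamma) * log_pt).mean()


def warmup_cosine(epoch: int, warmup_epochs: int = 2, total_epochs: int = 50) -> float:
    """Learning-rate multiplier: linear warmup, then cosine decay towards zero."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * (1 + math.cos(math.pi * progress))


model = nn.Linear(78, 3)  # stand-in for CyberHybridNet, which also returns gate_probs
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_cosine)

# One illustrative optimisation step on random data.
features, labels = torch.randn(32, 78), torch.randint(0, 3, (32,))
loss = focal_loss(model(features), labels)
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()  # called once per epoch in this sketch
```

With `gamma=0` the focal loss reduces to ordinary cross-entropy, which is a convenient sanity check when adapting the function.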
Usage
import torch
from model import CyberHybridNet

# Load model
model = CyberHybridNet(
    input_dim=78,    # CICIDS2017 features
    num_classes=3,   # BENIGN, ATTACK, UNKNOWN
    hidden_dim=128,
    num_layers=4,
    num_heads=8,
    num_experts=4,
)
model.load_state_dict(torch.load("model.pt"))
model.eval()

# Predict
with torch.no_grad():
    features = torch.randn(1, 78)  # Your preprocessed features
    logits, gate_probs = model(features)
    prediction = logits.argmax(dim=-1)
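To turn the logits into labelled predictions with a confidence score, a small wrapper along these lines may help. It is a sketch: `predict` and `CLASS_NAMES` are hypothetical, and the index-to-label order is assumed from the comment in the snippet above, so it should be checked against the label encoding used in training.

```python
import torch

# Assumed label order; verify against the encoder used during training.
CLASS_NAMES = ["BENIGN", "ATTACK", "UNKNOWN"]


@torch.no_grad()
def predict(model, features):
    """Return [(label, confidence), ...] for a batch, plus the MoE gate probabilities."""
    model.eval()
    logits, gate_probs = model(features)
    probs = logits.softmax(dim=-1)
    confidence, index = probs.max(dim=-1)
    labelled = [(CLASS_NAMES[i], c) for i, c in zip(index.tolist(), confidence.tolist())]
    return labelled, gate_probs
```

Calling `predict(model, features)` on a `(batch, 78)` tensor returns one `(label, confidence)` pair per row. The features are expected to be preprocessed the same way as the training data.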
Datasets
- CICIDS2017 - Canadian Institute for Cybersecurity IDS 2017
- UNSW-NB15 - Australian Centre for Cyber Security