DualBranchEncoder — Context-Aware Flow Embeddings for Encrypted QUIC Traffic

A dual-branch neural encoder that produces 256-dimensional L2-normalized behavioral embeddings of encrypted network flows. Trained from scratch on CESNET-QUIC22 with a margin-based supervised-contrastive objective, it classifies traffic by behavior alone: there is no payload inspection, and no 5-tuple identity (IPs, ports, protocol) is ever fed to the model. Classification uses the nearest class prototype (by cosine similarity) in embedding space, so new traffic types can be added from a few labeled examples without retraining.

Built for the Samsung EnnovateX AX Hackathon (Problem Statement 2 — Context-Aware Flow Embeddings for Adaptive AI based Network Traffic Classification), Team THETA.

Full source code & live demo: https://github.com/Vikas-ai56/Samsung-ennovatex
Architecture: BiLSTM sequence branch + MLP statistics branch -> cross-attention fusion -> residual projection head -> L2-normalized 256-d embedding
Parameters: ~1.98 M

Why no ports/IPs?

The 5-tuple (src/dst IP, src/dst port, protocol) is a shortcut-leakage feature: a model that learns "port 443 -> streaming" memorizes server topology instead of traffic behavior and collapses on unseen servers / zero-day applications. For QUIC it carries almost no signal anyway (dst port ~= 443, src port ephemeral). Excluding it is what makes the embeddings generalize.
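As a rough illustration of what excluding the 5-tuple means in practice, the sketch below drops identity fields from a flow record before any feature building. The record layout and field names (`src_ip`, `sizes`, `ipt_ms`, and so on) are hypothetical and are not taken from the repo.

```python
# Hypothetical flow-record field names; the real capture schema may differ.
IDENTITY_FIELDS = {"src_ip", "dst_ip", "src_port", "dst_port", "protocol"}

def strip_identity(flow: dict) -> dict:
    """Return a copy of the flow record without any 5-tuple fields."""
    return {k: v for k, v in flow.items() if k not in IDENTITY_FIELDS}

flow = {
    "src_ip": "10.0.0.5", "dst_ip": "203.0.113.7",
    "src_port": 51544, "dst_port": 443, "protocol": "UDP",
    "sizes": [1252, 1252, 87, 1252],   # packet sizes in bytes
    "ipt_ms": [0.0, 1.8, 0.4, 12.1],   # inter-packet times in milliseconds
    "directions": [1, -1, 1, -1],      # +1 client->server, -1 server->client
}

# Only behavioral fields remain for downstream feature engineering.
behavioral = strip_identity(flow)
```

Only sizes, timings, and directions survive, which matches the inputs the two branches consume.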

Inputs

Branch	Tensor	Features
Sequence (A)	`(batch, 30, 3)`	per-packet `[size_norm, ipt_norm, direction]` for the first 30 packets
Statistics (B)	`(batch, 16)`	byte/packet ratios, packet-size & inter-packet-time moments (incl. jitter), 8-bin source-size histogram, PPI length — no ports/IPs
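To make the sequence-branch shape concrete, here is a minimal sketch of packing per-packet features into a zero-padded `(30, 3)` array. The normalization choices (dividing sizes by 1500 bytes, `log1p` on inter-packet times) and the helper name `build_sequence` are assumptions for illustration; the actual scaling is defined in `src/feature_engineering.py` and may differ.

```python
import numpy as np

MAX_PACKETS = 30     # sequence length expected by branch A
MTU_BYTES = 1500.0   # assumed size scale, not taken from the repo

def build_sequence(sizes, ipts_ms, directions):
    """Pack per-packet features into a zero-padded (30, 3) float32 array."""
    n = min(len(sizes), MAX_PACKETS)
    seq = np.zeros((MAX_PACKETS, 3), dtype=np.float32)
    seq[:n, 0] = np.clip(np.asarray(sizes[:n], dtype=np.float32) / MTU_BYTES, 0.0, 1.0)
    seq[:n, 1] = np.log1p(np.asarray(ipts_ms[:n], dtype=np.float32))  # assumed IPT scaling
    seq[:n, 2] = np.asarray(directions[:n], dtype=np.float32)
    return seq

seq = build_sequence([1252, 87, 1252], [0.0, 1.8, 0.4], [1, -1, 1])
batch = seq[None, ...]   # add the batch axis -> (1, 30, 3)
```

Flows shorter than 30 packets are padded with zeros, and longer flows are truncated to the first 30 packets.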

Output: (batch, 256), L2-normalized. Use cosine similarity / k-NN / nearest-prototype for classification.
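Because the embeddings are unit-norm, cosine similarity reduces to a dot product. The following is a small k-NN sketch on synthetic stand-in embeddings (random vectors, not model outputs); the helper names `l2_normalize` and `knn_predict` are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def knn_predict(queries, gallery, gallery_labels, k=3):
    """Majority vote over the k most cosine-similar gallery embeddings."""
    sims = queries @ gallery.T                   # cosine, since rows are unit-norm
    top_k = np.argsort(-sims, axis=1)[:, :k]     # indices of the k nearest neighbours
    votes = gallery_labels[top_k]                # (num_queries, k)
    return np.array([np.bincount(row).argmax() for row in votes])

# Synthetic data: two well-separated clusters standing in for two classes.
rng = np.random.default_rng(0)
centers = l2_normalize(rng.normal(size=(2, 256)))
gallery = l2_normalize(np.repeat(centers, 5, axis=0) + 0.01 * rng.normal(size=(10, 256)))
gallery_labels = np.repeat(np.array([0, 1]), 5)
queries = l2_normalize(centers + 0.01 * rng.normal(size=(2, 256)))

preds = knn_predict(queries, gallery, gallery_labels)
```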

Files

File	Description
`best_model.pth`	Encoder checkpoint (`model_state_dict`, epoch 28, best val acc 0.927)
`prototypes.pth`	`class_id -> 256-d` class-prototype gallery for nearest-prototype classification
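One plausible way to build such a `class_id -> 256-d` gallery is to average each class's embeddings and re-normalize the mean. The sketch below does this with NumPy on synthetic embeddings; whether `prototypes.pth` was produced exactly this way is an assumption, and `build_prototypes` / `nearest_prototype` are hypothetical helper names.

```python
import numpy as np

def build_prototypes(embeddings, labels):
    """Map class_id -> unit-norm mean embedding of that class."""
    protos = {}
    for c in np.unique(labels):
        mean = embeddings[labels == c].mean(axis=0)
        protos[int(c)] = mean / np.linalg.norm(mean)
    return protos

def nearest_prototype(emb, protos):
    """Return the class id of the most cosine-similar prototype per row."""
    ids = sorted(protos)
    P = np.stack([protos[c] for c in ids])       # (num_classes, dim)
    return [ids[i] for i in (emb @ P.T).argmax(axis=1)]

# Synthetic stand-in embeddings: 3 classes, 4 samples each.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 256))
labels = np.repeat(np.arange(3), 4)
emb = centers[labels] + 0.05 * rng.normal(size=(12, 256))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

protos = build_prototypes(emb, labels)
preds = nearest_prototype(emb, protos)
```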

Classes

0 video_streaming, 1 audio_streaming, 2 gaming, 3 social_media, 4 file_transfer, 5 browsing, 6 communication. (Classes 1 audio_streaming and 2 gaming are held out of training to test zero-day generalization.)
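A held-out class can be evaluated without touching the encoder: take a few labeled flows as a support set, register their averaged embedding as a new prototype, and classify the remaining flows. The sketch below uses synthetic vectors throughout, and the 5-shot support / 20-flow query split is an illustrative assumption, not the protocol behind the reported numbers.

```python
import numpy as np

def unit(x):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(1)

# Synthetic gallery for two "seen" classes (ids 0 and 5), standing in for prototypes.pth.
gallery = {0: unit(rng.normal(size=256)), 5: unit(rng.normal(size=256))}

# Synthetic embeddings of a held-out class (id 2), clustered around one direction.
center = rng.normal(size=256)
held_out = unit(center + 0.05 * rng.normal(size=(25, 256)))
support, query = held_out[:5], held_out[5:]      # 5 labeled shots, 20 test flows

# Register the new class as one averaged, re-normalized prototype; no retraining.
gallery[2] = unit(support.mean(axis=0))

ids = sorted(gallery)
P = np.stack([gallery[c] for c in ids])
preds = np.array(ids)[(query @ P.T).argmax(axis=1)]
zero_day_acc = float((preds == 2).mean())
```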

Results (CESNET-QUIC22 validation)

KPI	Target	Result	Status
Classification accuracy	>= 90%	90.90%	PASS
Intra-class cosine similarity	> 0.7	0.7283	PASS
Inference latency (single flow)	< 100 ms	1.36 ms	PASS
Zero-day generalization	>= 85%	84.84%	near
Inter-class cosine similarity	< 0.3	0.3833	near
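The two cosine KPIs can be read as mean pairwise similarities within and across classes. The sketch below computes them that way on a toy example; this definition is an assumption, and the repo's evaluation may average differently (for example per class, or against prototypes).

```python
import numpy as np

def cosine_kpis(emb, labels):
    """Mean pairwise cosine similarity within and across classes."""
    sims = emb @ emb.T                             # rows assumed unit-norm
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)    # ignore self-similarity
    intra = sims[same & off_diag].mean()
    inter = sims[~same].mean()
    return float(intra), float(inter)

# Toy check: two orthogonal classes with two identical members each.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
intra, inter = cosine_kpis(emb, labels)
```

Under this reading, a higher intra-class value and a lower inter-class value both indicate tighter, better-separated clusters.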

Usage

The model definition lives in the GitHub repo (src/models_dual_branch.py). Install the repo, then:

import torch
from huggingface_hub import hf_hub_download
from src.models_dual_branch import DualBranchEncoder

repo = "dhruvsinghal1387/dualbranch-quic-encoder"
ckpt_path  = hf_hub_download(repo, "best_model.pth")
proto_path = hf_hub_download(repo, "prototypes.pth")

model = DualBranchEncoder(seq_input_dim=3, stat_input_dim=16, d_model=256, embed_dim=256)
ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# seq: (B, 30, 3)  stat: (B, 16); build them with src/feature_engineering.py
with torch.no_grad():                       # inference only, no gradients needed
    emb = model(seq, stat)                  # (B, 256), L2-normalized

# classify by nearest class prototype
protos = torch.load(proto_path, map_location="cpu", weights_only=False)
ids = sorted(protos)
P = torch.nn.functional.normalize(torch.stack([protos[c] for c in ids]), p=2, dim=1)
sims = emb @ P.T                            # cosine similarity, (B, num_classes)
preds = [ids[i] for i in sims.argmax(dim=1).tolist()]   # one class id per flow

For a complete real-traffic demo (capture -> classify), see live_demo.py in the GitHub repo.

Intended use & limitations

Intended: QoS-aware traffic-category classification, behavioral flow embedding, few-shot extension to new application types.
Trained on QUIC (CESNET-QUIC22); best on QUIC/UDP-443 traffic. Predictions on TCP/TLS are out-of-distribution and noisier.
Not for user identification, deanonymization, or payload recovery — by design it sees only packet sizes, timings, and directions.

License

Apache-2.0. Training data: CESNET-QUIC22 (Creative Commons).
