LightGBM Models for QUIC Encrypted Traffic Classification

This repository contains eight trained LightGBM model variants for classifying encrypted QUIC network traffic into 17 application categories. The models come from a research project evaluating how hyperparameter tuning (Optuna) and resampling techniques (SMOTE oversampling and random undersampling, RUS) affect multiclass traffic classification performance.

Source code: GitHub repository


Dataset

All models are trained on the CESNET-QUIC22 dataset.

  • Source: Zenodo -- CESNET-QUIC22
  • Parent repository: CESNET Liberouter Datasets
  • Capture period: October 31 -- November 6, 2022
  • Protocol: QUIC encrypted traffic flows
  • Split ratio: 60% train / 20% validation / 20% test
  • Preprocessing: Feature engineering on PHIST histograms, StandardScaler normalization
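The split and normalization steps above can be sketched with NumPy alone (the statistics mirror what scikit-learn's StandardScaler computes); the feature matrix, seed, and sizes here are placeholders, not the actual CESNET-QUIC22 pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 24))  # placeholder feature matrix

# 60% / 20% / 20% split: shuffle indices, then slice
idx = rng.permutation(len(X))
n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
train, val, test = np.split(X[idx], [n_train, n_train + n_val])

# StandardScaler-style normalization: fit mean/std on the training split only,
# then apply the same statistics to validation and test
mu, sigma = train.mean(axis=0), train.std(axis=0)
train_s = (train - mu) / sigma
val_s = (val - mu) / sigma
test_s = (test - mu) / sigma
```

Fitting the scaler on the training split only avoids leaking validation/test statistics into training.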

Model Variants

Eight model variants are provided, combining different training data configurations with default or Optuna-optimized hyperparameters.

| Model | Directory | Data | Hyperparameters |
|---|---|---|---|
| Baseline | `saved_model_baseline/` | Original | Default |
| Baseline + Optuna | `saved_model_baseline_optuna/` | Original | Optuna-optimized |
| RUS | `saved_model_0.rus/` | RUS-resampled | Default |
| RUS + Optuna | `saved_model_rus_optuna/` | RUS-resampled | Optuna-optimized |
| SMOTE | `saved_model_0.smote/` | SMOTE-resampled | Default |
| SMOTE + Optuna | `saved_model_smote_optuna/` | SMOTE-resampled | Optuna-optimized |
| SMOTE-RUS | `saved_model_0.smote.rus/` | SMOTE-RUS-resampled | Default |
| SMOTE-RUS + Optuna | `saved_model_smote-rus_optuna/` | SMOTE-RUS-resampled | Optuna-optimized |

Each model directory contains:

  • lightgbm_model.pkl -- serialized LightGBM Booster object
  • lightgbm_model_info.json -- training parameters, feature names, and evaluation metrics

Performance (Test Set)

| Model | Accuracy | Precision | Recall | F1-Score | AUROC | AUPRC |
|---|---|---|---|---|---|---|
| Baseline | 0.6852 | 0.6943 | 0.6852 | 0.6759 | 0.9300 | 0.7630 |
| Baseline + Optuna | 0.8417 | 0.8414 | 0.8417 | 0.8400 | 0.9831 | 0.9214 |
| RUS | 0.6674 | 0.7014 | 0.6674 | 0.6737 | 0.9362 | 0.7788 |
| RUS + Optuna | 0.8115 | 0.8263 | 0.8115 | 0.8144 | 0.9800 | 0.9122 |
| SMOTE | 0.7073 | 0.7146 | 0.7073 | 0.7005 | 0.9413 | 0.7914 |
| SMOTE + Optuna | 0.8382 | 0.8379 | 0.8382 | 0.8369 | 0.9825 | 0.9185 |
| SMOTE-RUS | 0.6601 | 0.7024 | 0.6601 | 0.6696 | 0.9355 | 0.7764 |
| SMOTE-RUS + Optuna | 0.8077 | 0.8250 | 0.8077 | 0.8115 | 0.9796 | 0.9105 |

All metrics are weighted averages. AUROC and AUPRC are weighted one-vs-rest.
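Under that averaging scheme, the AUROC and AUPRC columns can be reproduced from predicted class probabilities roughly as follows. This is a sketch using scikit-learn; the helper name is mine, and `average_precision_score` needs binarized labels for multiclass input:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.preprocessing import label_binarize


def weighted_ovr_scores(y_true, y_proba, n_classes):
    """Weighted one-vs-rest AUROC and AUPRC from class probabilities."""
    classes = np.arange(n_classes)
    # One-vs-rest AUROC, each class weighted by its prevalence in y_true
    auroc = roc_auc_score(y_true, y_proba, multi_class="ovr",
                          average="weighted", labels=classes)
    # AUPRC (average precision) over one-hot labels, weighted the same way
    y_bin = label_binarize(y_true, classes=classes)
    auprc = average_precision_score(y_bin, y_proba, average="weighted")
    return auroc, auprc
```

For the models above, `n_classes` would be 17 and `y_proba` the `(n_samples, 17)` output of `model.predict`.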


Class Labels

The models classify traffic into 17 application categories:

| Label | Category |
|---|---|
| 0 | Other services and APIs |
| 1 | Streaming media |
| 2 | Social |
| 3 | Advertising |
| 4 | Search |
| 5 | Music |
| 6 | Authentication services |
| 7 | Instant messaging |
| 8 | Antivirus |
| 9 | File sharing |
| 10 | Mail |
| 11 | E-commerce |
| 12 | Games |
| 13 | Analytics and Telemetry |
| 14 | Blogs and News |
| 15 | Information systems |
| 16 | Videoconferencing |

Input Features

Each model expects 24 numerical features (StandardScaler-normalized):

| Group | Features |
|---|---|
| Flow-level | duration, bytes, bytes_rev, packets, packets_rev |
| Per-Packet Information | ppi_len, ppi_duration, ppi_roundtrips |
| PHIST Source Sizes | src_sizes_mean, src_sizes_std, src_sizes_skew, src_sizes_kurt |
| PHIST Destination Sizes | dst_sizes_mean, dst_sizes_std, dst_sizes_skew, dst_sizes_kurt |
| PHIST Source IPT | src_ipt_mean, src_ipt_std, src_ipt_skew, src_ipt_kurt |
| PHIST Destination IPT | dst_ipt_mean, dst_ipt_std, dst_ipt_skew, dst_ipt_kurt |
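The four statistics per PHIST group (mean, std, skew, kurt) are standard moment summaries of the binned packet-size and inter-packet-time distributions. A minimal sketch of computing such moments from histogram bin counts follows; the bin centers are illustrative (CESNET-QUIC22 defines its own bin edges), and the use of excess kurtosis here is an assumption about the pipeline:

```python
import numpy as np


def hist_moments(counts, bin_centers):
    """Mean, std, skewness, and excess kurtosis of a histogram,
    given counts per bin and a representative value per bin."""
    counts = np.asarray(counts, dtype=float)
    centers = np.asarray(bin_centers, dtype=float)
    p = counts / counts.sum()                       # bin probabilities
    mean = np.sum(p * centers)
    var = np.sum(p * (centers - mean) ** 2)
    std = np.sqrt(var)
    skew = np.sum(p * (centers - mean) ** 3) / std ** 3
    kurt = np.sum(p * (centers - mean) ** 4) / std ** 4 - 3.0  # excess kurtosis
    return mean, std, skew, kurt
```

Applying this to the four histograms (source/destination sizes and IPTs) yields the 16 PHIST features.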

Usage

```python
import pickle
import numpy as np

# Load model
with open("saved_model_baseline_optuna/lightgbm_model.pkl", "rb") as f:
    model = pickle.load(f)

# Prepare input (24 features, StandardScaler-normalized)
X = np.array([[...]])  # shape: (n_samples, 24)

# Predict class probabilities, then take the most likely class
predictions_proba = model.predict(X, num_iteration=model.best_iteration)
predictions = np.argmax(predictions_proba, axis=1)

# Map to class names
label_mapping = {
    0: "Other services and APIs",
    1: "Streaming media",
    2: "Social",
    3: "Advertising",
    4: "Search",
    5: "Music",
    6: "Authentication services",
    7: "Instant messaging",
    8: "Antivirus",
    9: "File sharing",
    10: "Mail",
    11: "E-commerce",
    12: "Games",
    13: "Analytics and Telemetry",
    14: "Blogs and News",
    15: "Information systems",
    16: "Videoconferencing",
}

predicted_labels = [label_mapping[p] for p in predictions]
```

Repository Structure

```
.
├── README.md
├── label_mapping.csv
│
├── lightgbm_0_baseline_.ipynb            # Training: baseline (default params)
├── lightgbm_0_baseline_rus.ipynb         # Training: baseline with RUS data
├── lightgbm_0_baseline_smote.ipynb       # Training: baseline with SMOTE data
├── lightgbm_0_baseline_smote.rus.ipynb   # Training: baseline with SMOTE-RUS data
├── lightgbm_1_baseline_optuna.ipynb      # Training: Optuna-tuned
├── lightgbm_1_rus_optuna.ipynb           # Training: Optuna-tuned with RUS
├── lightgbm_1_smote_optuna.ipynb         # Training: Optuna-tuned with SMOTE
├── lightgbm_1_smote-rus_optuna.ipynb     # Training: Optuna-tuned with SMOTE-RUS
│
├── optuna_2_optuna_.ipynb                # Optuna hyperparameter search (baseline)
├── optuna_2_optuna_rus.ipynb             # Optuna hyperparameter search (RUS)
├── optuna_2_optuna_smote.ipynb           # Optuna hyperparameter search (SMOTE)
├── optuna_2_optuna_smote-rus.ipynb       # Optuna hyperparameter search (SMOTE-RUS)
│
├── plot_hyperparams.ipynb                # Hyperparameter visualization
├── plot_testing.ipynb                    # Test result visualization
│
├── saved_hyperparams_baseline/           # Optuna trial results (baseline)
├── saved_hyperparams_rus/                # Optuna trial results (RUS)
├── saved_hyperparams_smote/              # Optuna trial results (SMOTE)
├── saved_hyperparams_smote-rus/          # Optuna trial results (SMOTE-RUS)
│
├── saved_model_baseline/                 # Model: baseline
│   ├── lightgbm_model.pkl
│   └── lightgbm_model_info.json
├── saved_model_baseline_optuna/          # Model: baseline + Optuna
│   ├── lightgbm_model.pkl
│   └── lightgbm_model_info.json
├── saved_model_0.rus/                    # Model: RUS
│   ├── lightgbm_model.pkl
│   └── lightgbm_model_info.json
├── saved_model_rus_optuna/               # Model: RUS + Optuna
│   ├── lightgbm_model.pkl
│   └── lightgbm_model_info.json
├── saved_model_0.smote/                  # Model: SMOTE
│   ├── lightgbm_model.pkl
│   └── lightgbm_model_info.json
├── saved_model_smote_optuna/             # Model: SMOTE + Optuna
│   ├── lightgbm_model.pkl
│   └── lightgbm_model_info.json
├── saved_model_0.smote.rus/              # Model: SMOTE-RUS
│   ├── lightgbm_model.pkl
│   └── lightgbm_model_info.json
└── saved_model_smote-rus_optuna/         # Model: SMOTE-RUS + Optuna
    ├── lightgbm_model.pkl
    └── lightgbm_model_info.json
```

Training Configuration

Best-performing model (Baseline + Optuna) hyperparameters:

| Parameter | Value |
|---|---|
| objective | multiclass |
| num_class | 17 |
| metric | multi_logloss |
| learning_rate | 0.0910 |
| num_leaves | 416 |
| max_depth | 14 |
| min_data_in_leaf | 1200 |
| lambda_l1 | 3 |
| lambda_l2 | 1 |
| feature_fraction | 0.9 |
| is_unbalance | true |
| best_iteration | 500 |
