darkknight25 committed on
Commit
6196a0c
·
verified ·
1 Parent(s): 03a18a6

Update README.md

Files changed (1)
  1. README.md +196 -3
README.md CHANGED
@@ -1,3 +1,196 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ tags:
+ - xgboost
+ - onnx
+ - cybersecurity
+ - ddos
+ - intrusion-detection
+ - network-security
+ - binary-classification
+ datasets:
+ - CIC-DDoS2019
+ pipeline_tag: tabular-classification
+ base_model: "null"
+ ---
+
+
+ # 🛡️ Model Card: DDoS Detection using XGBoost (ONNX)
+
+ An XGBoost binary classifier that detects **DDoS attacks** from network traffic flow data. It is trained on [CIC-DDoS2019](https://www.kaggle.com/datasets/dhoogla/cicddos2019), tuned with **Optuna**, and exported to **ONNX** for fast, portable inference.
+
+ ---
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Sunny Thakur
+ - **Model type:** Gradient Boosted Tree (XGBoost)
+ - **Language(s):** Not applicable (numeric flow features, not natural language)
+ - **License:** MIT
+ - **Finetuned from model:** None (trained from scratch)
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/SunnyThakur25/DDoS-Detection-XGBoost
+ - **Demo:** Coming soon
+ - **Paper:** N/A (model based on CIC-DDoS2019 dataset)
+
+ ---
+
+ ## Uses
+
+ ### Direct Use
+
+ - Detect DDoS attacks from structured flow data (CSV, Parquet, or JSONL after transformation into numeric features)
+ - Suited to cybersecurity monitoring systems, SOC pipelines, and SIEM integrations
+
+ ### Downstream Use
+
+ - Can be integrated into larger threat detection systems
+ - Can be extended to multi-class detection or traffic categorization
+
+ ### Out-of-Scope Use
+
+ - Real-time packet-level classification without flow aggregation
+ - NLP, audio, or image data tasks
+
+ ---
+
+ ## Bias, Risks, and Limitations
+
+ - The model may overfit to the synthetic DDoS traffic patterns in the training data
+ - It is limited to the features available in CIC-DDoS2019
+ - SMOTE oversampling may create synthetic minority patterns that don't generalize
+
+ ### Recommendations
+
+ - Validate on real-world or more recent datasets before deployment
+ - Retrain periodically as attack patterns evolve
+
+ ---
+
+ ## How to Get Started with the Model
+
+ ### ONNX Inference (Python)
+
+ ```python
+ import onnxruntime as ort
+ import numpy as np
+
+ session = ort.InferenceSession("ddos_model.onnx")
+ # Read the input name from the model instead of hardcoding "input"
+ input_name = session.get_inputs()[0].name
+ # Replace [...] with one flow's feature values, in the training column order
+ input_data = np.array([...], dtype=np.float32).reshape(1, -1)
+ outputs = session.run(None, {input_name: input_data})
+ ```
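+
+ ### Pickled Pipeline Inference (Python)
+
+ ```python
+ import joblib
+
+ pipeline = joblib.load("ddos_detection_pipeline.pkl")
+ # input_df is not defined here: it should be a DataFrame of flow features
+ # with the same columns the pipeline was trained on
+ prediction = pipeline.predict(input_df)
+ ```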
+
+ ---
+
+ ## Training Details
+
+ ### Training Data
+
+ - **Dataset:** CIC-DDoS2019
+ - **Source:** https://www.kaggle.com/datasets/dhoogla/cicddos2019
+ - **Classes:** Binary (Benign vs DDoS)
+
+ ### Training Procedure
+
+ - **Preprocessing:** IP addresses, ports, and timestamps dropped
+ - **Feature engineering:** `requests_per_sec`, `pkt_len_variation`
+ - **Balancing:** SMOTE (30% oversample minority)
+ - **Scaler:** StandardScaler
+ - **Model:** XGBoost
+ - **Tuning:** Optuna, optimizing F1-score over 30 trials
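The SMOTE setting above can be read in more than one way. The sketch below assumes it means a target minority-to-majority ratio of 0.3 after resampling, which is how imbalanced-learn interprets a float `sampling_strategy`. The helper name and the class counts are hypothetical and are not taken from the training code.

```python
def smote_synthetic_count(n_majority: int, n_minority: int, ratio: float = 0.3) -> int:
    """Number of synthetic minority rows needed to reach the target ratio.

    Assumption: `ratio` is the desired minority/majority ratio after
    oversampling, not a percentage increase of the minority class.
    """
    target_minority = int(n_majority * ratio)
    # No synthetic rows are needed if the minority class already meets the target.
    return max(0, target_minority - n_minority)


# Hypothetical class counts, for illustration only.
needed = smote_synthetic_count(n_majority=100_000, n_minority=5_000)
print(needed)
```

If the 30% figure instead means "grow the minority class by 30%", the count would be computed from `n_minority` alone, so it is worth checking the training code before relying on either reading.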
+
+ #### Training Hyperparameters
+
+ - `n_estimators`: 100–500 (tuned)
+ - `max_depth`: 3–12 (tuned)
+ - `learning_rate`: 0.001–0.2
+ - `gamma`, `colsample_bytree`, `scale_pos_weight`: tuned
+ - `tree_method`: hist
+ - `early_stopping_rounds`: 20
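As a rough illustration, the ranges listed above could be expressed as an Optuna-style search space. This is a sketch under stated assumptions, not the author's training code: sampling `learning_rate` on a log scale is an assumption, `gamma`, `colsample_bytree`, and `scale_pos_weight` are left out because their ranges are not stated, and `RandomTrial` is a hypothetical stand-in that mimics the `suggest_int` / `suggest_float` methods of an Optuna trial so the block runs without Optuna installed.

```python
import math
import random


def suggest_xgb_params(trial):
    """Build XGBoost parameters from a trial object with Optuna-style methods."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        # Assumption: log-scale sampling; the card only gives the range.
        "learning_rate": trial.suggest_float("learning_rate", 0.001, 0.2, log=True),
        # Fixed settings listed in the card.
        "tree_method": "hist",
        "early_stopping_rounds": 20,
    }


class RandomTrial:
    """Hypothetical stand-in for an Optuna trial, using plain random sampling."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def suggest_int(self, name, low, high):
        # Inclusive on both ends, like Optuna's suggest_int.
        return self._rng.randint(low, high)

    def suggest_float(self, name, low, high, log=False):
        if log:
            value = math.exp(self._rng.uniform(math.log(low), math.log(high)))
        else:
            value = self._rng.uniform(low, high)
        # Clamp to guard against floating-point drift at the bounds.
        return min(max(value, low), high)


params = suggest_xgb_params(RandomTrial(seed=1))
print(params)
```

With Optuna itself, `suggest_xgb_params(trial)` would be called inside the objective function passed to `study.optimize`, and the objective would return the validation F1-score.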
+
+ ---
+
+ ## Evaluation
+
+ ### Testing Data
+
+ - 20% hold-out split from the full dataset
+ - Stratified on the class label
+
+ ### Results
+
+ | Metric    | Value  |
+ | --------- | ------ |
+ | Accuracy  | 99.98% |
+ | F1-Score  | 99.98% |
+ | AUC-PR    | 1.000  |
+ | Precision | \~1.00 |
+ | Recall    | \~1.00 |
+
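For readers who want to reproduce the threshold-based metrics in the table on their own hold-out data, the sketch below computes them from confusion-matrix counts. The function name and the example counts are hypothetical and are not the model's actual results.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 for the positive (DDoS) class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Of the flows flagged as DDoS, how many really were.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the real DDoS flows, how many were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Harmonic mean of precision and recall, written in count form.
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced traffic, accuracy alone can look high even when many attacks are missed, which is one reason to read it alongside precision, recall, and F1.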
+ ---
+
+ ## Model Examination
+
+ ### Explainability
+
+ - SHAP is integrated for explainability
+ - Summary plots identify the dominant flow-level features
+
+ ---
+
+ ## Environmental Impact
+
+ - **Hardware:** Kaggle GPU (Tesla T4 or P100)
+ - **Training Time:** < 1 hour
+ - **Carbon Emitted:** Low (not measured; can be estimated via ML CO2 Impact)
+
+ ---
+
+ ## Technical Specifications
+
+ - **Architecture:** Gradient Boosted Trees (XGBoost)
+ - **Format:** ONNX + `.pkl` pipeline
+ - **Dependencies:** XGBoost, ONNX, Scikit-learn, Optuna, SHAP
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{ddos-detection-xgboost-007,
+   author = {Sunny Thakur (007)},
+   title = {DDoS Detection Model - CICDDoS2019 - XGBoost + ONNX},
+   year = {2025},
+   url = {https://huggingface.co/darkknight25/ddos_xgboost_onnx}
+ }
+ ```
+
+ ## Model Card Authors
+
+ Sunny Thakur
+
+ ## Contact
+
+ - **GitHub:** SunnyThakur25
+ - **LinkedIn:** Sunny Thakur