danielostrow committed on
Commit 8de1405 · verified · 1 Parent(s): 9cb214b

Clean up README - remove training section, update file list

Files changed (1)
  1. README.md +21 -428
README.md CHANGED
@@ -1,441 +1,34 @@
1
  ---
2
  license: mit
3
- language:
4
- - en
5
- library_name: pytorch
6
- tags:
7
- - security
8
- - cybersecurity
9
- - network-security
10
- - c2-detection
11
- - beacon-detection
12
- - threat-detection
13
- - malware-detection
14
- - logbert
15
- - transformer
16
- - safetensors
17
- pipeline_tag: other
18
  ---
19
 
20
- # C2Sentinel
21
 
22
- [![Downloads](https://img.shields.io/badge/dynamic/json?url=https://huggingface.co/api/models/danielostrow/c2sentinel&query=downloads&label=Downloads&color=blue)](https://huggingface.co/danielostrow/c2sentinel)
23
- [![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
24
- [![Demo](https://img.shields.io/badge/Demo-Hugging%20Face%20Spaces-yellow)](https://huggingface.co/spaces/danielostrow/c2sentinel)
25
 
26
- A machine learning model for detecting Command and Control (C2) beacon communications in network traffic. Built on a fine-tuned [LogBERT](https://arxiv.org/abs/2103.04475) transformer architecture.
27
 
28
- **Author:** Daniel Ostrow
29
- **Website:** [neuralintellect.com](https://neuralintellect.com)
30
- **Release Date:** January 18, 2026
31
-
32
- ---
33
-
34
- ## Base Model
35
-
36
- This model is fine-tuned from the LogBERT architecture for log anomaly detection.
37
-
38
- - **Paper:** [LogBERT: Log Anomaly Detection via BERT](https://arxiv.org/abs/2103.04475) (Guo, Yuan, Wu - IJCNN 2021)
39
- - **Original Implementation:** [github.com/HelenGuohx/logbert](https://github.com/HelenGuohx/logbert)
40
-
41
- ---
42
-
43
- ## Overview
44
-
45
- C2Sentinel analyzes network connection patterns to identify C2 beacon activity. The model uses behavioral analysis rather than port-based filtering, enabling detection of C2 communications on any port. This approach catches C2 activity regardless of whether attackers use expected ports (4444) or attempt to blend in on common ports (443, 80, 53).
46
-
47
- ### Capabilities
48
-
49
- - Detection of 34+ C2 framework behavioral patterns across all ports
50
- - Slow beacon detection (intervals from seconds to hours)
51
- - Legitimate traffic pattern recognition (SSH keepalive, health checks, database connections)
52
- - Optional context enrichment (process information, reputation scores, threat intelligence)
53
- - IP reconnaissance and IOC generation
54
- - Safetensors format for secure model loading
55
-
56
- ---
57
-
58
- ## Installation
59
-
60
- ```bash
61
- pip install torch numpy safetensors huggingface_hub
62
- ```
63
-
64
- ---
65
 
66
  ## Usage
67
 
68
- ### Loading from HuggingFace Hub
69
-
70
- ```python
71
- from c2sentinel import C2Sentinel
72
-
73
- sentinel = C2Sentinel.from_pretrained('danielostrow/c2sentinel')
74
- ```
75
-
76
- ### Loading from Local Files
77
-
78
- ```python
79
- from c2sentinel import C2Sentinel
80
-
81
- sentinel = C2Sentinel.load('c2_sentinel')
82
- ```
83
-
84
- ### Analyzing Connections
85
-
86
- ```python
87
- connections = [
88
-     {
89
-         'timestamp': 1000000,
90
-         'dst_ip': '10.0.0.1',
91
-         'dst_port': 443,
92
-         'bytes_sent': 200,
93
-         'bytes_recv': 500
94
-     },
95
-     {
96
-         'timestamp': 1000060,
97
-         'dst_ip': '10.0.0.1',
98
-         'dst_port': 443,
99
-         'bytes_sent': 200,
100
-         'bytes_recv': 500
101
-     },
102
- ]
103
-
104
- result = sentinel.analyze(connections)
105
-
106
- if result.is_c2:
107
-     print(f"C2 detected: {result.c2_type}")
108
-     print(f"Probability: {result.c2_probability}")
109
- else:
110
-     print("No C2 detected")
111
- ```
112
-
113
- ---
114
-
115
- ## Connection Record Format
116
-
117
- | Field | Type | Required | Description |
118
- |-------|------|----------|-------------|
119
- | `timestamp` | float | Yes | Unix timestamp |
120
- | `dst_ip` | str | Yes | Destination IP address |
121
- | `dst_port` | int | Yes | Destination port |
122
- | `bytes_sent` | int | Yes | Bytes sent |
123
- | `bytes_recv` | int | Yes | Bytes received |
124
- | `src_ip` | str | No | Source IP address |
125
- | `src_port` | int | No | Source port |
126
- | `protocol` | str | No | Protocol (tcp/udp) |
127
- | `duration` | float | No | Connection duration in seconds |
128
-
129
- ---
130
-
131
- ## Analysis Options
132
-
133
- ### Threshold
134
-
135
- ```python
136
- # Default threshold (0.5)
137
- result = sentinel.analyze(connections)
138
-
139
- # Lower threshold for higher sensitivity
140
- result = sentinel.analyze(connections, threshold=0.3)
141
-
142
- # Higher threshold for higher precision
143
- result = sentinel.analyze(connections, threshold=0.7)
144
-
145
- # Strict mode enforces minimum 0.7 threshold
146
- result = sentinel.analyze(connections, strict_mode=True)
147
- ```
148
-
149
- ### Context
150
-
151
- ```python
152
- from c2sentinel import ConnectionContext
153
-
154
- context = ConnectionContext(
155
-     process_name='sshd',
156
-     known_good=True,
157
-     ip_reputation=0.95,
158
-     dns_queries=['api.example.com']
159
- )
160
-
161
- result = sentinel.analyze(connections, context=context)
162
- ```
163
-
164
- ### Whitelist and Blacklist
165
-
166
- ```python
167
- sentinel.add_whitelist(ips=['8.8.8.8'], domains=['google.com'])
168
- sentinel.add_blacklist(ips=['10.10.10.10'], domains=['malware.example'])
169
- ```
170
-
171
- ---
172
-
173
- ## Result Object
174
-
175
- The `AnalysisResult` object contains:
176
-
177
- | Attribute | Type | Description |
178
- |-----------|------|-------------|
179
- | `is_c2` | bool | True if C2 detected |
180
- | `c2_probability` | float | Probability score (0.0-1.0) |
181
- | `c2_type` | str | Detected C2 framework type |
182
- | `confidence` | float | Model confidence |
183
- | `detection_method` | str | Method used (signature/ml/context/whitelist) |
184
- | `immediate_detection` | bool | True if signature-based |
185
- | `risk_factors` | list | Factors supporting C2 classification |
186
- | `mitigating_factors` | list | Factors against C2 classification |
187
- | `matched_legitimate_pattern` | str | Matched legitimate pattern name |
188
- | `service_type` | str | Detected service type |
189
- | `recommendations` | list | Suggested actions |
190
-
191
- ---
192
-
193
- ## Batch Analysis
194
-
195
- ```python
196
- connection_groups = [
197
-     [conn1, conn2, conn3],
198
-     [conn4, conn5, conn6],
199
- ]
200
-
201
- results = sentinel.analyze_batch(connection_groups)
202
- ```
203
-
204
- ---
205
-
206
- ## Log File Parsing
207
-
208
- ```python
209
- with open('conn.log', 'r') as f:
210
-     log_lines = f.readlines()
211
-
212
- results = sentinel.analyze_logs(log_lines, group_by_dst=True)
213
- ```
214
-
215
- Supported formats: JSON, Zeek conn.log, syslog
216
-
217
- ---
218
-
219
- ## Reconnaissance
220
-
221
- ### IP Analysis
222
-
223
- ```python
224
- info = sentinel.recon.analyze_ip('104.16.132.229')
225
- # Returns: is_valid, is_private, is_cdn, cdn_provider, reverse_dns
226
- ```
227
-
228
- ### Pattern Analysis
229
-
230
- ```python
231
- patterns = sentinel.recon.analyze_connection_patterns(connections)
232
- # Returns: timing stats, volume stats, behavioral indicators
233
- ```
234
-
235
- ### IOC Generation
236
-
237
- ```python
238
- if result.is_c2:
239
-     iocs = sentinel.recon.generate_iocs(connections, result.to_dict())
240
-     # Returns: ips, ports, timing_signatures, behavioral_indicators
241
- ```
242
-
243
- ---
244
-
245
- ## Detection Methodology
246
-
247
- ### C2 Indicators
248
-
249
- - Consistent beacon intervals (low timing variance)
250
- - Consistent packet sizes (low size variance)
251
- - Single persistent destination
252
- - Balanced request/response ratio
253
-
254
- ### Signature Detection
255
-
256
- Immediate detection for high-confidence C2 ports with matching behavioral patterns:
257
- - Port 4444 (Metasploit default)
258
- - Port 5555 (Metasploit alternative)
259
- - Port 31337 (Sliver)
260
- - Port 40056 (Havoc)
261
-
262
- ### Legitimate Traffic Indicators
263
-
264
- - High response size variance
265
- - Asymmetric traffic patterns (small requests, large responses)
266
- - Multiple destinations
267
- - SSH keepalive patterns (small symmetric packets on port 22)
268
- - Health check patterns (regular intervals, variable response sizes)
269
-
270
- ---
271
-
272
- ## Model Specifications
273
-
274
- | Specification | Value |
275
- |---------------|-------|
276
- | Architecture | LogBERT Transformer |
277
- | Parameters | 4.9 million |
278
- | Feature Dimensions | 40 |
279
- | Encoder Layers | 6 |
280
- | Attention Heads | 8 |
281
- | Hidden Dimension | 256 |
282
- | Format | Safetensors |
283
- | Size | 20 MB |
284
-
285
- ---
286
-
287
- ## Training Your Own Model
288
-
289
- C2Sentinel supports training custom weights on your own data. This is useful for:
290
- - Fine-tuning on your network's specific traffic patterns
291
- - Adding detection for new C2 frameworks
292
- - Reducing false positives in your environment
293
-
294
- ### Prerequisites
295
-
296
- ```bash
297
- pip install torch numpy safetensors tqdm packaging
298
- ```
299
-
300
- ### Using Pre-trained Weights
301
-
302
- The released weights are trained on synthetic C2 beacon patterns covering 10+ framework types:
303
-
304
- ```python
305
- from c2sentinel import C2Sentinel
306
-
307
- # Load pre-trained weights from HuggingFace
308
- sentinel = C2Sentinel.from_pretrained('danielostrow/c2sentinel')
309
-
310
- # Or load from local files
311
- sentinel = C2Sentinel.load('c2_sentinel')
312
- ```
313
-
314
- ### Training From Scratch
315
-
316
- Use the provided training script to train on synthetic data:
317
-
318
- ```bash
319
- # Basic training (20,000 samples, 100 epochs)
320
- python train_model.py --epochs 100 --samples 20000
321
-
322
- # Faster training with fewer samples
323
- python train_model.py --epochs 50 --samples 10000
324
-
325
- # Custom learning rate
326
- python train_model.py --epochs 100 --samples 25000 --lr 0.0001
327
- ```
328
-
329
- ### Training on Custom Data
330
-
331
- Create a custom dataset class that returns connection records:
332
-
333
- ```python
334
- from torch.utils.data import Dataset
335
- from c2sentinel import FeatureExtractor
336
-
337
- class CustomC2Dataset(Dataset):
338
-     def __init__(self, labeled_connections):
339
-         self.feature_extractor = FeatureExtractor()
340
-         self.samples = []
341
-         self.labels = []
342
-
343
-         for connections, is_c2 in labeled_connections:
344
-             features = self.feature_extractor.extract_features(connections)
345
-             self.samples.append(features)
346
-             self.labels.append(1 if is_c2 else 0)
347
-
348
-         # Normalize features (critical for training stability)
349
-         self.samples = np.array(self.samples, dtype=np.float32)
350
-         self.mean = np.mean(self.samples, axis=0)
351
-         self.std = np.std(self.samples, axis=0) + 1e-8
352
-         self.samples = (self.samples - self.mean) / self.std
353
-
354
-     def __len__(self):
355
-         return len(self.samples)
356
-
357
-     def __getitem__(self, idx):
358
-         return {
359
-             'features': torch.tensor(self.samples[idx]),
360
-             'label': torch.tensor(self.labels[idx], dtype=torch.float32)
361
-         }
362
- ```
363
-
364
- ### Fine-tuning Pre-trained Weights
365
-
366
- Start from pre-trained weights and fine-tune on your data:
367
-
368
- ```python
369
- from c2sentinel import LogBERTC2Sentinel, C2SentinelConfig
370
- from safetensors.torch import load_file, save_file
371
- import torch.optim as optim
372
-
373
- # Load pre-trained model
374
- config = C2SentinelConfig()
375
- model = LogBERTC2Sentinel(config)
376
- state_dict = load_file('c2_sentinel.safetensors')
377
- model.load_state_dict(state_dict)
378
-
379
- # Fine-tune with lower learning rate
380
- optimizer = optim.AdamW(model.parameters(), lr=0.00005, weight_decay=0.01)
381
-
382
- # Train on your data...
383
-
384
- # Save fine-tuned weights
385
- save_file(model.state_dict(), 'c2_sentinel_finetuned.safetensors')
386
- ```
387
-
388
- ### Training Tips
389
-
390
- 1. **Feature Normalization**: Always normalize input features. Save the mean/std for inference:
391
- ```python
392
- np.savez('normalization_params.npz', mean=mean, std=std)
393
- ```
394
-
395
- 2. **Learning Rate**: Use 0.0001 for training from scratch, 0.00005 for fine-tuning
396
-
397
- 3. **Gradient Clipping**: Prevent exploding gradients:
398
- ```python
399
- torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
400
- ```
401
-
402
- 4. **Early Stopping**: Monitor validation accuracy and stop when it plateaus
403
-
404
- 5. **Balanced Data**: Use roughly equal C2 and benign samples
405
-
406
- ### Model Output Files
407
-
408
- After training, you'll have:
409
- - `c2_sentinel.safetensors` - Model weights
410
- - `normalization_params.npz` - Feature normalization parameters
411
- - `c2_sentinel.json` - Model configuration
412
-
413
- ---
414
-
415
- ## Files
416
-
417
- ```
418
- c2sentinel/
419
-   c2sentinel.py # Main module
420
-   c2_sentinel.safetensors # Model weights
421
-   c2_sentinel.json # Model configuration
422
-   README.md # Documentation
423
-   API_REFERENCE.md # API reference
424
-   examples/
425
-     basic_usage.py
426
-     advanced_usage.py
427
- ```
428
-
429
- ---
430
-
431
- ## License
432
-
433
- MIT License
434
-
435
- Copyright (c) 2026 Daniel Ostrow
436
 
437
- Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
438
 
439
- The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
440
 
441
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
1
  ---
2
+ title: C2Sentinel
3
+ emoji: 🛡️
4
+ colorFrom: red
5
+ colorTo: gray
6
+ sdk: docker
7
+ pinned: false
8
  license: mit
9
+ models:
10
+ - danielostrow/c2sentinel
11
  ---
12
 
13
+ # C2Sentinel Demo
14
 
15
+ Interactive demo for C2Sentinel - a machine learning model for detecting Command and Control (C2) beacon communications in network traffic.
16
 
17
+ ## Features
18
 
19
+ - Analyze network connection patterns for C2 activity
20
+ - Preset examples for common scenarios (C2 beacons, legitimate traffic)
21
+ - Adjustable detection threshold
22
+ - Detailed risk factor analysis
23
 
24
  ## Usage
25
 
26
+ 1. Paste connection data as JSON or select a preset example
27
+ 2. Adjust the detection threshold if needed
28
+ 3. Click "Analyze" to run the model
29
 
30
+ ## Model
31
 
32
+ See the [C2Sentinel model repository](https://huggingface.co/danielostrow/c2sentinel) for full documentation.
33
 
34
+ **Author:** Daniel Ostrow | [neuralintellect.com](https://neuralintellect.com)
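
Note on this diff: the new demo README asks for connection data as JSON, while the record format table and the "consistent beacon intervals" indicator are among the lines removed. As a minimal sketch, assuming the demo accepts a JSON array of records carrying the five required fields from the removed table (the exact schema the Space accepts is not shown in this diff), the following builds such a payload, checks it for missing fields, and measures how regular the connection intervals are. `missing_fields` and `interval_cv` are hypothetical helpers for illustration, not part of the C2Sentinel API, and the coefficient of variation is only a stand-in for whatever timing features the model actually uses.

```python
import json
import statistics

# Required fields, taken from the Connection Record Format table removed in this commit.
REQUIRED_FIELDS = ("timestamp", "dst_ip", "dst_port", "bytes_sent", "bytes_recv")


def missing_fields(record):
    """Return the required field names that are absent from one record."""
    return [name for name in REQUIRED_FIELDS if name not in record]


def interval_cv(records):
    """Coefficient of variation of the gaps between consecutive timestamps.

    Hypothetical helper: values near 0 mean evenly spaced connections.
    Returns None when fewer than three records are given or the mean gap is 0.
    """
    times = sorted(r["timestamp"] for r in records)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        return None
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return None
    return statistics.pstdev(gaps) / mean_gap


# Three connections spaced exactly 60 seconds apart (values are illustrative).
connections = [
    {"timestamp": 1000000 + 60 * i, "dst_ip": "10.0.0.1", "dst_port": 443,
     "bytes_sent": 200, "bytes_recv": 500}
    for i in range(3)
]

# Serialize the records as a JSON array, the assumed paste format for the demo.
payload = json.dumps(connections, indent=2)

# Round-trip the payload and list any missing required fields per record.
problems = [missing_fields(r) for r in json.loads(payload)]
print(problems)
print(interval_cv(connections))
```

A result near 0 from `interval_cv` corresponds to the low timing variance the removed Detection Methodology section lists as a C2 indicator; it says nothing about the model's own verdict.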