danielostrow committed on
Commit
9638fcb
·
verified ·
1 Parent(s): 0c4b9f9

Remove extra documentation

Files changed (1)
  1. API_REFERENCE.md +0 -749
API_REFERENCE.md DELETED
@@ -1,749 +0,0 @@
1
- # C2Sentinel API Reference
2
-
3
- Complete technical documentation for the C2Sentinel Python API.
4
-
5
- **Author:** Daniel Ostrow
6
- **Website:** [neuralintellect.com](https://neuralintellect.com)
7
-
8
- ---
9
-
10
- ## Table of Contents
11
-
12
- 1. [C2Sentinel Class](#c2sentinel-class)
13
- 2. [AnalysisResult Class](#analysisresult-class)
14
- 3. [ConnectionContext Class](#connectioncontext-class)
15
- 4. [ReconSupport Class](#reconsupport-class)
16
- 5. [FeatureExtractor Class](#featureextractor-class)
17
- 6. [LogParser Class](#logparser-class)
18
- 7. [Enums and Constants](#enums-and-constants)
19
-
20
- ---
21
-
22
- ## C2Sentinel Class
23
-
24
- Main interface for C2 detection.
25
-
26
- ### Constructor
27
-
28
- ```python
29
- C2Sentinel(model: LogBERTC2Sentinel, config: C2SentinelConfig, device: str = 'auto')
30
- ```
31
-
32
- | Parameter | Type | Description |
33
- |-----------|------|-------------|
34
- | `model` | LogBERTC2Sentinel | The neural network model |
35
- | `config` | C2SentinelConfig | Model configuration |
36
- | `device` | str | Device for inference ('auto', 'cpu', 'cuda') |
37
-
38
- ### Class Methods
39
-
40
- #### load
41
-
42
- ```python
43
- @classmethod
44
- def load(cls, path: str, device: str = 'auto') -> 'C2Sentinel'
45
- ```
46
-
47
- Load a pre-trained model from safetensors format.
48
-
49
- | Parameter | Type | Description |
50
- |-----------|------|-------------|
51
- | `path` | str | Path to model files (without extension) |
52
- | `device` | str | Device for inference |
53
-
54
- **Returns:** C2Sentinel instance
55
-
56
- **Example:**
57
- ```python
58
- sentinel = C2Sentinel.load('c2_sentinel')
59
- sentinel = C2Sentinel.load('/path/to/c2_sentinel', device='cuda')
60
- ```
61
-
62
- #### create_new
63
-
64
- ```python
65
- @classmethod
66
- def create_new(cls, device: str = 'auto') -> 'C2Sentinel'
67
- ```
68
-
69
- Create a new untrained model instance.
70
-
71
- **Returns:** C2Sentinel instance with random weights
72
-
73
- ---
74
-
75
- ### Instance Methods
76
-
77
- #### analyze
78
-
79
- ```python
80
- def analyze(
81
- self,
82
- connections: List[Dict],
83
- threshold: float = 0.5,
84
- context: Optional[ConnectionContext] = None,
85
- include_features: bool = False,
86
- strict_mode: bool = False
87
- ) -> AnalysisResult
88
- ```
89
-
90
- Analyze a list of connections for C2 activity.
91
-
92
- | Parameter | Type | Default | Description |
93
- |-----------|------|---------|-------------|
94
- | `connections` | List[Dict] | required | List of connection records |
95
- | `threshold` | float | 0.5 | Detection threshold (0.0-1.0) |
96
- | `context` | ConnectionContext | None | Optional context for enrichment |
97
- | `include_features` | bool | False | Include raw feature vector in result |
98
- | `strict_mode` | bool | False | Enforce minimum 0.7 threshold |
99
-
100
- **Returns:** AnalysisResult object
101
-
102
- **Connection Record Fields:**
103
- ```python
104
- {
105
- 'timestamp': float, # Required: Unix timestamp
106
- 'dst_ip': str, # Required: Destination IP
107
- 'dst_port': int, # Required: Destination port
108
- 'bytes_sent': int, # Required: Bytes sent
109
- 'bytes_recv': int, # Required: Bytes received
110
- 'src_ip': str, # Optional: Source IP
111
- 'src_port': int, # Optional: Source port
112
- 'protocol': str, # Optional: 'tcp' or 'udp'
113
- 'duration': float # Optional: Duration in seconds
114
- }
115
- ```
116
-
117
- **Example:**
118
- ```python
119
- connections = [
120
- {'timestamp': 1000, 'dst_ip': '10.0.0.1', 'dst_port': 443,
121
- 'bytes_sent': 200, 'bytes_recv': 500},
122
- {'timestamp': 1060, 'dst_ip': '10.0.0.1', 'dst_port': 443,
123
- 'bytes_sent': 200, 'bytes_recv': 500},
124
- ]
125
-
126
- result = sentinel.analyze(connections)
127
- result = sentinel.analyze(connections, threshold=0.7, strict_mode=True)
128
- ```
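
The interaction between `threshold` and `strict_mode` can be read as a floor on the effective threshold. The following is a minimal sketch, assuming `strict_mode` simply raises any lower threshold to 0.7 and leaves higher values alone. The helper name `effective_threshold` is hypothetical and not part of the documented API.

```python
def effective_threshold(threshold: float = 0.5, strict_mode: bool = False) -> float:
    """Return the threshold that would be applied (assumed behavior, not confirmed)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within 0.0-1.0")
    # Assumption: strict mode enforces a floor of 0.7 and never lowers a higher value.
    return max(threshold, 0.7) if strict_mode else threshold
```

Under this reading, `threshold=0.5` with `strict_mode=True` behaves like `threshold=0.7`, while `threshold=0.9` is unchanged.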
129
-
130
- ---
131
-
132
- #### analyze_batch
133
-
134
- ```python
135
- def analyze_batch(
136
- self,
137
- connection_groups: List[List[Dict]],
138
- threshold: float = 0.5,
139
- contexts: Optional[List[ConnectionContext]] = None,
140
- parallel: bool = True
141
- ) -> List[AnalysisResult]
142
- ```
143
-
144
- Analyze multiple connection groups.
145
-
146
- | Parameter | Type | Default | Description |
147
- |-----------|------|---------|-------------|
148
- | `connection_groups` | List[List[Dict]] | required | List of connection lists |
149
- | `threshold` | float | 0.5 | Detection threshold |
150
- | `contexts` | List[ConnectionContext] | None | Context for each group |
151
- | `parallel` | bool | True | Enable parallel processing |
152
-
153
- **Returns:** List of AnalysisResult objects
154
-
155
- **Example:**
156
- ```python
157
- groups = [
158
- [conn1, conn2, conn3],
159
- [conn4, conn5, conn6],
160
- ]
161
- results = sentinel.analyze_batch(groups)
162
- ```
163
-
164
- ---
165
-
166
- #### analyze_logs
167
-
168
- ```python
169
- def analyze_logs(
170
- self,
171
- log_lines: List[str],
172
- group_by_dst: bool = True,
173
- threshold: float = 0.5
174
- ) -> List[Dict]
175
- ```
176
-
177
- Parse and analyze raw log lines.
178
-
179
- | Parameter | Type | Default | Description |
180
- |-----------|------|---------|-------------|
181
- | `log_lines` | List[str] | required | Raw log lines |
182
- | `group_by_dst` | bool | True | Group connections by destination IP |
183
- | `threshold` | float | 0.5 | Detection threshold |
184
-
185
- **Returns:** List of result dictionaries, sorted by probability (descending)
186
-
187
- **Supported Formats:**
188
- - JSON logs with standard fields
189
- - Zeek/Bro conn.log (tab-separated)
190
- - Syslog with IP:port patterns
191
-
192
- **Example:**
193
- ```python
194
- with open('conn.log') as f:
195
- lines = f.readlines()
196
-
197
- results = sentinel.analyze_logs(lines, group_by_dst=True)
198
- for r in results:
199
- print(f"{r['dst_ip']}: {r['c2_probability']}")
200
- ```
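
To illustrate what `group_by_dst=True` implies, here is a rough sketch of grouping parsed connection records by destination IP before analysis. The function name `group_by_destination` is hypothetical, and sorting each group by timestamp is an assumption. The library's actual grouping logic is not shown in this reference.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_destination(connections: List[Dict]) -> Dict[str, List[Dict]]:
    """Bucket connection records by 'dst_ip' (illustrative helper only)."""
    groups: Dict[str, List[Dict]] = defaultdict(list)
    for conn in connections:
        groups[conn['dst_ip']].append(conn)
    # Assumption: each group is ordered by time so interval features are meaningful.
    return {ip: sorted(group, key=lambda c: c['timestamp'])
            for ip, group in groups.items()}
```

Each resulting list could then be passed to `analyze` as one connection group.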
201
-
202
- ---
203
-
204
- #### add_whitelist
205
-
206
- ```python
207
- def add_whitelist(
208
- self,
209
- ips: List[str] = None,
210
- domains: List[str] = None
211
- )
212
- ```
213
-
214
- Add IPs or domains to the whitelist. Whitelisted destinations receive reduced C2 probability.
215
-
216
- | Parameter | Type | Description |
217
- |-----------|------|-------------|
218
- | `ips` | List[str] | IP addresses to whitelist |
219
- | `domains` | List[str] | Domain names to whitelist |
220
-
221
- **Example:**
222
- ```python
223
- sentinel.add_whitelist(
224
- ips=['8.8.8.8', '1.1.1.1'],
225
- domains=['google.com', 'github.com']
226
- )
227
- ```
228
-
229
- ---
230
-
231
- #### add_blacklist
232
-
233
- ```python
234
- def add_blacklist(
235
- self,
236
- ips: List[str] = None,
237
- domains: List[str] = None
238
- )
239
- ```
240
-
241
- Add IPs or domains to the blacklist. Blacklisted destinations receive increased C2 probability.
242
-
243
- | Parameter | Type | Description |
244
- |-----------|------|-------------|
245
- | `ips` | List[str] | IP addresses to blacklist |
246
- | `domains` | List[str] | Domain names to blacklist |
247
-
248
- ---
249
-
250
- #### save
251
-
252
- ```python
253
- def save(self, path: str)
254
- ```
255
-
256
- Save model to safetensors format.
257
-
258
- | Parameter | Type | Description |
259
- |-----------|------|-------------|
260
- | `path` | str | Output path (without extension) |
261
-
262
- Creates two files:
263
- - `{path}.safetensors` - Model weights
264
- - `{path}.json` - Configuration
265
-
266
- ---
267
-
268
- ### Instance Attributes
269
-
270
- | Attribute | Type | Description |
271
- |-----------|------|-------------|
272
- | `model` | LogBERTC2Sentinel | The neural network |
273
- | `config` | C2SentinelConfig | Model configuration |
274
- | `device` | torch.device | Inference device |
275
- | `feature_extractor` | FeatureExtractor | Feature extraction module |
276
- | `log_parser` | LogParser | Log parsing module |
277
- | `context_engine` | ContextInference | Context inference module |
278
- | `recon` | ReconSupport | Reconnaissance module |
279
-
280
- ---
281
-
282
- ## AnalysisResult Class
283
-
284
- Dataclass containing analysis results.
285
-
286
- ### Attributes
287
-
288
- | Attribute | Type | Description |
289
- |-----------|------|-------------|
290
- | `is_c2` | bool | True if C2 detected |
291
- | `c2_probability` | float | Probability score (0.0-1.0) |
292
- | `anomaly_score` | float | Anomaly detection score |
293
- | `evasion_score` | float | Evasion technique detection score |
294
- | `confidence` | float | Model confidence in prediction |
295
- | `c2_type` | str | Detected C2 framework type |
296
- | `c2_type_confidence` | float | Confidence in C2 type classification |
297
- | `detection_method` | str | Detection method used |
298
- | `immediate_detection` | bool | True if signature-based detection |
299
- | `context_applied` | bool | True if context was applied |
300
- | `original_probability` | float | Probability before context adjustment |
301
- | `probability_modifier` | float | Context probability modifier |
302
- | `matched_legitimate_pattern` | str | Name of matched legitimate pattern |
303
- | `legitimate_confidence` | float | Confidence in legitimate pattern match |
304
- | `risk_factors` | List[str] | Factors supporting C2 classification |
305
- | `mitigating_factors` | List[str] | Factors against C2 classification |
306
- | `service_type` | str | Detected service type |
307
- | `recommendations` | List[str] | Suggested follow-up actions |
308
- | `features` | List[float] | Raw 40-dimensional feature vector |
309
-
310
- ### Methods
311
-
312
- #### to_dict
313
-
314
- ```python
315
- def to_dict(self) -> Dict[str, Any]
316
- ```
317
-
318
- Convert result to dictionary.
319
-
320
- **Returns:** Dictionary representation of all attributes
321
-
322
- ---
323
-
324
- ## ConnectionContext Class
325
-
326
- Dataclass for providing additional context to improve detection accuracy.
327
-
328
- ### Constructor
329
-
330
- ```python
331
- ConnectionContext(
332
- # Process information
333
- process_name: Optional[str] = None,
334
- process_path: Optional[str] = None,
335
- process_pid: Optional[int] = None,
336
- parent_process: Optional[str] = None,
337
- command_line: Optional[str] = None,
338
-
339
- # Network metadata
340
- dns_queries: Optional[List[str]] = None,
341
- resolved_hostname: Optional[str] = None,
342
- tls_sni: Optional[str] = None,
343
- tls_ja3: Optional[str] = None,
344
- tls_ja3s: Optional[str] = None,
345
- certificate_issuer: Optional[str] = None,
346
- certificate_subject: Optional[str] = None,
347
- certificate_valid: Optional[bool] = None,
348
- http_user_agent: Optional[str] = None,
349
- http_host: Optional[str] = None,
350
-
351
- # Reputation
352
- ip_reputation: Optional[float] = None,
353
- domain_reputation: Optional[float] = None,
354
- known_good: Optional[bool] = None,
355
- known_bad: Optional[bool] = None,
356
- threat_intel_match: Optional[str] = None,
357
-
358
- # Host context
359
- source_hostname: Optional[str] = None,
360
- source_user: Optional[str] = None,
361
- source_is_server: Optional[bool] = None,
362
- source_is_workstation: Optional[bool] = None,
363
-
364
- # Additional
365
- geo_country: Optional[str] = None,
366
- geo_asn: Optional[str] = None,
367
- tags: Optional[List[str]] = None
368
- )
369
- ```
370
-
371
- ### Attribute Details
372
-
373
- | Attribute | Type | Effect on Analysis |
374
- |-----------|------|-------------------|
375
- | `process_name` | str | Known processes reduce probability |
376
- | `known_good` | bool | True reduces probability by 90% |
377
- | `known_bad` | bool | True increases probability by 5x |
378
- | `ip_reputation` | float | Score > 0.8 reduces probability |
379
- | `threat_intel_match` | str | Match increases probability by 5x |
380
- | `tls_ja3` | str | Known C2 JA3 increases probability |
381
- | `certificate_valid` | bool | False increases probability |
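
The multiplicative effects in the table above can be sketched as follows. This is only an illustration of the arithmetic for `known_good` and `known_bad`. The function name `apply_context`, the order in which modifiers combine, and the clamping to the 0.0-1.0 range are all assumptions rather than the library's confirmed behavior.

```python
from typing import Optional


def apply_context(probability: float,
                  known_good: Optional[bool] = None,
                  known_bad: Optional[bool] = None) -> float:
    """Adjust a C2 probability using two context flags (hypothetical sketch)."""
    modifier = 1.0
    if known_good:
        modifier *= 0.1   # "reduces probability by 90%"
    if known_bad:
        modifier *= 5.0   # "increases probability by 5x"
    # Assumption: the adjusted value is clamped so it stays a valid probability.
    return min(1.0, max(0.0, probability * modifier))
```

For example, a raw probability of 0.5 with `known_good=True` becomes 0.05, and 0.3 with `known_bad=True` saturates at 1.0.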
382
-
383
- ### Methods
384
-
385
- #### to_dict
386
-
387
- ```python
388
- def to_dict(self) -> Dict[str, Any]
389
- ```
390
-
391
- Convert to dictionary, excluding None values.
392
-
393
- ---
394
-
395
- ## ReconSupport Class
396
-
397
- Reconnaissance and enrichment utilities.
398
-
399
- ### Class Methods
400
-
401
- #### analyze_ip
402
-
403
- ```python
404
- @classmethod
405
- def analyze_ip(cls, ip: str) -> Dict[str, Any]
406
- ```
407
-
408
- Analyze an IP address.
409
-
410
- | Parameter | Type | Description |
411
- |-----------|------|-------------|
412
- | `ip` | str | IP address to analyze |
413
-
414
- **Returns:**
415
- ```python
416
- {
417
- 'ip': str, # Original IP
418
- 'is_valid': bool, # Valid IP format
419
- 'is_private': bool, # RFC 1918 private range
420
- 'is_loopback': bool, # Loopback address
421
- 'is_multicast': bool, # Multicast address
422
- 'is_cdn': bool, # Known CDN range
423
- 'cdn_provider': str, # CDN name if applicable
424
- 'ip_version': int, # 4 or 6
425
- 'reverse_dns': str, # Reverse DNS lookup result
426
- 'numeric': int # Numeric representation
427
- }
428
- ```
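
Several of the fields above map directly onto Python's standard `ipaddress` module. The sketch below shows one way such fields could be derived. The function name `basic_ip_facts` is hypothetical, and the CDN and reverse-DNS fields are omitted because their data sources are not described here. Note that `ipaddress`'s `is_private` covers more than the RFC 1918 ranges (loopback and link-local addresses also count), so it may not match the library's definition exactly.

```python
import ipaddress
from typing import Any, Dict


def basic_ip_facts(ip: str) -> Dict[str, Any]:
    """Derive format-level facts about an IP string (illustrative only)."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not parseable as IPv4 or IPv6.
        return {'ip': ip, 'is_valid': False}
    return {
        'ip': ip,
        'is_valid': True,
        'is_private': addr.is_private,      # broader than RFC 1918 alone
        'is_loopback': addr.is_loopback,
        'is_multicast': addr.is_multicast,
        'ip_version': addr.version,         # 4 or 6
        'numeric': int(addr),               # integer form of the address
    }
```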
429
-
430
- **Known CDN Ranges:**
431
- - Cloudflare
432
- - AWS
433
- - Google Cloud
434
- - Azure
435
- - Akamai
436
-
437
- ---
438
-
439
- #### analyze_connection_patterns
440
-
441
- ```python
442
- @classmethod
443
- def analyze_connection_patterns(cls, connections: List[Dict]) -> Dict[str, Any]
444
- ```
445
-
446
- Analyze connection patterns for threat hunting.
447
-
448
- | Parameter | Type | Description |
449
- |-----------|------|-------------|
450
- | `connections` | List[Dict] | Connection records |
451
-
452
- **Returns:**
453
- ```python
454
- {
455
- 'connection_count': int,
456
- 'unique_destinations': int,
457
- 'unique_ports': int,
458
-
459
- 'timing': {
460
- 'duration_seconds': float,
461
- 'mean_interval': float,
462
- 'interval_stddev': float,
463
- 'interval_cv': float # Coefficient of variation
464
- },
465
-
466
- 'volume': {
467
- 'total_sent': int,
468
- 'total_recv': int,
469
- 'mean_sent': float,
470
- 'mean_recv': float,
471
- 'sent_recv_ratio': float
472
- },
473
-
474
- 'ports': {
475
- port_number: count, # Port distribution
476
- ...
477
- },
478
-
479
- 'destinations': {
480
- ip: analyze_ip_result, # Per-IP analysis
481
- ...
482
- },
483
-
484
- 'indicators': {
485
- 'single_destination': bool,
486
- 'consistent_timing': bool,
487
- 'consistent_sizes': bool,
488
- 'uses_common_port': bool,
489
- 'uses_high_port': bool,
490
- 'has_cdn_destination': bool,
491
- 'all_private_destinations': bool
492
- }
493
- }
494
- ```
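
The `timing` block is the part most relevant to beacon hunting: a low coefficient of variation means the intervals are nearly constant. A minimal sketch of how these four values could be computed from timestamps is shown below. The function name `timing_summary` is hypothetical, and the use of the population standard deviation is an assumption, since the reference does not say which estimator is used.

```python
import statistics
from typing import Dict, List


def timing_summary(connections: List[Dict]) -> Dict[str, float]:
    """Compute interval statistics from connection timestamps (sketch)."""
    stamps = sorted(c['timestamp'] for c in connections)
    intervals = [b - a for a, b in zip(stamps, stamps[1:])]
    if not intervals:
        # Fewer than two connections: no intervals to measure.
        return {'duration_seconds': 0.0, 'mean_interval': 0.0,
                'interval_stddev': 0.0, 'interval_cv': 0.0}
    mean = statistics.fmean(intervals)
    stddev = statistics.pstdev(intervals)  # assumption: population std dev
    return {
        'duration_seconds': float(stamps[-1] - stamps[0]),
        'mean_interval': mean,
        'interval_stddev': stddev,
        # Coefficient of variation = stddev / mean; 0.0 means perfectly regular.
        'interval_cv': stddev / mean if mean > 0 else 0.0,
    }
```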
495
-
496
- ---
497
-
498
- #### generate_iocs
499
-
500
- ```python
501
- @classmethod
502
- def generate_iocs(
503
- cls,
504
- connections: List[Dict],
505
- result: Dict
506
- ) -> Dict[str, List[str]]
507
- ```
508
-
509
- Generate Indicators of Compromise from detected C2.
510
-
511
- | Parameter | Type | Description |
512
- |-----------|------|-------------|
513
- | `connections` | List[Dict] | Connection records |
514
- | `result` | Dict | Analysis result dictionary |
515
-
516
- **Returns:**
517
- ```python
518
- {
519
- 'ips': List[str], # Destination IPs
520
- 'ports': List[str], # Destination ports
521
- 'timing_signatures': List[str], # Beacon timing patterns
522
- 'behavioral_indicators': List[str] # Behavioral markers
523
- }
524
- ```
525
-
526
- Only generates IOCs if `result['is_c2']` is True.
527
-
528
- ---
529
-
530
- ## FeatureExtractor Class
531
-
532
- Extracts 40-dimensional feature vectors from connections.
533
-
534
- ### Constants
535
-
536
- #### C2_TYPES
537
-
538
- List of detectable C2 framework types:
539
- ```python
540
- [
541
- 'unknown', 'metasploit', 'cobalt_strike', 'sliver', 'havoc',
542
- 'mythic', 'poshc2', 'merlin', 'empire', 'covenant',
543
- 'brute_ratel', 'koadic', 'pupy', 'silenttrinity', 'faction',
544
- 'ibombshell', 'godoh', 'dnscat2', 'iodine', 'dns_generic',
545
- 'http_custom', 'https_custom', 'websocket', 'domain_fronting',
546
- 'cloud_fronting', 'cdn_abuse', 'apt_generic', 'apt28', 'apt29',
547
- 'apt41', 'lazarus', 'fin7', 'turla', 'winnti', 'custom'
548
- ]
549
- ```
550
-
551
- ### Methods
552
-
553
- #### extract_features
554
-
555
- ```python
556
- def extract_features(self, connections: List[Dict]) -> np.ndarray
557
- ```
558
-
559
- Extract 40-dimensional feature vector.
560
-
561
- **Returns:** numpy array of shape (40,)
562
-
563
- **Feature Groups:**
564
- - Features 0-9: Timing (intervals, jitter, regularity, periodicity)
565
- - Features 10-17: Destinations (diversity, persistence, ports)
566
- - Features 18-27: Payload (sizes, ratios, consistency)
567
- - Features 28-35: Evasion (jitter patterns, bursts, session length)
568
- - Features 36-39: Advanced (night activity, fast beacon ratio, duration)
569
-
570
- ---
571
-
572
- #### check_metasploit_signature
573
-
574
- ```python
575
- def check_metasploit_signature(
576
- self,
577
- connections: List[Dict]
578
- ) -> Tuple[bool, float]
579
- ```
580
-
581
- Check for Metasploit-specific signature patterns.
582
-
583
- **Returns:** (is_metasploit, confidence)
584
-
585
- ---
586
-
587
- #### check_ssh_keepalive
588
-
589
- ```python
590
- def check_ssh_keepalive(
591
- self,
592
- connections: List[Dict]
593
- ) -> Tuple[bool, float]
594
- ```
595
-
596
- Check for SSH keepalive pattern.
597
-
598
- **Criteria:**
599
- - Port 22
600
- - Small packets (< 100 bytes)
601
- - Symmetric traffic (sent/recv ratio 0.5-2.0)
602
- - Consistent sizes (CV < 0.2)
603
- - Regular intervals matching common keepalive values
604
-
605
- **Returns:** (is_ssh_keepalive, confidence)
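
The criteria above can be sketched as a simple boolean check. This is not the library's implementation: the function name `looks_like_ssh_keepalive`, the candidate keepalive intervals (15, 30 and 60 seconds), the 10% tolerance, and the choice to measure size consistency on `bytes_sent` are all assumptions made for illustration, and no confidence value is computed.

```python
import statistics
from typing import Dict, List, Sequence


def looks_like_ssh_keepalive(connections: List[Dict],
                             keepalive_intervals: Sequence[float] = (15.0, 30.0, 60.0),
                             tolerance: float = 0.1) -> bool:
    """Apply the documented criteria as hard rules (hypothetical sketch)."""
    if len(connections) < 3:
        return False
    if any(c['dst_port'] != 22 for c in connections):
        return False                                   # Port 22 only
    sent = [c['bytes_sent'] for c in connections]
    recv = [c['bytes_recv'] for c in connections]
    if min(sent) <= 0 or min(recv) <= 0:
        return False
    if max(sent) >= 100 or max(recv) >= 100:
        return False                                   # Small packets (< 100 bytes)
    if not 0.5 <= sum(sent) / sum(recv) <= 2.0:
        return False                                   # Symmetric traffic
    if statistics.pstdev(sent) / statistics.fmean(sent) >= 0.2:
        return False                                   # Consistent sizes (CV < 0.2)
    stamps = sorted(c['timestamp'] for c in connections)
    intervals = [b - a for a, b in zip(stamps, stamps[1:])]
    mean_interval = statistics.fmean(intervals)
    # Assumed keepalive values; the reference does not list the actual ones.
    return any(abs(mean_interval - k) <= tolerance * k for k in keepalive_intervals)
```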
606
-
607
- ---
608
-
609
- ## LogParser Class
610
-
611
- Parses various log formats into connection records.
612
-
613
- ### Static Methods
614
-
615
- #### parse_json
616
-
617
- ```python
618
- @staticmethod
619
- def parse_json(log_line: str) -> Optional[Dict]
620
- ```
621
-
622
- Parse JSON formatted log line.
623
-
624
- **Recognized Fields:**
625
- - timestamp, @timestamp
626
- - src_ip, source_ip, src
627
- - dst_ip, dest_ip, dst
628
- - src_port, source_port
629
- - dst_port, dest_port
630
- - bytes_sent, bytes_out
631
- - bytes_recv, bytes_in
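
The alias list above suggests a normalization step that maps each recognized name onto one canonical key. A minimal sketch of that idea follows. The names `FIELD_ALIASES` and `normalize_json_line` are hypothetical, and returning `None` for unparseable or unrecognized lines is an assumption based on the `Optional[Dict]` return type.

```python
import json
from typing import Dict, Optional

# Canonical key -> accepted aliases, in the order listed in this reference.
FIELD_ALIASES = {
    'timestamp': ('timestamp', '@timestamp'),
    'src_ip': ('src_ip', 'source_ip', 'src'),
    'dst_ip': ('dst_ip', 'dest_ip', 'dst'),
    'src_port': ('src_port', 'source_port'),
    'dst_port': ('dst_port', 'dest_port'),
    'bytes_sent': ('bytes_sent', 'bytes_out'),
    'bytes_recv': ('bytes_recv', 'bytes_in'),
}


def normalize_json_line(log_line: str) -> Optional[Dict]:
    """Parse one JSON log line and rename known aliases (illustrative only)."""
    try:
        raw = json.loads(log_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(raw, dict):
        return None
    record = {}
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                record[canonical] = raw[alias]
                break  # first matching alias wins
    return record or None
```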
632
-
633
- ---
634
-
635
- #### parse_zeek_conn
636
-
637
- ```python
638
- @staticmethod
639
- def parse_zeek_conn(log_line: str) -> Optional[Dict]
640
- ```
641
-
642
- Parse Zeek/Bro conn.log format (tab-separated).
643
-
644
- ---
645
-
646
- #### parse_syslog
647
-
648
- ```python
649
- @staticmethod
650
- def parse_syslog(log_line: str) -> Optional[Dict]
651
- ```
652
-
653
- Parse common syslog/netflow patterns.
654
-
655
- **Recognized Patterns:**
656
- - `YYYY-MM-DD HH:MM:SS ... IP:port -> IP:port`
657
- - `src=IP ... dst=IP ... sport=port ... dport=port`
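
As an illustration of the second pattern, the sketch below matches `src=`, `dst=`, `sport=` and `dport=` pairs with a regular expression. The names `KV_PATTERN` and `parse_kv_line` are hypothetical, the expression assumes the keys appear in the order shown, and the actual patterns used by `parse_syslog` may differ.

```python
import re
from typing import Dict, Optional

# Assumes key order src, dst, sport, dport, with arbitrary text in between.
KV_PATTERN = re.compile(
    r'src=(?P<src_ip>\S+).*?dst=(?P<dst_ip>\S+)'
    r'.*?sport=(?P<src_port>\d+).*?dport=(?P<dst_port>\d+)'
)


def parse_kv_line(log_line: str) -> Optional[Dict]:
    """Extract endpoints from a key=value style log line (sketch only)."""
    match = KV_PATTERN.search(log_line)
    if match is None:
        return None
    return {
        'src_ip': match['src_ip'],
        'dst_ip': match['dst_ip'],
        'src_port': int(match['src_port']),
        'dst_port': int(match['dst_port']),
    }
```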
658
-
659
- ---
660
-
661
- ## Enums and Constants
662
-
663
- ### DetectionMethod
664
-
665
- ```python
666
- class DetectionMethod(Enum):
667
- SIGNATURE = "signature" # Port + behavior signature match
668
- BEHAVIORAL = "behavioral" # Pure behavioral analysis
669
- ML = "ml" # Machine learning inference
670
- CONTEXT = "context" # Context-adjusted detection
671
- HEURISTIC = "heuristic" # Rule-based detection
672
- WHITELIST = "whitelist" # Matched whitelist pattern
673
- ```
674
-
675
- ### ServiceType
676
-
677
- ```python
678
- class ServiceType(Enum):
679
- SSH = "ssh"
680
- HTTP = "http"
681
- HTTPS = "https"
682
- DNS = "dns"
683
- DATABASE = "database"
684
- API = "api"
685
- STREAMING = "streaming"
686
- GAMING = "gaming"
687
- VPN = "vpn"
688
- MONITORING = "monitoring"
689
- UNKNOWN = "unknown"
690
- ```
691
-
692
- ### C2_INDICATOR_PORTS
693
-
694
- High-confidence C2 signature ports:
695
- ```python
696
- {4444, 4445, 5555, 31337, 40056}
697
- ```
698
-
699
- ### C2_COMMON_PORTS
700
-
701
- Ports commonly used by C2 (require behavioral analysis):
702
- ```python
703
- {80, 443, 53, 8080, 8443, 8888}
704
- ```
705
-
706
- ---
707
-
708
- ## Convenience Functions
709
-
710
- ### load_model
711
-
712
- ```python
713
- def load_model(path: str, device: str = 'auto') -> C2Sentinel
714
- ```
715
-
716
- Shorthand for `C2Sentinel.load()`.
717
-
718
- ### create_model
719
-
720
- ```python
721
- def create_model(device: str = 'auto') -> C2Sentinel
722
- ```
723
-
724
- Shorthand for `C2Sentinel.create_new()`.
725
-
726
- ### quick_analyze
727
-
728
- ```python
729
- def quick_analyze(
730
- connections: List[Dict],
731
- model_path: str = 'c2_sentinel'
732
- ) -> AnalysisResult
733
- ```
734
-
735
- One-shot analysis without keeping model in memory.
736
-
737
- ---
738
-
739
- ## Error Handling
740
-
741
- The API uses standard Python exceptions:
742
-
743
- | Exception | Cause |
744
- |-----------|-------|
745
- | `FileNotFoundError` | Model files not found |
746
- | `ValueError` | Invalid connection format |
747
- | `RuntimeError` | CUDA/device errors |
748
-
749
- All methods handle empty or malformed input gracefully, returning neutral results rather than raising exceptions.