danielostrow/c2sentinel

 ---
+## Training Your Own Model
+C2Sentinel supports training custom weights on your own data. This is useful for:
+- Fine-tuning on your network's specific traffic patterns
+- Adding detection for new C2 frameworks
+- Reducing false positives in your environment
+### Prerequisites
+```bash
+pip install torch numpy safetensors tqdm packaging
+```
+### Using Pre-trained Weights
+The released weights are trained on synthetic C2 beacon patterns covering 10+ framework types:
+```python
+from c2sentinel import C2Sentinel
+# Load pre-trained weights from the Hugging Face Hub
+sentinel = C2Sentinel.from_pretrained('danielostrow/c2sentinel')
+# Or load from local files
+sentinel = C2Sentinel.load('c2_sentinel')
+```
+### Training From Scratch
+Use the provided training script to train on synthetic data:
+```bash
+# Basic training (20,000 samples, 100 epochs)
+python train_model.py --epochs 100 --samples 20000
+# Faster training with fewer samples and fewer epochs
+python train_model.py --epochs 50 --samples 10000
+# Custom learning rate
+python train_model.py --epochs 100 --samples 25000 --lr 0.0001
+```
+### Training on Custom Data
+Create a custom dataset class that extracts features from labeled connection records and returns them as normalized tensors:
+```python
+import numpy as np
+import torch
+from torch.utils.data import Dataset
+from c2sentinel import FeatureExtractor
+class CustomC2Dataset(Dataset):
+    """Wraps labeled connection groups as normalized feature tensors."""
+    def __init__(self, labeled_connections):
+        # labeled_connections: iterable of (connections, is_c2) pairs
+        self.feature_extractor = FeatureExtractor()
+        self.samples = []
+        self.labels = []
+        for connections, is_c2 in labeled_connections:
+            features = self.feature_extractor.extract_features(connections)
+            self.samples.append(features)
+            self.labels.append(1 if is_c2 else 0)
+        # Normalize features (critical for training stability)
+        self.samples = np.array(self.samples, dtype=np.float32)
+        # Keep mean/std on the dataset so they can be saved for inference
+        self.mean = np.mean(self.samples, axis=0)
+        self.std = np.std(self.samples, axis=0) + 1e-8  # epsilon avoids division by zero
+        self.samples = (self.samples - self.mean) / self.std
+    def __len__(self):
+        return len(self.samples)
+    def __getitem__(self, idx):
+        # One sample: float32 feature vector plus a 0.0/1.0 label
+        return {
+            'features': torch.tensor(self.samples[idx]),
+            'label': torch.tensor(self.labels[idx], dtype=torch.float32)
+        }
+```
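Early stopping needs a held-out validation set, so it can help to split the `(connections, is_c2)` pairs before building datasets from them. The following is a minimal sketch using only the standard library; `stratified_split`, `val_fraction`, and the placeholder records are hypothetical names for illustration and are not part of C2Sentinel.

```python
import random

def stratified_split(labeled_connections, val_fraction=0.2, seed=0):
    """Split (connections, is_c2) pairs into train and validation lists,
    keeping the C2-to-benign ratio roughly the same in both."""
    rng = random.Random(seed)
    train, val = [], []
    for wanted in (True, False):
        # Split each class separately so neither split loses a class
        group = [pair for pair in labeled_connections if bool(pair[1]) == wanted]
        rng.shuffle(group)
        n_val = int(round(len(group) * val_fraction))
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val

# Placeholder records: each item stands in for one group of connection records
demo = [([f"conn-{i}"], i % 2 == 0) for i in range(100)]
train_pairs, val_pairs = stratified_split(demo)
print(len(train_pairs), len(val_pairs))  # 80 20
```

Each resulting list can then be passed to its own `CustomC2Dataset`. Note that the class above computes normalization statistics per instance, so the validation set would need to reuse the training set's mean and standard deviation rather than its own.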
+### Fine-tuning Pre-trained Weights
+Start from pre-trained weights and fine-tune on your data:
+```python
+from c2sentinel import LogBERTC2Sentinel, C2SentinelConfig
+from safetensors.torch import load_file, save_file
+import torch.optim as optim
+# Load pre-trained model
+config = C2SentinelConfig()
+model = LogBERTC2Sentinel(config)
+state_dict = load_file('c2_sentinel.safetensors')
+model.load_state_dict(state_dict)
+# Fine-tune with a lower learning rate than when training from scratch
+optimizer = optim.AdamW(model.parameters(), lr=0.00005, weight_decay=0.01)
+# Train on your data...
+# Save fine-tuned weights
+save_file(model.state_dict(), 'c2_sentinel_finetuned.safetensors')
+```
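The "train on your data" step could look roughly like the loop below. This is a sketch under stated assumptions, not the project's training script: a small `nn.Sequential` stands in for `LogBERTC2Sentinel` (whose forward signature is not shown here), the model is assumed to emit one logit per sample, and `ToyDataset` and `run_epoch` are hypothetical names. The batches use the same `'features'` / `'label'` keys as `CustomC2Dataset`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

torch.manual_seed(0)

class ToyDataset(Dataset):
    """Random stand-in data with the same item layout as CustomC2Dataset."""
    def __init__(self, n=256, dim=16):
        self.x = torch.randn(n, dim)
        self.y = (self.x[:, 0] > 0).float()
    def __len__(self):
        return len(self.x)
    def __getitem__(self, idx):
        return {'features': self.x[idx], 'label': self.y[idx]}

# Stand-in model (assumption): replace with the loaded pre-trained model
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
loader = DataLoader(ToyDataset(), batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.00005, weight_decay=0.01)
criterion = nn.BCEWithLogitsLoss()  # assumes a single raw logit per sample

def run_epoch(model, loader, optimizer, criterion):
    """Run one training pass and return the mean loss per sample."""
    model.train()
    total = 0.0
    for batch in loader:
        optimizer.zero_grad()
        logits = model(batch['features']).squeeze(-1)
        loss = criterion(logits, batch['label'])
        loss.backward()
        # Clip gradients before the optimizer step
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        total += loss.item() * batch['features'].size(0)
    return total / len(loader.dataset)

epoch_losses = [run_epoch(model, loader, optimizer, criterion) for _ in range(3)]
```

If the real model returns probabilities instead of logits, `nn.BCELoss` would be the matching loss function.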
+### Training Tips
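+1. **Feature Normalization**: Always normalize input features, and save the training-set mean and standard deviation so the same transform can be applied at inference:
+   ```python
+   # `dataset` is a CustomC2Dataset instance, which stores both as attributes
+   np.savez('normalization_params.npz', mean=dataset.mean, std=dataset.std)
+   ```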
+2. **Learning Rate**: Use 0.0001 for training from scratch, 0.00005 for fine-tuning
+3. **Gradient Clipping**: Prevent exploding gradients:
+   ```python
+   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
+   ```
+4. **Early Stopping**: Monitor validation accuracy and stop when it plateaus
+5. **Balanced Data**: Use roughly equal C2 and benign samples
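
The early-stopping tip can be sketched as a small helper that counts epochs without improvement. `EarlyStopping`, `patience`, and `min_delta` are hypothetical names chosen for this example, and the accuracy values are made up for illustration.

```python
class EarlyStopping:
    """Signal a stop once the metric has not improved for `patience` epochs."""
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric (higher is better); return True to stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation accuracies, one per epoch
history = [0.81, 0.86, 0.90, 0.90, 0.89, 0.90, 0.91, 0.91]
stopper = EarlyStopping(patience=3, min_delta=0.005)
stopped_at = None
for epoch, acc in enumerate(history):
    if stopper.step(acc):
        stopped_at = epoch  # epoch index 5 for this history
        break
```

In a real run, it is common to save the weights each time `best` improves, so that the checkpoint kept is the best one and not the last one.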
+### Model Output Files
+After training, you'll have:
+- `c2_sentinel.safetensors` - Model weights
+- `normalization_params.npz` - Feature normalization parameters
+- `c2_sentinel.json` - Model configuration
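
The normalization parameters matter at inference time: new feature vectors should be scaled with the training-set statistics, not with statistics computed from the new data. A minimal sketch of that round trip with NumPy follows; the helper names and the 8-feature width are arbitrary choices for illustration.

```python
import os
import tempfile
import numpy as np

def save_normalization(path, mean, std):
    """Store the training-set statistics next to the model weights."""
    np.savez(path, mean=mean, std=std)

def load_and_normalize(path, features):
    """Apply the saved training-set statistics to new feature vectors."""
    params = np.load(path)
    return ((features - params['mean']) / params['std']).astype(np.float32)

# Synthetic training features (8 columns is an arbitrary width)
rng = np.random.default_rng(0)
train_features = rng.normal(loc=5.0, scale=2.0, size=(100, 8)).astype(np.float32)
mean = train_features.mean(axis=0)
std = train_features.std(axis=0) + 1e-8

path = os.path.join(tempfile.mkdtemp(), 'normalization_params.npz')
save_normalization(path, mean, std)

# Here the training data itself is normalized; at inference, pass new features
normalized = load_and_normalize(path, train_features)
```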
+---
 ## Files
 ```