---
language: en
license: mit
tags:
- cybersecurity
- binary-classification
- pytorch
datasets:
- custom
metrics:
- accuracy
- auc
- precision
- recall
---
# natSecLabse
## Model Description
A binary classification model for cybersecurity threat detection. It uses a deep neural network to classify pre-computed text embeddings as cyber-related or non-cyber.
## Model Architecture
- **Input**: 768-dimensional embeddings (e.g., from Gemma)
- **Hidden Layers**: 512 → 256 → 128 neurons
- **Output**: 1 (binary classification with sigmoid activation)
- **Normalization**: LayerNorm + BatchNorm
- **Activation**: ReLU
- **Total Parameters**: ~557,184
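
The parameter figure above can be reproduced from the layer widths alone. The sketch below assumes four linear layers (768 → 512 → 256 → 128 → 1) and counts only their weight matrices. That the listed total excludes biases and normalization parameters is an inference from the arithmetic, not something stated in this card.

```python
# Layer widths taken from the architecture list above.
layer_sizes = [768, 512, 256, 128, 1]

# Weights only: in_features * out_features for each linear layer.
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Biases: one per output unit of each linear layer.
biases = sum(layer_sizes[1:])

print(weights)           # 557184
print(weights + biases)  # 558081
```

LayerNorm and BatchNorm layers would add their own scale and shift parameters on top of these counts.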
## Performance Metrics
- **Accuracy**: 0.8835
- **Precision**: 0.5713
- **Recall**: 0.8645
- **AUC**: 0.9482
- **F1 Score**: 0.6880
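
As a quick consistency check, the F1 score is the harmonic mean of the precision and recall listed above. A minimal sketch using only those two reported values:

```python
precision = 0.5713
recall = 0.8645

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Share of positive predictions that are false positives.
false_positive_share = 1 - precision

print(f"{f1:.4f}")                    # 0.6880
print(f"{false_positive_share:.4f}")  # 0.4287
```

In other words, the model recovers most cyber-related items, but roughly four in ten of its positive predictions are false positives.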
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="kristiangnordby/natSecLabse",
    filename="model.pt",
)

# Load the checkpoint onto the CPU
checkpoint = torch.load(model_path, map_location="cpu")

# For inference, you'll need the model class definition
# See model_architecture.py in this repo
```
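
For orientation, here is a minimal sketch of what a compatible model class and inference call might look like. The class name `CyberClassifier`, the placement of LayerNorm and BatchNorm, and the 0.5 decision threshold are all assumptions for illustration. The authoritative definition is the one in `model_architecture.py`, and its parameter names must match the checkpoint for the weights to load.

```python
import torch
import torch.nn as nn


class CyberClassifier(nn.Module):
    """Hypothetical stand-in for the class in model_architecture.py."""

    def __init__(self, input_dim=768, hidden_dims=(512, 256, 128)):
        super().__init__()
        layers = [nn.LayerNorm(input_dim)]  # assumed: LayerNorm on the input
        prev = input_dim
        for width in hidden_dims:
            # Assumed ordering: Linear -> BatchNorm -> ReLU
            layers += [nn.Linear(prev, width), nn.BatchNorm1d(width), nn.ReLU()]
            prev = width
        layers.append(nn.Linear(prev, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Sigmoid maps the single logit to a probability in [0, 1].
        return torch.sigmoid(self.net(x)).squeeze(-1)


model = CyberClassifier()
model.eval()  # use BatchNorm running statistics at inference time

embeddings = torch.randn(4, 768)  # placeholder for real 768-dim embeddings
with torch.no_grad():
    probs = model(embeddings)

preds = probs >= 0.5  # assumed decision threshold
```

With the real class, the downloaded checkpoint would be applied through `load_state_dict` before calling `eval()`. Whether `model.pt` holds a bare state dict or a wrapping dictionary is not stated in this card.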
## Training Data
- Training set: ~166K samples
- Validation set: ~25K samples
- Test set: ~41K samples
- Class distribution: ~18% cyber-related, ~82% non-cyber
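
Because the classes are imbalanced, it helps to read the reported accuracy against a trivial baseline. The sketch below uses the approximate figures above and assumes the ~18% positive rate holds across all three splits, which this card does not confirm.

```python
# Approximate figures from the list above.
split_sizes = {"train": 166_000, "validation": 25_000, "test": 41_000}
positive_rate = 0.18  # ~18% cyber-related (assumed equal in every split)

total = sum(split_sizes.values())
split_fractions = {name: n / total for name, n in split_sizes.items()}

# Accuracy of a trivial classifier that always predicts "non-cyber".
majority_baseline = 1 - positive_rate

for name, frac in split_fractions.items():
    print(f"{name}: {frac:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.2f}")
```

A classifier that always answers "non-cyber" would already score about 0.82 accuracy, so precision, recall, and AUC are more informative here than accuracy alone.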
## Intended Use
This model is designed for:
- Cybersecurity content detection
- Filtering cyber-related articles/documents
- Security threat classification
## Limitations
- Requires pre-computed embeddings as input
- Trained on a specific corpus; may need fine-tuning for other domains
- Performance depends on quality of input embeddings
## Training Details
- **Optimizer**: Adam (lr=0.001, β₁=0.9, β₂=0.999)
- **Loss Function**: Binary Cross-Entropy
- **Batch Size**: 512
- **Early Stopping**: Patience of 15 epochs
- **Learning Rate Scheduling**: ReduceLROnPlateau (factor=0.5, patience=5)
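
The settings above map onto standard PyTorch components roughly as follows. This is a sketch, not the original training script. The one-layer placeholder model, the synthetic data, the `max_epochs` cap, and the choice of validation loss as the monitored quantity are all assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model and synthetic data (assumptions, not the real setup).
model = nn.Sequential(nn.Linear(768, 1), nn.Sigmoid())
x_train, y_train = torch.randn(1024, 768), torch.randint(0, 2, (1024, 1)).float()
x_val, y_val = torch.randn(256, 768), torch.randint(0, 2, (256, 1)).float()

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)
criterion = nn.BCELoss()  # binary cross-entropy on sigmoid outputs

batch_size, patience, max_epochs = 512, 15, 30  # max_epochs is an assumption
best_val, epochs_without_improvement = float("inf"), 0

for epoch in range(max_epochs):
    model.train()
    for start in range(0, len(x_train), batch_size):
        xb = x_train[start:start + batch_size]
        yb = y_train[start:start + batch_size]
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()
    scheduler.step(val_loss)  # halve the LR after 5 epochs without improvement

    # Early stopping: stop after `patience` epochs without a new best.
    if val_loss < best_val:
        best_val, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break
```

Because the scheduler patience (5) is shorter than the early-stopping patience (15), the learning rate can be halved up to twice before training halts.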
## Citation
If you use this model, please cite:
```bibtex
@misc{cybersecurity_classifier,
  author = {Kristian Nordby},
  title = {Cybersecurity Binary Classifier},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/kristiangnordby/natSecLabse}}
}
```