---
language: en
license: mit
tags:
- cybersecurity
- binary-classification
- pytorch
datasets:
- custom
metrics:
- accuracy
- auc
- precision
- recall
---

# natSecLabse

## Model Description

natSecLabse is a binary classification model for cybersecurity threat detection. It is a deep neural network that takes a precomputed text embedding as input and classifies the underlying content as cyber-related or non-cyber.

## Model Architecture

- **Input**: 768-dimensional embeddings (e.g., from Gemma)
- **Hidden Layers**: 512 → 256 → 128 neurons
- **Output**: 1 (binary classification with sigmoid activation)
- **Normalization**: LayerNorm + BatchNorm
- **Activation**: ReLU
- **Total Parameters**: ~557,184

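
The layer sizes listed above can be sketched as a small PyTorch module. This is only an illustration under stated assumptions: the class name `CyberClassifier`, the placement of LayerNorm on the input and BatchNorm after each hidden layer, and the absence of dropout are guesses, not the definition shipped in this repo. The linear-layer weights alone come to 393,216 + 131,072 + 32,768 + 128 = 557,184, which equals the total quoted above and suggests that figure counts weight matrices only.

```python
import torch
from torch import nn


class CyberClassifier(nn.Module):
    """Hypothetical reconstruction; layer order and norm placement are assumptions."""

    def __init__(self, input_dim=768, hidden_dims=(512, 256, 128)):
        super().__init__()
        layers = [nn.LayerNorm(input_dim)]  # assumed: LayerNorm on the raw embedding
        prev = input_dim
        for width in hidden_dims:
            layers += [
                nn.Linear(prev, width),
                nn.BatchNorm1d(width),  # assumed: BatchNorm after each hidden layer
                nn.ReLU(),
            ]
            prev = width
        layers += [nn.Linear(prev, 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, embeddings):
        # embeddings: (batch, 768) float tensor -> (batch,) probabilities in [0, 1]
        return self.net(embeddings).squeeze(-1)


def linear_weight_count(model):
    """Count weight-matrix entries of Linear layers only (no biases, no norm params)."""
    return sum(m.weight.numel() for m in model.modules() if isinstance(m, nn.Linear))


model = CyberClassifier().eval()  # eval mode so BatchNorm uses running statistics
with torch.no_grad():
    probs = model(torch.randn(4, 768))  # random stand-in for real embeddings
print(probs.shape, linear_weight_count(model))
```

If the real class differs (for example in where the normalization layers sit), the state dict from `model.pt` will not load into this sketch, so treat it as a reading aid rather than a drop-in replacement.
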
## Performance Metrics

- **Accuracy**: 0.8835
- **Precision**: 0.5713
- **Recall**: 0.8645
- **AUC**: 0.9482
- **F1 Score**: 0.6880

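
As a quick consistency check, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_score` below is illustrative and not part of this repo.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.5713
reported_recall = 0.8645

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")  # agrees with the reported 0.6880 to four decimals

# Share of items flagged as cyber-related that are actually non-cyber.
false_discovery_rate = 1 - reported_precision
print(f"False discovery rate = {false_discovery_rate:.4f}")
```

The gap between recall and precision means the classifier catches most cyber-related items but roughly 43% of what it flags is non-cyber, which matters when it is used as a filter.
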
## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="kristiangnordby/natSecLabse",
    filename="model.pt",
)

# Load model
checkpoint = torch.load(model_path, map_location='cpu')

# For inference, you'll need the model class definition
# See model_architecture.py in this repo
```

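
The layout of `model.pt` is not described in this card, so it can help to inspect what `torch.load` returned before wiring it into a model class. The helper below is a hypothetical sketch (`summarize_checkpoint` is not part of this repo), and the keys in the demo dictionary are invented stand-ins, not the actual contents of the file.

```python
def summarize_checkpoint(checkpoint):
    """Describe the top-level entries of a loaded checkpoint object."""
    if not isinstance(checkpoint, dict):
        return {"<object>": type(checkpoint).__name__}
    summary = {}
    for key, value in checkpoint.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Tensors (and arrays) expose .shape
            summary[key] = f"tensor {tuple(shape)}"
        elif isinstance(value, dict):
            summary[key] = f"dict with {len(value)} entries"
        else:
            summary[key] = type(value).__name__
    return summary


class _FakeTensor:
    """Stand-in with a .shape attribute so the demo runs without a download."""

    def __init__(self, *shape):
        self.shape = shape


# Invented example layout; the real checkpoint may look different.
demo = {"model_state_dict": {"net.1.weight": _FakeTensor(512, 768)}, "epoch": 12}
print(summarize_checkpoint(demo))
print(summarize_checkpoint(demo["model_state_dict"]))
```

With the real file, the call would be `summarize_checkpoint(checkpoint)` right after `torch.load`; the output shows whether the file holds a bare state dict or a dictionary that wraps one alongside other training state.
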
## Training Data

- Training set: ~166K samples
- Validation set: ~25K samples
- Test set: ~41K samples
- Class distribution: ~18% cyber-related, ~82% non-cyber

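
The split sizes and class balance above imply the following proportions. This is a small worked calculation using the approximate figures from this card, and it assumes the ~18% positive rate holds across all splits, which the card does not state.

```python
# Approximate split sizes as listed above.
splits = {"train": 166_000, "validation": 25_000, "test": 41_000}

total = sum(splits.values())
fractions = {name: size / total for name, size in splits.items()}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")

# Accuracy of a trivial classifier that always predicts the majority class
# (non-cyber), assuming ~18% of samples are cyber-related.
positive_rate = 0.18
majority_baseline_accuracy = 1 - positive_rate
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.2f}")
```

Because a constant "non-cyber" prediction would already score about 0.82 accuracy under that assumption, recall, precision, and AUC are more informative than accuracy for judging this model.
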
## Intended Use

This model is designed for:
- Cybersecurity content detection
- Filtering cyber-related articles/documents
- Security threat classification

## Limitations

- Requires pre-computed embeddings as input
- Trained on a specific corpus; may need fine-tuning for other domains
- Performance depends on the quality of the input embeddings

## Training Details

- **Optimizer**: Adam (lr=0.001, β₁=0.9, β₂=0.999)
- **Loss Function**: Binary Cross-Entropy
- **Batch Size**: 512
- **Early Stopping**: Patience of 15 epochs
- **Learning Rate Scheduling**: ReduceLROnPlateau (factor=0.5, patience=5)

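
The settings above can be wired together roughly as follows. This is a minimal sketch, not the training script used for this model: the single-layer stand-in model, the synthetic data, the `max_epochs` cap, and the choice to monitor validation loss for both the scheduler and early stopping are all assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in model and random data so the sketch runs on its own (assumptions).
model = nn.Sequential(nn.Linear(768, 1), nn.Sigmoid())
x_train, y_train = torch.randn(1024, 768), torch.randint(0, 2, (1024, 1)).float()
x_val, y_val = torch.randn(256, 768), torch.randint(0, 2, (256, 1)).float()

# Hyperparameters as listed in this card.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
criterion = nn.BCELoss()  # binary cross-entropy on sigmoid outputs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)
batch_size, patience = 512, 15
max_epochs = 30  # assumed cap; the card does not state one

best_val, epochs_without_improvement = float("inf"), 0
for epoch in range(max_epochs):
    model.train()
    for start in range(0, len(x_train), batch_size):
        xb = x_train[start:start + batch_size]
        yb = y_train[start:start + batch_size]
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()
    scheduler.step(val_loss)  # halves the LR after 5 epochs without improvement

    if val_loss < best_val:
        best_val, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # early stopping after 15 stagnant epochs

print(f"stopped after epoch {epoch}, best validation loss {best_val:.4f}")
```

With a scheduler patience of 5 and an early-stopping patience of 15, the learning rate can be halved more than once before training stops, which gives the optimizer a few chances to recover from a plateau.
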
## Citation

If you use this model, please cite:

```bibtex
@misc{cybersecurity_classifier,
  author = {Kristian Nordby},
  title = {Cybersecurity Binary Classifier},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/kristiangnordby/natSecLabse}}
}
```