	---
	language: en
	license: mit
	tags:
	- cybersecurity
	- binary-classification
	- pytorch
	datasets:
	- custom
	metrics:
	- accuracy
	- auc
	- precision
	- recall
	---

	# natSecLabse

	## Model Description

	Binary classifier for detecting cybersecurity-related content. The model is a feed-forward neural network that takes a pre-computed text embedding as input and outputs the probability that the text is cyber-related rather than non-cyber.

	## Model Architecture

	- Input: 768-dimensional embeddings (e.g., from Gemma)
	- Hidden Layers: 512 → 256 → 128 neurons
	- Output: 1 (binary classification with sigmoid activation)
	- Normalization: LayerNorm + BatchNorm
	- Activation: ReLU
	- Total Parameters: ~557,184
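The parameter figure above can be checked with a little arithmetic. The sketch below counts only the fully connected layers listed in this section. The card does not say whether the reported total includes biases or normalization parameters, so the breakdown is an illustration rather than the author's own accounting.

```python
# Layer widths taken from the architecture list: 768 -> 512 -> 256 -> 128 -> 1
layer_sizes = [768, 512, 256, 128, 1]

# Weight-matrix entries for each fully connected layer (fan_in * fan_out).
weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# One bias per output unit, assuming every linear layer has a bias term.
biases = sum(layer_sizes[1:])

print(f"weights only:     {weights:,}")           # 557,184
print(f"weights + biases: {weights + biases:,}")  # 558,081
```

The weights-only count equals the 557,184 figure, which suggests the reported total leaves out biases and the LayerNorm/BatchNorm parameters.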

	## Performance Metrics

	- Accuracy: 0.8835
	- Precision: 0.5713
	- Recall: 0.8645
	- AUC: 0.9482
	- F1 Score: 0.6880
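As a consistency check, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet uses only the numbers listed above.

```python
precision = 0.5713
recall = 0.8645

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.6880

# Share of positive predictions that are false positives.
false_discovery_rate = 1 - precision
print(f"False discovery rate = {false_discovery_rate:.4f}")  # 0.4287
```

The high recall with moderate precision means the model catches most cyber-related items but roughly 43% of the items it flags are not cyber-related.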

	## Usage

	```python
	import torch
	from huggingface_hub import hf_hub_download

	# Download model
	model_path = hf_hub_download(
	    repo_id="kristiangnordby/natSecLabse",
	    filename="model.pt",
	)

	# Load model
	checkpoint = torch.load(model_path, map_location='cpu')

	# For inference, you'll need the model class definition
	# See model_architecture.py in this repo
	```
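The actual model class lives in `model_architecture.py` and is not reproduced in this card. To illustrate the inference flow, here is a minimal sketch with a hypothetical stand-in class, `CyberClassifierSketch`, built from the layer sizes in the Model Architecture section. The placement of LayerNorm and BatchNorm, the absence of dropout, and the 0.5 decision threshold are all assumptions, and the sketch runs with random weights on random input rather than the released checkpoint.

```python
import torch
from torch import nn


class CyberClassifierSketch(nn.Module):
    """Hypothetical stand-in; the real class is in model_architecture.py."""

    def __init__(self, input_dim: int = 768):
        super().__init__()
        # Assumption: LayerNorm on the input, BatchNorm after each hidden layer.
        self.input_norm = nn.LayerNorm(input_dim)
        layers = []
        prev = input_dim
        for width in (512, 256, 128):
            layers += [nn.Linear(prev, width), nn.BatchNorm1d(width), nn.ReLU()]
            prev = width
        self.hidden = nn.Sequential(*layers)
        self.output = nn.Linear(prev, 1)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        x = self.hidden(self.input_norm(embeddings))
        # Sigmoid maps the single logit to a probability of "cyber-related".
        return torch.sigmoid(self.output(x)).squeeze(-1)


model = CyberClassifierSketch()
model.eval()  # use BatchNorm running statistics instead of batch statistics

# Random vectors stand in for real 768-dimensional text embeddings.
embeddings = torch.randn(4, 768)
with torch.no_grad():
    probabilities = model(embeddings)

# Assumption: 0.5 decision threshold; the card does not state the one used.
predictions = probabilities >= 0.5
print(probabilities.shape, predictions.dtype)
```

To use the released weights, instantiate the class from `model_architecture.py` instead and load the checkpoint into it. This card does not document whether `model.pt` is a bare state dict or a dictionary that wraps one, so inspect the loaded object first.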

	## Training Data

	- Training set: ~166K samples
	- Validation set: ~25K samples
	- Test set: ~41K samples
	- Class distribution: ~18% cyber-related, ~82% non-cyber
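Because the classes are imbalanced, accuracy is best read against a majority-class baseline. The short calculation below uses the approximate figures listed above and assumes the ~18% / ~82% class distribution also holds in the evaluation split, which the card does not state explicitly.

```python
# Approximate split sizes from the list above, in thousands of samples.
splits = {"train": 166, "validation": 25, "test": 41}
total = sum(splits.values())
for name, size in splits.items():
    print(f"{name:<10} {size / total:.1%}")

# A classifier that always predicts "non-cyber" is right for every negative.
positive_rate = 0.18
majority_baseline = 1 - positive_rate
print(f"majority-class baseline accuracy: {majority_baseline:.2f}")  # 0.82
```

Under that assumption, an always-negative classifier would already reach about 0.82 accuracy, so precision, recall, and AUC say more about this model than accuracy alone.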

	## Intended Use

	This model is designed for:
	- Cybersecurity content detection
	- Filtering cyber-related articles/documents
	- Security threat classification

	## Limitations

	- Requires pre-computed embeddings as input
	- Trained on a specific corpus; may need fine-tuning for other domains
	- Performance depends on quality of input embeddings

	## Training Details

	- Optimizer: Adam (lr=0.001, β₁=0.9, β₂=0.999)
	- Loss Function: Binary Cross-Entropy
	- Batch Size: 512
	- Early Stopping: Patience of 15 epochs
	- Learning Rate Scheduling: ReduceLROnPlateau (factor=0.5, patience=5)
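The settings above map onto standard PyTorch components. The following is a minimal sketch of how they might be wired together, not the author's training script. The tiny model, the synthetic data, the `max_epochs` limit, and the use of validation loss as the monitored quantity for both the scheduler and early stopping are assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical tiny model and synthetic data, used only to exercise the setup.
model = nn.Sequential(nn.Linear(768, 1), nn.Sigmoid())
x_train, y_train = torch.randn(512, 768), torch.randint(0, 2, (512, 1)).float()
x_val, y_val = torch.randn(128, 768), torch.randint(0, 2, (128, 1)).float()

criterion = nn.BCELoss()  # expects probabilities, hence the Sigmoid above
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)

patience, best_val_loss, epochs_without_improvement = 15, float("inf"), 0
max_epochs = 50  # assumption: the card does not state an epoch limit

for epoch in range(max_epochs):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)  # one batch of 512 per epoch here
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()

    scheduler.step(val_loss)  # halves the LR once validation loss plateaus

    if val_loss < best_val_loss:
        best_val_loss, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # early stopping

print(f"stopped after {epoch + 1} epochs, best val loss {best_val_loss:.4f}")
```

With a scheduler patience of 5 and an early-stopping patience of 15, the learning rate can be halved a couple of times before training is abandoned.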

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{cybersecurity_classifier,
	  author = {Kristian Nordby},
	  title = {Cybersecurity Binary Classifier},
	  year = {2025},
	  publisher = {HuggingFace},
	  howpublished = {\url{https://huggingface.co/kristiangnordby/natSecLabse}}
	}
	```