---
language:
  - en
license: mit
tags:
  - cybersecurity
  - llm
  - from-scratch
  - pytorch
pipeline_tag: text-generation
---

# CyberLLM-350M

A 350M-class cybersecurity language model (303.4M parameters) built entirely from scratch.

## Model Details

- **Architecture**: LLaMA-3 style decoder-only transformer
- **Parameters**: 303.4M
- **Training Data**: 5B tokens total (3.2B security-focused, plus general-domain text)
- **Final Loss**: 3.80 (pretrain) → 1.28 (SFT)
- **Vocab**: 32,000 tokens (custom SentencePiece)
- **Context**: 2,048 tokens
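
To put the figures above in perspective, the short sketch below converts the reported losses to perplexity and computes the tokens-per-parameter ratio. It assumes the losses are mean cross-entropy in nats per token, which the card does not state. The pretraining and SFT losses are measured on different data, so the two perplexities are not directly comparable.

```python
import math

# Values copied from the Model Details list.
PRETRAIN_LOSS = 3.80      # assumed: mean cross-entropy, nats per token
SFT_LOSS = 1.28           # assumed: same unit, measured on the SFT data
TOTAL_TOKENS = 5e9
N_PARAMS = 303.4e6


def perplexity(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (nats/token) to perplexity."""
    return math.exp(loss_nats)


print(f"pretrain perplexity:  {perplexity(PRETRAIN_LOSS):.1f}")
print(f"SFT perplexity:       {perplexity(SFT_LOSS):.1f}")
print(f"tokens per parameter: {TOTAL_TOKENS / N_PARAMS:.1f}")
```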

## Training

Pretrained from random initialization on cybersecurity-weighted data including
Trend Micro Primus-FineWeb, Stack Exchange security sites, ArXiv cs.CR,
MITRE ATT&CK, NIST SP 800 series, and OWASP documentation.

Fine-tuned with 3,750 cybersecurity instruction-response pairs.

## Usage

```bash
# Clone the repository (provides training/chat.py)
git clone https://github.com/Omkarth/CyberLLM.git
cd CyberLLM

# Install dependencies
pip install huggingface_hub torch sentencepiece pyyaml

# Download the weights, config, and tokenizer from the Hugging Face Hub
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='model.pt', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='config.yaml', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='cybersec_tokenizer.model', local_dir='tokenizer')
"

# Ask the model a question
python training/chat.py --model checkpoints/model.pt --question "What is SQL injection?"
```
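
After the download, a quick sanity check can confirm that the tokenizer matches the figures in Model Details. The sketch below is illustrative and not part of the repository: `load_tokenizer` and `fit_to_context` are hypothetical helper names, the tokenizer path follows the `local_dir` used above, and keeping only the most recent tokens is an assumed truncation policy (how `training/chat.py` handles long prompts is not documented here).

```python
from typing import List

EXPECTED_VOCAB = 32_000   # from Model Details
CONTEXT_LEN = 2_048       # from Model Details


def load_tokenizer(path: str = "tokenizer/cybersec_tokenizer.model"):
    """Load the SentencePiece model and check its vocabulary size."""
    # Imported lazily so fit_to_context works without sentencepiece installed.
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=path)
    if sp.get_piece_size() != EXPECTED_VOCAB:
        raise ValueError(f"unexpected vocab size: {sp.get_piece_size()}")
    return sp


def fit_to_context(token_ids: List[int], max_len: int = CONTEXT_LEN) -> List[int]:
    """Keep only the last max_len tokens (assumed policy, not the repo's)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return token_ids[-max_len:]
```

With the tokenizer file in place, `sp = load_tokenizer()` followed by `fit_to_context(sp.encode("What is SQL injection?", out_type=int))` returns the prompt's token IDs, trimmed to the context window if needed.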

## Limitations

With 303.4M parameters, the model is small: it handles common security topics but
struggles with niche technical details. It is not a production security tool.

## Author

Omkar Thombre — Master of Computer Science, University of Adelaide