CyberLLM-350M

A cybersecurity language model with a nominal 350M parameters (303.4M actual), built entirely from scratch.

Model Details

  • Architecture: LLaMA-3 style decoder-only transformer
  • Parameters: 303.4M
  • Training Data: 5B tokens total (3.2B security-focused, the rest general-domain)
  • Final Loss: 3.80 (pretrain) → 1.28 (SFT)
  • Vocab: 32,000 tokens (custom SentencePiece)
  • Context: 2,048 tokens
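
To put the figures above in context, here is a minimal sketch that derives two common rule-of-thumb quantities from them: training tokens per parameter, and the perplexity implied by each reported loss. It assumes the losses are mean cross-entropy in nats, which the card does not state; the constant and function names are illustrative only.

```python
import math

# Figures copied from the Model Details list.
PARAMS = 303.4e6        # parameter count
TRAIN_TOKENS = 5e9      # total pretraining tokens
PRETRAIN_LOSS = 3.80    # final pretraining loss
SFT_LOSS = 1.28         # final supervised fine-tuning loss


def perplexity(loss_nats: float) -> float:
    """Perplexity for a loss value, ASSUMING mean cross-entropy in nats."""
    return math.exp(loss_nats)


# Ratio of training tokens to model parameters.
tokens_per_param = TRAIN_TOKENS / PARAMS

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"pretrain perplexity:  {perplexity(PRETRAIN_LOSS):.1f}")
print(f"SFT perplexity:       {perplexity(SFT_LOSS):.2f}")
```

Under that assumption, this works out to roughly 16.5 tokens per parameter, a pretraining perplexity near 44.7, and an SFT perplexity near 3.60. The two losses are measured on different data, so the drop between them is not a like-for-like comparison.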

Training

Pretrained from random initialization on cybersecurity-weighted data including Trend Micro Primus-FineWeb, Stack Exchange security sites, ArXiv cs.CR, MITRE ATT&CK, NIST SP 800 series, and OWASP documentation.

Fine-tuned with 3,750 cybersecurity instruction-response pairs.

Usage

# Download and chat
git clone https://github.com/Omkarth/CyberLLM.git
cd CyberLLM

pip install huggingface_hub torch sentencepiece pyyaml
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='model.pt', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='config.yaml', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='cybersec_tokenizer.model', local_dir='tokenizer')
"

python training/chat.py --model checkpoints/model.pt --question "What is SQL injection?"
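
The same download step can also be written as a small Python script rather than an inline `python -c` string. This is a sketch, not part of the repository: the helper names `expected_paths`, `download_all`, and `missing_files` are hypothetical, and it assumes it is run from the `CyberLLM` directory so the files land where `training/chat.py` is pointed in the command above.

```python
from pathlib import Path

REPO_ID = "Omk07/CyberLLM-350M"

# (filename in the Hub repo, local directory) pairs, mirroring the shell commands.
FILES = [
    ("model.pt", "checkpoints"),
    ("config.yaml", "checkpoints"),
    ("cybersec_tokenizer.model", "tokenizer"),
]


def expected_paths(root="."):
    """Local paths the three files should occupy after downloading."""
    return [Path(root) / local_dir / filename for filename, local_dir in FILES]


def download_all(root="."):
    """Download every file in FILES with hf_hub_download; return the local paths."""
    # Imported here so the module can be inspected without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    paths = []
    for filename, local_dir in FILES:
        path = hf_hub_download(
            repo_id=REPO_ID,
            filename=filename,
            local_dir=str(Path(root) / local_dir),
        )
        paths.append(Path(path))
    return paths


def missing_files(root="."):
    """Return the expected files that are not present on disk."""
    return [p for p in expected_paths(root) if not p.is_file()]
```

Calling `download_all()` and then checking that `missing_files()` returns an empty list is one way to confirm all three files are in place before running the chat script.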

Limitations

At 350M parameters the model is small: it handles common security topics but struggles with niche technical details. It is not a production security tool.

Author

Omkar Thombre, Master of Computer Science, University of Adelaide
