---
language:
- en
license: mit
tags:
- cybersecurity
- llm
- from-scratch
- pytorch
pipeline_tag: text-generation
---
# CyberLLM-350M
A 350M-class (303.4M parameters) cybersecurity language model built entirely from scratch.
## Model Details
- **Architecture**: LLaMA-3 style decoder-only transformer
- **Parameters**: 303.4M
- **Training Data**: 5B tokens total (3.2B security-focused, remainder general-domain)
- **Final Loss**: 3.80 (pretrain) → 1.28 (SFT)
- **Vocab**: 32,000 tokens (custom SentencePiece)
- **Context**: 2,048 tokens
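
To show how a parameter count like the one above is tallied for a LLaMA-style decoder, here is a minimal sketch. It assumes a standard layout: grouped-query attention, a SwiGLU MLP, RMSNorm, no bias terms, and tied input/output embeddings. The function name and every dimension except the 32,000 vocab are illustrative assumptions, not values read from this model's `config.yaml`.

```python
def llama_style_param_count(vocab_size, d_model, n_layers, n_heads,
                            n_kv_heads, ffn_hidden, tie_embeddings=True):
    """Count weights in a bias-free LLaMA-style decoder (hypothetical helper)."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim
    # Attention: Q and O projections are d_model x d_model,
    # K and V projections are d_model x kv_dim (grouped-query attention).
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * d_model * ffn_hidden
    # Two RMSNorm weight vectors per block.
    norms = 2 * d_model
    per_layer = attn + mlp + norms
    embed = vocab_size * d_model
    lm_head = 0 if tie_embeddings else vocab_size * d_model
    final_norm = d_model
    return embed + n_layers * per_layer + final_norm + lm_head


# Assumed dimensions for illustration only; check config.yaml for the real ones.
total = llama_style_param_count(
    vocab_size=32_000, d_model=1024, n_layers=24,
    n_heads=16, n_kv_heads=4, ffn_hidden=2816,
)
print(f"{total / 1e6:.1f}M parameters")
```

With these assumed dimensions the tally lands close to the reported 303.4M, but that does not confirm they are the settings actually used.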
## Training
Pretrained from random initialization on cybersecurity-weighted data including
Trend Micro Primus-FineWeb, Stack Exchange security sites, ArXiv cs.CR,
MITRE ATT&CK, NIST SP 800 series, and OWASP documentation.
Fine-tuned with 3,750 cybersecurity instruction-response pairs.
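
The figures reported under Model Details can be put into more familiar terms with a little arithmetic. This sketch assumes the losses are mean per-token cross-entropy in nats (the PyTorch default), so perplexity is `exp(loss)`, and that the 3.2B security tokens are a subset of the 5B total. Variable names are illustrative.

```python
import math

# Values copied from the Model Details section.
pretrain_loss = 3.80
sft_loss = 1.28
total_tokens = 5.0e9
security_tokens = 3.2e9


def perplexity(loss_nats: float) -> float:
    """Perplexity of a mean per-token cross-entropy loss given in nats."""
    return math.exp(loss_nats)


pretrain_ppl = perplexity(pretrain_loss)
sft_ppl = perplexity(sft_loss)
security_fraction = security_tokens / total_tokens

print(f"pretrain perplexity: {pretrain_ppl:.1f}")
print(f"SFT perplexity:      {sft_ppl:.2f}")
print(f"security share:      {security_fraction:.0%}")
```

The two losses are measured on different data (raw text versus instruction-response pairs), so the drop should not be read as a like-for-like improvement.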
## Usage
```bash
# Clone the training and inference code
git clone https://github.com/Omkarth/CyberLLM.git
cd CyberLLM
# Install runtime dependencies
pip install huggingface_hub torch sentencepiece pyyaml
# Download the weights, config, and tokenizer from the Hugging Face Hub
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='model.pt', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='config.yaml', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='cybersec_tokenizer.model', local_dir='tokenizer')
"
# Ask a single question
python training/chat.py --model checkpoints/model.pt --question "What is SQL injection?"
```
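
If you would rather keep the download step in a script than in an inline `python -c` string, something like the following should work. It is a sketch: the repo ID, filenames, and target directories come from the commands above, while the `download_all` helper and its `fetch` parameter are hypothetical names introduced here so the function can be exercised without network access.

```python
from pathlib import Path

REPO_ID = "Omk07/CyberLLM-350M"

# filename -> local directory, matching the layout chat.py is invoked with
FILES = {
    "model.pt": "checkpoints",
    "config.yaml": "checkpoints",
    "cybersec_tokenizer.model": "tokenizer",
}


def download_all(fetch=None):
    """Download every file in FILES and return the local paths.

    `fetch` defaults to huggingface_hub.hf_hub_download; pass a stub to dry-run.
    """
    if fetch is None:
        # Imported lazily so a dry run does not require huggingface_hub.
        from huggingface_hub import hf_hub_download as fetch
    paths = []
    for filename, local_dir in FILES.items():
        local_path = fetch(repo_id=REPO_ID, filename=filename, local_dir=local_dir)
        paths.append(Path(local_path))
    return paths
```

Run it from the `CyberLLM` directory by calling `download_all()`, then start `training/chat.py` as shown above.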
## Limitations
At this size the model is small: it handles common security topics but struggles with
niche technical details. It is not a production security tool.
## Author
Omkar Thombre — Master of Computer Science, University of Adelaide