+---
+language:
+  - en
+license: mit
+tags:
+  - cybersecurity
+  - llm
+  - from-scratch
+  - pytorch
+pipeline_tag: text-generation
+---
+# CyberLLM-350M
+A cybersecurity language model in the 350M-parameter class (303.4M actual parameters), built entirely from scratch.
+## Model Details
+- **Architecture**: LLaMA-3 style decoder-only transformer
+- **Parameters**: 303.4M
+- **Training Data**: 5B tokens (3.2B security-focused, the remainder general text)
+- **Final Loss**: 3.80 (pretrain) → 1.28 (SFT)
+- **Vocab**: 32,000 tokens (custom SentencePiece)
+- **Context**: 2,048 tokens
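
As a quick sanity check, the data budget can be related to the model size with simple arithmetic. The sketch below uses only the numbers listed in this card. It reads the 3.2B figure as the security-focused portion of the 5B total, which is an interpretation rather than something the card states outright, and the variable names are illustrative.

```python
# Figures copied from the Model Details list in this card.
PARAMS = 303.4e6          # parameter count
TOTAL_TOKENS = 5e9        # pretraining tokens
SECURITY_TOKENS = 3.2e9   # assumed: security-focused share of the total

tokens_per_param = TOTAL_TOKENS / PARAMS
security_share = SECURITY_TOKENS / TOTAL_TOKENS

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"security share of data: {security_share:.0%}")
```

Under that reading, the model saw roughly 16 tokens per parameter, with about two thirds of them drawn from security sources.
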
+## Training
+Pretrained from random initialization on cybersecurity-weighted data including
+Trend Micro Primus-FineWeb, Stack Exchange security sites, ArXiv cs.CR,
+MITRE ATT&CK, NIST SP 800 series, and OWASP documentation.
+Fine-tuned with 3,750 cybersecurity instruction-response pairs.
+## Usage
+```bash
+# Clone the code and install dependencies
+git clone https://github.com/Omkarth/CyberLLM.git
+cd CyberLLM
+pip install huggingface_hub torch sentencepiece pyyaml
+# Fetch the checkpoint, config, and tokenizer from the Hugging Face Hub
+python -c "
+from huggingface_hub import hf_hub_download
+hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='model.pt', local_dir='checkpoints')
+hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='config.yaml', local_dir='checkpoints')
+hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='cybersec_tokenizer.model', local_dir='tokenizer')
+"
+# Ask a single question from the command line
+python training/chat.py --model checkpoints/model.pt --question "What is SQL injection?"
+```
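
The three downloads can also be done from a Python script rather than an inline `python -c` string. This is a minimal sketch, assuming the same file names and target directories as the commands in this section. `FILES`, `planned_paths`, and `download_all` are illustrative names, not part of the CyberLLM repository. The only library call used is `hf_hub_download` from `huggingface_hub`.

```python
from pathlib import Path

REPO_ID = "Omk07/CyberLLM-350M"

# File name on the Hub -> local directory, mirroring the Usage commands.
FILES = {
    "model.pt": "checkpoints",
    "config.yaml": "checkpoints",
    "cybersec_tokenizer.model": "tokenizer",
}


def planned_paths(root="."):
    """Return where each file is expected to land, without using the network."""
    return {name: Path(root) / folder / name for name, folder in FILES.items()}


def download_all(root="."):
    """Download every file in FILES (needs `pip install huggingface_hub`)."""
    # Imported lazily so planned_paths() works without the package installed.
    from huggingface_hub import hf_hub_download

    paths = {}
    for name, folder in FILES.items():
        paths[name] = hf_hub_download(
            repo_id=REPO_ID,
            filename=name,
            local_dir=str(Path(root) / folder),
        )
    return paths


# Show the expected layout; call download_all() to actually fetch the files.
for name, path in planned_paths().items():
    print(f"{name} -> {path}")
```

Calling `download_all()` from the root of the cloned repository should leave the files where `training/chat.py` is pointed in the command above.
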
+## Limitations
+At the 350M-parameter scale the model is small: it handles common security topics but
+struggles with niche technical details. It is not a production security tool.
+## Author
+Omkar Thombre — Master of Computer Science, University of Adelaide