---
language:
- en
license: mit
tags:
- cybersecurity
- llm
- from-scratch
- pytorch
pipeline_tag: text-generation
---

# CyberLLM-350M

A 350M-parameter cybersecurity language model built entirely from scratch.

## Model Details

- **Architecture**: LLaMA-3 style decoder-only transformer
- **Parameters**: 303.4M
- **Training Data**: 5B tokens (3.2B security + general)
- **Final Loss**: 3.80 (pretrain) → 1.28 (SFT)
- **Vocab**: 32,000 tokens (custom SentencePiece)
- **Context**: 2,048 tokens

## Training

Pretrained from random initialization on cybersecurity-weighted data, including Trend Micro Primus-FineWeb, Stack Exchange security sites, ArXiv cs.CR, MITRE ATT&CK, the NIST SP 800 series, and OWASP documentation. Fine-tuned on 3,750 cybersecurity instruction-response pairs.

## Usage

```bash
# Clone the repository and install dependencies
git clone https://github.com/Omkarth/CyberLLM.git
cd CyberLLM
pip install huggingface_hub torch sentencepiece pyyaml

# Download the checkpoint, config, and tokenizer from the Hub
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='model.pt', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='config.yaml', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='cybersec_tokenizer.model', local_dir='tokenizer')
"

# Ask a question
python training/chat.py --model checkpoints/model.pt --question "What is SQL injection?"
```

## Limitations

At 350M parameters, the model is small: it handles common security topics but struggles with niche technical details. It is not a production security tool.

## Author

Omkar Thombre, Master of Computer Science, University of Adelaide
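
## Download Sketch (Python)

The download step from the Usage section can also be written as a small standalone script rather than an inline `python -c` call. The following is a minimal sketch, not part of the CyberLLM repository: the helper names `expected_paths` and `download_all` are hypothetical, while the repo ID, filenames, and target directories are copied from the Usage section. It assumes `huggingface_hub` is installed and that the script runs from the root of the cloned repository.

```python
from pathlib import Path

REPO_ID = "Omk07/CyberLLM-350M"

# (filename on the Hub, local directory) pairs, copied from the Usage section.
FILES = [
    ("model.pt", "checkpoints"),
    ("config.yaml", "checkpoints"),
    ("cybersec_tokenizer.model", "tokenizer"),
]


def expected_paths(root="."):
    """Return where each file should land, relative to root (hypothetical helper)."""
    return [Path(root) / local_dir / filename for filename, local_dir in FILES]


def download_all(root="."):
    """Download every file listed in FILES (hypothetical helper; needs network access)."""
    # Imported lazily so expected_paths() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    paths = []
    for filename, local_dir in FILES:
        path = hf_hub_download(
            repo_id=REPO_ID,
            filename=filename,
            local_dir=str(Path(root) / local_dir),
        )
        paths.append(Path(path))
    return paths
```

Calling `download_all()` from the repository root should place the files where `training/chat.py` is pointed in the Usage section; `expected_paths()` lists those locations without touching the network.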