---
language:
- en
license: mit
tags:
- cybersecurity
- llm
- from-scratch
- pytorch
pipeline_tag: text-generation
---

# CyberLLM-350M

A roughly 300M-parameter (303.4M) cybersecurity language model built entirely from scratch.

## Model Details

- Architecture: LLaMA-3-style decoder-only transformer
- Parameters: 303.4M
- Training Data: 5B tokens (3.2B security + general)
- Final Loss: 3.80 (pretraining), 1.28 (SFT)
- Vocabulary: 32,000 tokens (custom SentencePiece)
- Context Length: 2,048 tokens
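The two loss figures are easier to interpret as perplexities. This is a minimal sketch, assuming both values are mean per-token cross-entropy in nats (the PyTorch default); the card does not state the log base. The SFT loss is measured on instruction responses rather than on the pretraining corpus, so the two numbers are not directly comparable.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood (in nats)."""
    return math.exp(mean_nll_nats)


# Loss values reported in Model Details above.
for stage, loss in [("pretraining", 3.80), ("SFT", 1.28)]:
    print(f"{stage}: loss {loss:.2f} -> perplexity {perplexity(loss):.1f}")
# pretraining: loss 3.80 -> perplexity 44.7
# SFT: loss 1.28 -> perplexity 3.6
```

Informally, a perplexity of about 44.7 means the pretrained model is, on average, as uncertain as a uniform choice among roughly 45 tokens.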

## Training

Pretrained from random initialization on cybersecurity-weighted data, including
Trend Micro Primus-FineWeb, Stack Exchange security sites, arXiv cs.CR,
MITRE ATT&CK, the NIST SP 800 series, and OWASP documentation.

Supervised fine-tuning (SFT) used 3,750 cybersecurity instruction-response pairs.
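One common way to build a domain-weighted pretraining corpus is to sample sources in fixed proportions rather than concatenating them as-is. The sketch below draws each document's source by weight; the source keys and the weights in `MIXTURE_WEIGHTS` are hypothetical, since the card does not publish the actual mixture, and this is not necessarily how the CyberLLM data pipeline works.

```python
import random
from collections import Counter

# Hypothetical mixture weights (they sum to 1.0); the real ratios are not published.
MIXTURE_WEIGHTS = {
    "primus_fineweb": 0.40,
    "stackexchange_security": 0.15,
    "arxiv_cs_cr": 0.15,
    "mitre_attack": 0.05,
    "nist_sp800": 0.05,
    "owasp": 0.05,
    "general_web": 0.15,
}


def sample_sources(weights: dict, n_docs: int, seed: int = 0) -> list:
    """Draw the source of each of n_docs documents in proportion to its weight."""
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n_docs)


if __name__ == "__main__":
    draws = sample_sources(MIXTURE_WEIGHTS, 100_000)
    # Empirical share of each source, which should track its weight.
    for name, count in Counter(draws).most_common():
        print(f"{name:24s} {count / len(draws):.3f}")
```

Raising a security source's weight above its natural share of the raw data upsamples it (repeating documents if needed), which is what makes the mixture "weighted" toward the domain.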
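The card does not describe how instruction-response pairs are turned into training examples. A common approach, sketched here under that assumption, concatenates prompt and response token IDs and masks the prompt positions so that only the response (plus an end-of-sequence token) contributes to the loss. `build_sft_example`, `eos_id`, and the integer token IDs are hypothetical; `-100` is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`, and `max_len` defaults to the 2,048-token context.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(prompt_ids, response_ids, eos_id, max_len=2048):
    """Return (inputs, targets) for next-token prediction, scoring only response tokens."""
    ids = prompt_ids + response_ids + [eos_id]
    # 1 marks tokens the model should learn to predict (response and EOS).
    scored = [0] * len(prompt_ids) + [1] * (len(response_ids) + 1)
    # Keep max_len + 1 tokens so that inputs and targets are each max_len long.
    ids, scored = ids[: max_len + 1], scored[: max_len + 1]
    inputs = ids[:-1]
    # The target at position t is token t + 1; mask it if that token is part of the prompt.
    targets = [tok if keep else IGNORE_INDEX for tok, keep in zip(ids[1:], scored[1:])]
    return inputs, targets


inputs, targets = build_sft_example([10, 11, 12], [20, 21], eos_id=2)
print(inputs)   # [10, 11, 12, 20, 21]
print(targets)  # [-100, -100, 20, 21, 2]
```

Here the last prompt position is trained to emit the first response token, and no other prompt position is scored. If truncation cuts off the whole response, every target is masked, so such examples are usually filtered out beforehand.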

## Usage

```bash
# Clone the repository, which provides training/chat.py
git clone https://github.com/Omkarth/CyberLLM.git
cd CyberLLM

# Install dependencies
pip install huggingface_hub torch sentencepiece pyyaml

# Download the checkpoint and config into checkpoints/ and the tokenizer into tokenizer/
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='model.pt', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='config.yaml', local_dir='checkpoints')
hf_hub_download(repo_id='Omk07/CyberLLM-350M', filename='cybersec_tokenizer.model', local_dir='tokenizer')
"

# Ask a single question
python training/chat.py --model checkpoints/model.pt --question "What is SQL injection?"
```

## Limitations

At 303.4M parameters, the model is small: it handles common security topics but
struggles with niche technical details. It is not a production security tool.

## Author

Omkar Thombre, Master of Computer Science, University of Adelaide