
	---
	language: [en]
	license: mit
	tags:
	- devops
	- cicd
	- docker
	- kubernetes
	- infrastructure
	- slm
	- llama-style
	- rope
	- 1m-context
	- from-scratch
	- 1b-params
	pipeline_tag: text-generation
	---

	# DevOps Engineer-SLM: Role-Based Small Language Model

	A LLaMA-style transformer (~989.8M parameters, ~0.99B) trained from scratch for the DevOps Engineer role.
	It uses RoPE positional encoding to support a context of up to 1M tokens and was trained with gradient checkpointing to reduce memory use.
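
RoPE encodes position by rotating each pair of query/key dimensions by an angle proportional to the token position, so attention scores depend only on the offset between two tokens. The following is a minimal pure-Python sketch of that idea, not this model's implementation: `rope_rotate` is a hypothetical helper, and the base of 10000 and the interleaved pairing of dimensions are assumptions (the model card does not state either).

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive dimension pairs of `vec` by position-dependent angles.

    Assumptions (not taken from the model card): base=10000 and
    interleaved pairs (0,1), (2,3), ...
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        # Pair i//2 rotates with frequency base^(-i/d).
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend((x * c - y * s, x * s + y * c))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Same relative offset (2 tokens) at very different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
score_far = dot(rope_rotate(q, 105), rope_rotate(k, 103))
print(math.isclose(score_near, score_far, abs_tol=1e-9))
```

Because the score depends only on the offset, the encoding itself places no hard limit on sequence length; memory and training data are what bound the usable context in practice.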

	## Architecture
	| Component | Value |
	|-----------|-------|
	| Architecture | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
	| Parameters | ~989.8M (~0.99B) |
	| Layers | 32 |
	| Heads | 20 |
	| Embedding | 1600 |
	| Max Context | 1,000,000 tokens |
	| Max Output | 1,000,000 tokens |
	| Vocab | 2,107 (BPE) |
	| Model Size | ~4 GB (fp32) |
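
The parameter count and fp32 size in the table can be cross-checked from the other rows. This is a rough sketch under stated assumptions, none of which are confirmed by the model card: no bias terms, tied input/output embeddings, and a SwiGLU hidden width of 8/3 × the embedding size rounded up to a multiple of 64 (the LLaMA convention). `swiglu_hidden_dim` and `estimate_params` are hypothetical helpers.

```python
def swiglu_hidden_dim(d_model, multiple_of=64):
    """Assumed LLaMA convention: ceil(8/3 * d_model), rounded up to a multiple."""
    raw = -(-8 * d_model // 3)  # integer ceiling of 8 * d_model / 3
    return ((raw + multiple_of - 1) // multiple_of) * multiple_of


def estimate_params(n_layers, d_model, vocab_size, tied_embeddings=True):
    """Estimate parameters for a bias-free LLaMA-style decoder (assumed layout)."""
    hidden = swiglu_hidden_dim(d_model)
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    mlp = 3 * d_model * hidden             # gate, up and down projections
    norms = 2 * d_model                    # two RMSNorm weight vectors per layer
    embedding = vocab_size * d_model
    total = n_layers * (attention + mlp + norms) + embedding + d_model  # + final norm
    if not tied_embeddings:
        total += embedding                 # separate output projection
    return total


total = estimate_params(n_layers=32, d_model=1600, vocab_size=2107)
print(f"{total:,} parameters")             # 989,792,000 parameters
print(f"{total * 4 / 1e9:.2f} GB in fp32")  # 3.96 GB in fp32
print(1600 // 20, "dimensions per head")    # 80 dimensions per head
```

Under these assumptions the estimate lands on the stated ~989.8M parameters, and 4 bytes per parameter gives the ~4 GB fp32 size.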

	## Training
	- Best eval loss: 2.5998684406280517
	- Trained with gradient checkpointing on Apple M4 (MPS)
	- 3 epochs, batch_size=1, grad_accum=16
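
Two derived figures help when reading these numbers: the effective batch size implied by gradient accumulation, and the perplexity implied by the eval loss. The perplexity is a sketch that assumes the reported loss is the mean per-token cross-entropy in nats, which the model card does not state.

```python
import math

batch_size = 1
grad_accum = 16
# Gradients from 16 micro-batches are summed before each optimizer step.
effective_batch = batch_size * grad_accum

eval_loss = 2.5998684406280517
# Assumes eval_loss is mean cross-entropy per token, measured in nats.
perplexity = math.exp(eval_loss)

print(effective_batch)        # 16
print(round(perplexity, 2))   # 13.46
```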

	## Usage
	```python
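	from huggingface_hub import hf_hub_download
	from tokenizers import Tokenizer

	REPO_ID = "sathishphdai/devops-engineer-slm-1m"

	# Download the weights and the BPE tokenizer (cached locally after the first call).
	model_path = hf_hub_download(REPO_ID, "model.safetensors")
	tokenizer_path = hf_hub_download(REPO_ID, "devops_engineer_tokenizer.json")

	# Load the tokenizer; the weights at model_path still need a matching model definition.
	tokenizer = Tokenizer.from_file(tokenizer_path)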
	```