---
language: [en]
license: mit
tags:
- devops
- cicd
- docker
- kubernetes
- infrastructure
- slm
- llama-style
- rope
- 5m-context
- from-scratch
- 1b-params
pipeline_tag: text-generation
---

# DevOps Engineer-SLM: Role-Based Small Language Model

A **LLaMA-style transformer** (~989.8M parameters, ~0.99B) trained from scratch for the **DevOps Engineer** role. Supports up to a **5M-token context** via RoPE; trained with gradient checkpointing.

## Architecture

| Component | Value |
|-----------|-------|
| Architecture | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters | ~989.8M (~0.99B) |
| Layers | 32 |
| Attention heads | 20 |
| Embedding dim | 1600 |
| Max context | 5,000,000 tokens |
| Max output | 5,000,000 tokens |
| Vocabulary | 2,107 (BPE) |
| Model size | ~4 GB (fp32) |

## Training

- Best eval loss: 2.5999
- Trained with gradient checkpointing on Apple M4 (MPS)
- 3 epochs, `batch_size=1`, `grad_accum=16`

## Usage

```python
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download the weights and tokenizer from the Hub
model_path = hf_hub_download("sathishphdai/devops-engineer-slm-5m", "model.safetensors")
tokenizer_path = hf_hub_download("sathishphdai/devops-engineer-slm-5m", "devops_engineer_tokenizer.json")

tokenizer = Tokenizer.from_file(tokenizer_path)
ids = tokenizer.encode("kubectl rollout status deployment/web").ids

# Note: the weights use a custom LLaMA-style architecture, so loading
# model.safetensors requires the matching model class.
```
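The parameter count in the Architecture table can be loosely sanity-checked from the stated hyperparameters. The SwiGLU hidden size and whether the input/output embeddings are tied are not documented in this card, so the `d_ff = 4266` (roughly 8/3 × 1600) and untied-embedding values below are assumptions that happen to land near the stated total:

```python
# Hypothetical back-of-the-envelope parameter count for this card's config.
# d_ff and untied embeddings are ASSUMPTIONS, not documented values.
d_model, n_layers, vocab = 1600, 32, 2107
d_ff = 4266  # assumed SwiGLU hidden size, ~8/3 * d_model

attn = 4 * d_model * d_model   # q, k, v, o projections
ffn = 3 * d_model * d_ff       # gate, up, down projections
norms = 2 * d_model            # two RMSNorm weight vectors per layer
per_layer = attn + ffn + norms

embeddings = 2 * vocab * d_model                     # untied input + output embeddings
total = n_layers * per_layer + embeddings + d_model  # + final RMSNorm

print(f"{total / 1e6:.1f}M")  # prints 989.8M
```

Under these assumptions the total comes out to ~989.8M, consistent with the table; a different (still plausible) `d_ff` would shift the estimate by a few million parameters either way.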
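The long-context claim above rests on rotary position embeddings (RoPE), whose attention scores depend only on the *relative* distance between query and key positions. The following is a minimal self-contained NumPy sketch of that property, not the model's actual implementation:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(d // 2) * 2.0 / d)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal(64)

# The q.k score depends only on the relative offset (here 10),
# not on the absolute positions, so the same geometry applies
# anywhere in a long sequence.
s1 = rope(q, 100) @ rope(k, 90)
s2 = rope(q, 1010) @ rope(k, 1000)
assert np.isclose(s1, s2)
```

Because the rotation is norm-preserving and position enters only through these relative angles, RoPE imposes no hard positional cutoff; practical context length is then bounded by memory, which is why the card pairs it with gradient checkpointing.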