---
language: [en]
license: mit
tags:
- devops
- cicd
- docker
- kubernetes
- infrastructure
- slm
- llama-style
- rope
- 5m-context
- from-scratch
- 1b-params
pipeline_tag: text-generation
---

# DevOps Engineer-SLM: Role-Based Small Language Model

A **LLaMA-style transformer** (~989.8M parameters, ~0.99B) trained from scratch for the **DevOps Engineer** role.
Supports up to a **5M-token context** via RoPE positional embeddings; training at long context was made feasible with gradient checkpointing.
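RoPE encodes position by rotating pairs of query/key dimensions through position-dependent angles, so relative offsets fall out of the dot product; this is what lets the context window be stretched. A minimal sketch of the idea (the pairing layout and the base frequency of 10000 are common conventions, not details taken from this card):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq, heads, head_dim)."""
    seq, _, hd = x.shape
    half = hd // 2
    # One frequency per rotated pair, decaying geometrically with dimension.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs  # (seq, half)
    cos = angles.cos()[:, None, :]  # broadcast across heads
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation applied pairwise: norms are preserved, only phase changes.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(16, 20, 80)  # 20 heads * head_dim 80 = embedding dim 1600
q_rot = rope(q)
```

Because the transform is a pure rotation, it adds no parameters, which is why RoPE-based models can (with appropriate scaling) be run past their training length.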

## Architecture

| Component | Value |
|-----------|-------|
| Architecture | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters | ~989.8M (~0.99B) |
| Layers | 32 |
| Attention Heads | 20 |
| Embedding Dim | 1600 |
| Max Context | 5,000,000 tokens |
| Max Output | 5,000,000 tokens |
| Vocab Size | 2,107 (BPE) |
| Model Size | ~4 GB (fp32) |
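The ~989.8M figure can be sanity-checked from the table. The card does not state the SwiGLU hidden size or whether the LM head is tied, so the sketch below assumes a hidden size of 4266 (roughly 8/3 of the 1600-dim embedding, as in LLaMA) and an untied head; both are assumptions:

```python
# Back-of-the-envelope parameter count for the config in the table above.
# hidden=4266 (~8/3 * embed) and an untied LM head are ASSUMPTIONS.
d, layers, vocab, hidden = 1600, 32, 2107, 4266
attn = 4 * d * d                  # q, k, v, and output projections
mlp = 3 * d * hidden              # gate, up, down matrices (SwiGLU)
norms = 2 * d                     # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms
total = layers * per_layer + 2 * vocab * d + d  # + embedding, LM head, final norm
print(f"{total / 1e6:.1f}M")  # → 989.8M, matching the table
```

At fp32 (4 bytes per parameter) this also reproduces the ~4 GB model size.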

## Training

- Best eval loss: 2.5999
- Trained with gradient checkpointing on Apple M4 (MPS)
- 3 epochs, batch_size=1, grad_accum=16
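With `batch_size=1` and `grad_accum=16`, gradients from 16 micro-batches are accumulated before each optimizer step, giving an effective batch size of 16 without the memory cost. A minimal sketch of the pattern (the tiny model and random data are stand-ins, not the actual SLM or its training data):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the real transformer
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
grad_accum = 16
steps = 0

opt.zero_grad()
for i in range(32):  # 32 micro-batches -> 2 optimizer steps
    x = torch.randn(1, 8)               # batch_size=1 per micro-batch
    loss = model(x).pow(2).mean()       # placeholder loss
    (loss / grad_accum).backward()      # scale so summed grads average out
    if (i + 1) % grad_accum == 0:
        opt.step()                      # one update per 16 micro-batches
        opt.zero_grad()
        steps += 1
```

Gradient checkpointing composes with this: activations are recomputed during backward instead of stored, trading compute for the memory headroom a 5M-token context demands.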

## Usage

```python
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer
from safetensors.torch import load_file

model_path = hf_hub_download("sathishphdai/devops-engineer-slm-5m", "model.safetensors")
tokenizer_path = hf_hub_download("sathishphdai/devops-engineer-slm-5m", "devops_engineer_tokenizer.json")

tokenizer = Tokenizer.from_file(tokenizer_path)

# Load the raw weight tensors; instantiating the model requires the
# custom architecture class matching the table above.
state_dict = load_file(model_path)
```