---
language: [en]
license: mit
tags:
- devops
- cicd
- docker
- kubernetes
- infrastructure
- slm
- llama-style
- rope
- 1m-context
- from-scratch
- 1b-params
pipeline_tag: text-generation
---

# DevOps Engineer-SLM: Role-Based Small Language Model

A **LLaMA-style transformer** (~989.8M parameters, ~0.99B) trained from scratch for the **DevOps Engineer** role.
It uses RoPE positional encoding and is configured for a context window of up to **1M tokens**; gradient checkpointing was used during training to reduce memory use.

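
The model card does not include its RoPE implementation, so the following is only a minimal plain-Python sketch of the general technique. The function name `rope_rotate`, the interleaved pairing of dimensions, and the base of 10000 are assumptions for illustration; the head size of 80 comes from dividing the embedding size (1600) by the head count (20).

```python
import math

HEAD_DIM = 1600 // 20  # embedding size / number of heads

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of a head vector by position-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Pair (i, i + 1) is rotated by position * base^(-i / dim).
        theta = position * base ** (-i / dim)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Toy query and key vectors (illustrative values only).
q = [math.sin(i) for i in range(HEAD_DIM)]
k = [math.cos(2 * i) for i in range(HEAD_DIM)]

# Same relative offset (2 tokens) at two different absolute positions.
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
score_far = dot(rope_rotate(q, 1005), rope_rotate(k, 1003))
print(abs(score_near - score_far))
```

The two scores agree up to floating-point error because the attention score after rotation depends only on the distance between the two positions, which is the property that lets RoPE be applied at arbitrary absolute positions.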

## Architecture
| Component | Value |
|-----------|-------|
| Architecture | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters | ~989.8M (~0.99B) |
| Layers | 32 |
| Attention heads | 20 |
| Embedding dimension | 1600 |
| Max context | 1,000,000 tokens |
| Max output | 1,000,000 tokens |
| Vocabulary | 2,107 tokens (BPE) |
| Model size | ~4 GB (fp32) |

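
As a rough check on the parameter count in the table, the sketch below tallies the weights of a LLaMA-style stack from the listed layer count, embedding size, and vocabulary size. Several details are assumptions not stated in this card: standard multi-head attention without biases, a SwiGLU width of about 8/3 of the embedding size rounded up to a multiple of 64, and an output head tied to the token embeddings. Under those assumptions the total is 989,792,000, which is consistent with the reported ~989.8M, though the actual configuration may differ.

```python
# Values taken from the Architecture table.
N_LAYERS = 32
DIM = 1600
VOCAB = 2107

# Assumptions (not stated in the model card).
MULTIPLE_OF = 64        # FFN width is rounded up to this multiple
TIED_EMBEDDINGS = True  # output head shares the token-embedding matrix

def swiglu_hidden_dim(dim, multiple_of):
    # LLaMA-style sizing: about 8/3 * dim, rounded up to a multiple.
    raw = (8 * dim) // 3
    return multiple_of * ((raw + multiple_of - 1) // multiple_of)

def count_parameters():
    hidden = swiglu_hidden_dim(DIM, MULTIPLE_OF)
    attention = 4 * DIM * DIM        # Wq, Wk, Wv, Wo without biases
    feed_forward = 3 * DIM * hidden  # gate, up, and down projections
    norms = 2 * DIM                  # two RMSNorm weight vectors per layer
    per_layer = attention + feed_forward + norms
    embeddings = VOCAB * DIM * (1 if TIED_EMBEDDINGS else 2)
    final_norm = DIM
    return N_LAYERS * per_layer + embeddings + final_norm

total = count_parameters()
size_gb_fp32 = total * 4 / 1e9  # 4 bytes per fp32 weight
print(f"{total:,} parameters, about {size_gb_fp32:.2f} GB in fp32")
```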

## Training
- Best eval loss: 2.5999
- Trained with gradient checkpointing on an Apple M4 (MPS backend)
- 3 epochs, batch_size=1, grad_accum=16 (effective batch size of 16)

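
With `batch_size=1` and `grad_accum=16`, gradients from 16 single-sample passes are combined before each optimizer step. The sketch below illustrates that idea on a toy one-weight regression in plain Python; it is not the training loop used for this model, and the function names and data are hypothetical.

```python
def grad_squared_error(w, x, y):
    # Derivative of (w * x - y) ** 2 with respect to w.
    return 2.0 * (w * x - y) * x

def train_with_accumulation(w, samples, lr, grad_accum):
    accumulated = 0.0
    for step, (x, y) in enumerate(samples, start=1):
        # Scale each micro-batch gradient so the accumulated sum is a mean.
        accumulated += grad_squared_error(w, x, y) / grad_accum
        if step % grad_accum == 0:
            w -= lr * accumulated  # one optimizer step per grad_accum samples
            accumulated = 0.0
    return w

GRAD_ACCUM = 16
samples = [(float(i), 3.0 * i) for i in range(1, GRAD_ACCUM + 1)]

w_accumulated = train_with_accumulation(0.0, samples, lr=0.001, grad_accum=GRAD_ACCUM)

# Reference: a single step on the mean gradient of the same 16 samples.
mean_grad = sum(grad_squared_error(0.0, x, y) for x, y in samples) / len(samples)
w_full_batch = 0.0 - 0.001 * mean_grad
print(w_accumulated, w_full_batch)
```

Both updates land on the same weight, which is why accumulation can stand in for a larger batch when only one sample fits in memory at a time.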

## Usage
```python
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

REPO_ID = "sathishphdai/devops-engineer-slm-1m"

# Download the weights and the tokenizer file from the Hub (cached locally after the first call).
model_path = hf_hub_download(REPO_ID, "model.safetensors")
tokenizer_path = hf_hub_download(REPO_ID, "devops_engineer_tokenizer.json")

# Load the BPE tokenizer; model_path points to the safetensors checkpoint.
tokenizer = Tokenizer.from_file(tokenizer_path)
```