---
language: [en]
license: mit
tags:
- devops
- cicd
- docker
- kubernetes
- infrastructure
- slm
- llama-style
- rope
- 5m-context
- from-scratch
- 1b-params
pipeline_tag: text-generation
---
# DevOps Engineer-SLM: Role-Based Small Language Model
A **LLaMA-style transformer** (~989.8M parameters, ~0.99B) trained from scratch for the **DevOps Engineer** role.
RoPE positional encoding extends the context window up to **5M tokens**; gradient checkpointing was used to keep training memory manageable.
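Because RoPE is applied on the fly from position indices (rather than a learned, fixed-size position table), long positions remain well-defined at inference. A minimal NumPy sketch of the rotation, assuming the standard interleaved-pair formulation and this model's per-head dimension of 80 (1600 embedding / 20 heads):

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq, dim).

    dim must be even; each pair (x[2i], x[2i+1]) is rotated by the
    angle position * base**(-2i/dim).
    """
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)   # per-pair frequencies
    angles = positions[:, None] * freqs[None, :]     # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Rotations are norm-preserving, so even a position index of 5,000,000
# produces finite, unit-scale features (per-head dim 80 assumed here).
q = rope(np.ones((2, 80)), np.array([0.0, 5_000_000.0]))
```

Note that norm preservation only guarantees numerical stability at long positions, not that attention quality holds there; that depends on training.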
## Architecture
| Component | Value |
|-----------|-------|
| Architecture | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters | ~989.8M (~0.99B) |
| Layers | 32 |
| Heads | 20 |
| Embedding | 1600 |
| Max Context | 5,000,000 tokens |
| Max Output | 5,000,000 tokens |
| Vocab | 2,107 BPE |
| Model Size | ~4 GB (fp32) |
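The ~989.8M figure can be roughly reproduced from the table. The SwiGLU hidden size is not stated on this card; 4266 ≈ (8/3) × 1600, the usual LLaMA ratio, is an assumption that happens to match, as is the untied output head:

```python
# Back-of-envelope parameter count from the architecture table.
# ffn_hidden = 4266 and the untied lm_head are ASSUMPTIONS, not card facts.
vocab, dim, layers, ffn_hidden = 2107, 1600, 32, 4266

embed = vocab * dim                  # token embedding
attn = 4 * dim * dim                 # Q, K, V, O projections per layer
ffn = 3 * dim * ffn_hidden           # gate, up, down matrices (SwiGLU) per layer
norms = 2 * dim                      # two RMSNorm weight vectors per layer
lm_head = vocab * dim                # untied output projection (assumed)

total = embed + layers * (attn + ffn + norms) + dim + lm_head
print(f"~{total / 1e6:.1f}M params")
```

With these assumptions the sum lands at roughly 989.8M, consistent with the ~4 GB fp32 checkpoint (≈ 4 bytes per parameter).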
## Training
- Best eval loss: 2.60
- Trained with gradient checkpointing on an Apple M4 (MPS backend)
- 3 epochs, `batch_size=1`, `grad_accum=16` (effective batch size 16)
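With `batch_size=1`, gradient accumulation sums gradients over 16 micro-batches before each optimizer step, emulating a batch of 16 within limited memory. A framework-agnostic sketch of that loop in plain Python (toy gradients stand in for `loss.backward()`):

```python
# Sketch of the batch_size=1, grad_accum=16 training loop described above.
GRAD_ACCUM = 16
accumulated = 0.0
steps = 0

for i in range(64):                      # 64 micro-batches of size 1
    grad = 1.0 / GRAD_ACCUM              # toy gradient, pre-scaled by 1/16
    accumulated += grad                  # accumulate instead of stepping
    if (i + 1) % GRAD_ACCUM == 0:
        steps += 1                       # one optimizer step per 16 micro-batches
        accumulated = 0.0                # zero grads after the step

# 64 micro-batches -> 4 optimizer steps at effective batch size 16
```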
## Usage
```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from tokenizers import Tokenizer

model_path = hf_hub_download("sathishphdai/devops-engineer-slm-5m", "model.safetensors")
tokenizer_path = hf_hub_download("sathishphdai/devops-engineer-slm-5m", "devops_engineer_tokenizer.json")

tokenizer = Tokenizer.from_file(tokenizer_path)
ids = tokenizer.encode("kubectl get pods -n production").ids

# model.safetensors holds raw weights only; inference requires the
# matching custom LLaMA-style model class to load this state dict.
state_dict = load_file(model_path)
```