Operations Manager-SLM: Role-Based Small Language Model
A LLaMA-style transformer (~989.0M parameters, ~0.99B) trained from scratch for the Operations Manager role. It is configured for a context of up to 5M tokens using RoPE positional encoding, and was trained with gradient checkpointing to reduce activation memory.
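RoPE encodes position by rotating each pair of query/key dimensions through an angle proportional to the token position, so the attention score between two tokens depends only on their offset. Below is a minimal stdlib sketch for a single head. It assumes the common LLaMA default `base=10000.0` (the card does not state the base) and a head dimension of 80 (embedding 1600 / 20 heads, from the architecture table). `rope_rotate` and `dot` are illustrative names, not this repository's code.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs (x[2k], x[2k+1]) by position * base**(-2k/d).

    base=10000.0 is an assumed LLaMA-style default; the card does not state it.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = []
    for i in range(0, d, 2):  # i = 2k
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


HEAD_DIM = 1600 // 20  # embedding width / attention heads = 80

q = [math.sin(j) for j in range(HEAD_DIM)]
k = [math.cos(j) for j in range(HEAD_DIM)]
near = dot(rope_rotate(q, 3), rope_rotate(k, 1))
far = dot(rope_rotate(q, 4_000_003), rope_rotate(k, 4_000_001))
print(math.isclose(near, far, abs_tol=1e-6))  # same offset (2) -> same score, up to float error
```

Because the score depends only on the offset, the same rotation rule applies unchanged at position indices in the millions.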
Architecture
| Component | Value |
|---|---|
| Architecture | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters | ~989.0M (~0.99B) |
| Layers | 32 |
| Attention heads | 20 |
| Embedding dim | 1600 |
| Max Context | 5,000,000 tokens |
| Max Output | 5,000,000 tokens |
| Vocabulary size | 1,625 (BPE) |
| Model Size | ~4 GB (fp32) |
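As a sanity check on the table, the parameter count can be recomputed from the layer count, width, and vocabulary. This sketch assumes details the card does not state: a SwiGLU hidden width of int(8/3 · 1600) = 4266, no bias terms, full multi-head attention (no grouped-query sharing), and untied input/output embeddings. `llama_param_count` is a hypothetical helper.

```python
def llama_param_count(n_layers, d_model, vocab, ffn_hidden, tied_embeddings=False):
    """Count weights of a LLaMA-style decoder under the assumptions stated above."""
    attn = 4 * d_model * d_model      # Wq, Wk, Wv, Wo with no biases
    mlp = 3 * d_model * ffn_hidden    # SwiGLU gate, up, and down projections
    norms = 2 * d_model               # two RMSNorm weight vectors per block
    per_layer = attn + mlp + norms
    embed = vocab * d_model
    lm_head = 0 if tied_embeddings else vocab * d_model
    final_norm = d_model
    return n_layers * per_layer + embed + lm_head + final_norm


ffn_hidden = int(8 * 1600 / 3)  # assumed 8/3 * d_model, truncated to 4266
total = llama_param_count(32, 1600, 1625, ffn_hidden)
fp32_gb = total * 4 / 1e9  # 4 bytes per fp32 weight
print(f"{total:,} params, {fp32_gb:.2f} GB fp32")  # 988,241,600 params, 3.95 GB fp32
```

The estimate lands within 0.1% of the reported ~989.0M and matches the ~4 GB fp32 size; the small gap presumably comes from whichever assumed details differ in the actual checkpoint.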
Training
- Best eval loss: 3.0821
- Trained with gradient checkpointing on an Apple M4 (MPS backend)
- 3 epochs, batch_size=1, grad_accum=16 (effective batch size 16)
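With batch_size=1 and grad_accum=16, gradients from 16 single-example steps are combined before each optimizer update. The toy sketch below (a one-weight regression on hypothetical data, not the model's training loop) checks that summing each micro-batch gradient divided by 16 reproduces the full-batch gradient. It also converts the reported eval loss to perplexity, assuming that loss is mean token cross-entropy in nats.

```python
import math

GRAD_ACCUM = 16  # from the card: grad_accum=16
MICRO_BATCH = 1  # from the card: batch_size=1


def grad_mse(w, batch):
    """Gradient of the mean squared error of y ~ w * x over (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(float(i), 3.0 * i + 1.0) for i in range(GRAD_ACCUM * MICRO_BATCH)]  # toy data
w = 0.5

# Accumulate scaled micro-batch gradients, as a loop would before one optimizer step
accumulated = 0.0
for start in range(0, len(data), MICRO_BATCH):
    accumulated += grad_mse(w, data[start:start + MICRO_BATCH]) / GRAD_ACCUM

full_batch = grad_mse(w, data)
print(math.isclose(accumulated, full_batch))  # True
print(f"effective batch size: {MICRO_BATCH * GRAD_ACCUM}")  # effective batch size: 16
# Assumes the eval loss is mean cross-entropy in nats
print(f"perplexity: {math.exp(3.082059216499329):.1f}")  # perplexity: 21.8
```

In a PyTorch loop the same pattern is `(loss / grad_accum).backward()` on every micro-batch, with `optimizer.step()` and `optimizer.zero_grad()` every 16th step.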
Usage
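```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from tokenizers import Tokenizer

repo_id = "sathishphdai/operations-manager-slm-5m"
# Fetch the weights and the BPE tokenizer from the Hugging Face Hub (cached locally)
model_path = hf_hub_download(repo_id, "model.safetensors")
tokenizer_path = hf_hub_download(repo_id, "operations_manager_tokenizer.json")
tokenizer = Tokenizer.from_file(tokenizer_path)
state_dict = load_file(model_path)  # tensor name -> torch.Tensor; the card provides no model class
```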
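The card ships weights and a tokenizer but no model class or generation script. Once a model has been rebuilt from `model.safetensors`, text can be produced with a simple greedy loop. This sketch assumes a hypothetical `next_token_logits` callable that wraps that model and returns one logit per vocabulary entry (1,625 for this tokenizer); `toy_logits` is a stand-in used only for demonstration.

```python
from typing import Callable, List, Optional, Sequence


def greedy_generate(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding: repeatedly append the highest-scoring next token id."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # hypothetical model call on the full sequence
        best = max(range(len(logits)), key=logits.__getitem__)
        ids.append(best)
        if eos_id is not None and best == eos_id:
            break
    return ids[len(prompt_ids):]  # only the newly generated ids


def toy_logits(ids):  # stand-in model over a 5-token vocab: favors (last id + 1) mod 5
    return [1.0 if j == (ids[-1] + 1) % 5 else 0.0 for j in range(5)]


print(greedy_generate(toy_logits, [0], 3))  # [1, 2, 3]
```

With the tokenizer loaded above, `tokenizer.encode(prompt).ids` gives the prompt ids and `tokenizer.decode(new_ids)` turns the generated ids back into text.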