system-admin-slm-5m / README.md

sathishphdai


language:
  - en
license: mit
tags:
  - sysadmin
  - linux
  - windows-server
  - networking
  - security
  - slm
  - llama-style
  - rope
  - 5m-context
  - from-scratch
  - 1b-params
pipeline_tag: text-generation

System Admin-SLM: Role-Based Small Language Model

A LLaMA-style transformer (~1016.6M params, ~1.02B) trained from scratch for the System Admin role. Positions are encoded with RoPE, and the model is configured for a context of up to 5M tokens. Gradient checkpointing was used during training to reduce memory use.
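As a rough illustration of what RoPE does, the sketch below rotates pairs of dimensions in a single attention-head vector by a position-dependent angle. It is not taken from this model's code: the function name `rope_rotate`, the base of 10000, and the adjacent-pair layout are assumptions (some LLaMA implementations pair dimension i with i + d/2 instead).

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to one head vector of even length.

    Assumed convention: dimensions (0,1), (2,3), ... are rotated as 2-D pairs,
    with angle position * base**(-i/d) for the pair starting at index i.
    """
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# The attention score between a rotated query and key depends only on
# the distance between their positions, not on the absolute positions.
q = [0.1 * (i + 1) for i in range(8)]
k = [0.05 * (8 - i) for i in range(8)]
score_near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
score_far = dot(rope_rotate(q, 1005), rope_rotate(k, 1003))
print(abs(score_near - score_far) < 1e-9)
```

Because the score depends only on relative offset, the same rotation rule can be evaluated at any position index, which is why RoPE is often chosen for long-context configurations.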

Architecture

Component	Value
Architecture	LLaMA-style (RoPE + RMSNorm + SwiGLU)
Parameters	~1016.6M (~1.02B)
Layers	32
Heads	20
Embedding	1600
Max Context	5,000,000 tokens
Max Output	5,000,000 tokens
Vocab	18,841 BPE
Model Size	~4 GB (fp32)
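The figures in the table can be cross-checked with a little arithmetic. The sketch below derives the per-head dimension and the weight footprint from the table values. The KV-cache estimate at the end is an assumption layered on top: it presumes standard multi-head attention (no grouped-query sharing) and 2-byte cache entries, neither of which is stated above.

```python
# Values copied from the architecture table (parameter count is approximate).
params = 1016.6e6
n_layers = 32
n_heads = 20
d_model = 1600
max_context = 5_000_000

head_dim = d_model // n_heads          # 80
fp32_gb = params * 4 / 1e9             # 4 bytes per weight
fp16_gb = params * 2 / 1e9             # 2 bytes per weight

# ASSUMPTION: full multi-head attention, keys and values cached in 2-byte precision.
# Per token: 2 tensors (K and V) * layers * d_model * 2 bytes.
kv_bytes_per_token = 2 * n_layers * d_model * 2
kv_gb_at_max_context = kv_bytes_per_token * max_context / 1e9

print(f"head_dim={head_dim}")
print(f"weights: {fp32_gb:.2f} GB fp32, {fp16_gb:.2f} GB fp16")
print(f"KV cache at max context (assumed layout): {kv_gb_at_max_context:.0f} GB")
```

The fp32 result agrees with the ~4 GB listed in the table. Under the stated assumptions, the cache estimate suggests that using the full 5M-token window at inference would need far more memory than the weights themselves.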

Training

Best eval loss: 5.795391702651978
Trained with gradient checkpointing on Apple M4 (MPS)
3 epochs, batch_size=1, grad_accum=16
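To put the training numbers in context, the sketch below converts the eval loss to perplexity and compares it with the loss of a uniform guess over the vocabulary. It assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention but is not stated above.

```python
import math

eval_loss = 5.795391702651978   # best eval loss reported above
vocab_size = 18841              # from the architecture table
batch_size, grad_accum = 1, 16

# Gradients are accumulated over 16 micro-batches of size 1 before each update.
effective_batch = batch_size * grad_accum

# ASSUMPTION: loss is mean cross-entropy per token, natural log.
perplexity = math.exp(eval_loss)
uniform_loss = math.log(vocab_size)   # loss of guessing uniformly over the vocab

print(f"effective batch size: {effective_batch}")
print(f"perplexity: {perplexity:.1f}")
print(f"uniform-guess loss: {uniform_loss:.3f}")
```

Under that assumption the model is well below the uniform baseline, though the perplexity is still high compared with what is typical for fluent text generation.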

Usage

from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

repo_id = "sathishphdai/system-admin-slm-5m"

# Download (or reuse from the local cache) the weights and the tokenizer file.
model_path = hf_hub_download(repo_id, "model.safetensors")
tokenizer_path = hf_hub_download(repo_id, "system_admin_tokenizer.json")

# Load the BPE tokenizer from the downloaded JSON file.
tokenizer = Tokenizer.from_file(tokenizer_path)
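The snippet above downloads the files but does not show what is inside the checkpoint. One possible next step, sketched here, is to list tensor shapes with `safetensors` and sum them into a parameter count without loading the weights into memory. The helper names `checkpoint_shapes` and `count_parameters` are hypothetical, and the tensor names in the checkpoint are not documented here, so the code makes no assumption about them.

```python
from math import prod

def count_parameters(shapes):
    """Sum element counts over a {tensor_name: shape} mapping."""
    return sum(prod(shape) for shape in shapes.values())

def checkpoint_shapes(path):
    """Read tensor shapes from a .safetensors file without loading the weights."""
    # Imported lazily so the helper above works without safetensors installed.
    from safetensors import safe_open
    with safe_open(path, framework="np") as f:
        return {name: tuple(f.get_slice(name).get_shape()) for name in f.keys()}

# Example with the file downloaded above (requires the ~4 GB checkpoint):
#   shapes = checkpoint_shapes(model_path)
#   print(f"{count_parameters(shapes) / 1e6:.1f}M parameters")
#
# Encoding a prompt with the tokenizer loaded above:
#   ids = tokenizer.encode("How do I restart nginx?").ids
```

If the count differs from the ~1016.6M figure, tied or duplicated tensors (for example shared input and output embeddings) are one possible explanation.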