Retail-SLM: Retail Small Language Model

A LLaMA-style transformer (~33.9M parameters) trained from scratch on retail-domain data. Supports context lengths of up to 5M tokens via RoPE (rotary position embeddings).

Architecture

| Component           | Value                                 |
|---------------------|---------------------------------------|
| Architecture        | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters          | ~33.9M                                |
| Layers              | 8                                     |
| Attention heads     | 8                                     |
| Embedding dimension | 512                                   |
| Max context         | 5,000,000 tokens                      |
| Vocabulary          | 16,000 (BPE)                          |
| Best loss           | ~0.876                                |
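
If the checkpoint follows standard LLaMA conventions, the table above maps onto a Hugging Face `LlamaConfig` roughly as in the sketch below. Note that `intermediate_size`, `rope_theta`, and weight tying are not listed on this card; the values here are assumptions chosen so the total parameter count lands near ~33.9M, not confirmed details of this model.

```python
# Minimal sketch of the architecture via Hugging Face's LlamaConfig.
# ASSUMPTIONS (not on the card): intermediate_size, rope_theta, tied embeddings.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=16_000,                  # 16k BPE vocabulary (from the card)
    hidden_size=512,                    # embedding dimension (from the card)
    num_hidden_layers=8,                # transformer layers (from the card)
    num_attention_heads=8,              # attention heads (from the card)
    intermediate_size=1408,             # ASSUMPTION: SwiGLU hidden width
    max_position_embeddings=5_000_000,  # 5M-token RoPE context (from the card)
    rope_theta=10_000.0,                # ASSUMPTION: default RoPE base frequency
    tie_word_embeddings=True,           # ASSUMPTION: tying keeps the count ~33.9M
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # ~33.9M if assumptions hold
```

Under these assumptions the arithmetic works out: tied embeddings contribute 16,000 × 512 ≈ 8.2M parameters, and each of the 8 layers adds ~3.2M (attention 4 × 512² ≈ 1.05M, SwiGLU 3 × 512 × 1408 ≈ 2.16M, plus norms), for ~33.9M total. Untied embeddings would push the count noticeably higher, which is why tying is assumed here.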

License

MIT. Built from scratch.
