Retail-SLM: Retail Small Language Model

A LLaMA-style transformer (~33.9M parameters) trained from scratch on retail-domain data. Supports context lengths of up to 5M tokens via RoPE (rotary position embeddings).

Architecture

| Component           | Value                                 |
|---------------------|---------------------------------------|
| Architecture        | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters          | ~33.9M                                |
| Layers              | 8                                     |
| Attention heads     | 8                                     |
| Embedding dimension | 512                                   |
| Max context         | 5,000,000 tokens                      |
| Vocabulary          | 16,000 (BPE)                          |
| Best loss           | ~0.876                                |
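
If the checkpoint follows standard LLaMA conventions, the table above maps onto a Hugging Face `LlamaConfig` roughly as in the sketch below. Note that `intermediate_size`, `rope_theta`, and weight tying are not listed on this card; the values here are assumptions chosen so the total parameter count lands near ~33.9M, not confirmed details of this model.

```python
# Minimal sketch of the architecture via Hugging Face's LlamaConfig.
# ASSUMPTIONS (not on the card): intermediate_size, rope_theta, tied embeddings.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=16_000,                  # 16k BPE vocabulary (from the card)
    hidden_size=512,                    # embedding dimension (from the card)
    num_hidden_layers=8,                # transformer layers (from the card)
    num_attention_heads=8,              # attention heads (from the card)
    intermediate_size=1408,             # ASSUMPTION: SwiGLU hidden width
    max_position_embeddings=5_000_000,  # 5M-token RoPE context (from the card)
    rope_theta=10_000.0,                # ASSUMPTION: default RoPE base frequency
    tie_word_embeddings=True,           # ASSUMPTION: tying keeps the count ~33.9M
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters():,} parameters")  # ~33.9M if assumptions hold
```

Under these assumptions the arithmetic works out: tied embeddings contribute 16,000 × 512 ≈ 8.2M parameters, and each of the 8 layers adds ~3.2M (attention 4 × 512² ≈ 1.05M, SwiGLU 3 × 512 × 1408 ≈ 2.16M, plus norms), for ~33.9M total. Untied embeddings would push the count noticeably higher, which is why tying is assumed here.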

License

MIT. Built from scratch.
