# Retail-SLM: Retail Small Language Model
A LLaMA-style transformer (~33.9M parameters) trained from scratch on retail-domain data. Supports context lengths up to 1M tokens via RoPE.
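RoPE (rotary position embeddings) encodes positions by rotating pairs of channels in the query and key vectors, so attention scores depend only on relative offsets. A minimal NumPy sketch (illustrative, not the model's actual implementation; the `base=10000.0` default is the standard RoPE value and an assumption here):

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq, d)."""
    d = x.shape[-1]
    # one rotation frequency per channel pair
    inv_freq = 1.0 / base ** (np.arange(0, d, 2) / d)
    angles = np.outer(positions, inv_freq)  # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    # rotate each (x1, x2) pair by its position-dependent angle
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because the rotation is a function of absolute position, the inner product between a rotated query and key depends only on their relative distance, which is what lets RoPE generalize to long contexts.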
## Architecture
| Component | Value |
|---|---|
| Architecture | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters | ~33.9M |
| Layers | 8 |
| Heads | 8 |
| Embedding | 512 |
| Max Context | 1,000,000 tokens |
| Vocab | 16,000 BPE |
| Best Loss | 0.8656 |
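The ~33.9M figure can be sanity-checked from the table. A rough count for a LLaMA-style stack (tied input/output embeddings and an SwiGLU hidden size of 1408 are assumptions here, chosen so the total lands near the reported figure; the card does not state either value):

```python
def count_params(vocab=16000, d=512, n_layers=8, ffn_hidden=1408):
    # token embedding, assumed tied with the output head
    emb = vocab * d
    # attention: q, k, v, o projections
    attn = 4 * d * d
    # SwiGLU MLP: gate, up, and down projections
    mlp = 3 * d * ffn_hidden
    # two RMSNorm weight vectors per layer
    norms = 2 * d
    per_layer = attn + mlp + norms
    # plus the final RMSNorm before the head
    return emb + n_layers * per_layer + d

print(count_params())  # -> 33890816, i.e. ~33.9M
```

Under these assumptions the count comes out to about 33.89M, consistent with the reported parameter total.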
## License
MIT. Built from scratch.