Onner
Onner-300m (internally RessAI-Ultra-300M) is a compact, high-efficiency language model designed for educational reasoning and lightweight deployment. With approximately 200 million parameters, it follows a "Dense & Deep" philosophy scaled down for speed and accessibility.
It is trained on the high-quality FineWeb-Edu dataset and uses a custom architecture (`RessAiForCausalLM`) optimized for efficient inference.
This model uses a custom configuration inspired by BERT-base sizing but with Llama's causal attention mechanisms:
| Hyperparameter | Value | Description |
|---|---|---|
| Hidden Size | 768 | Embedding dimension (Compact) |
| Layers | 12 | Network depth |
| Attention Heads | 12 | Query heads |
| KV Heads | 2 | Grouped Query Attention (GQA 6:1) |
| Intermediate Size | 3,072 | MLP Width |
| RoPE Theta | 500,000 | Rotary Embeddings Base |
| Max Sequence | 4,096 | Context Length |
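
The table implies a few derived quantities that are easy to check by hand: the per-head dimension, the GQA group size, and the KV-cache footprint at full context. The following is a minimal sketch, not the model's actual configuration class. `OnnerConfigSketch` and its field names are hypothetical, the 2-byte cache entries assume fp16/bf16 storage, and the gated (three-matrix) MLP is an assumption borrowed from Llama-style blocks rather than something the table states. Embedding parameters are left out because the vocabulary size is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnnerConfigSketch:
    """Hypothetical container for the table values (not the real config keys)."""

    hidden_size: int = 768
    num_layers: int = 12
    num_attention_heads: int = 12
    num_kv_heads: int = 2
    intermediate_size: int = 3072
    rope_theta: float = 500_000.0
    max_sequence: int = 4096

    @property
    def head_dim(self) -> int:
        # Each query head works on an equal slice of the hidden dimension.
        return self.hidden_size // self.num_attention_heads

    @property
    def gqa_group_size(self) -> int:
        # Number of query heads that share one key/value head.
        return self.num_attention_heads // self.num_kv_heads

    def kv_cache_bytes(self, seq_len: int, bytes_per_value: int = 2) -> int:
        # Keys and values (factor 2) for every layer, KV head, and position.
        # bytes_per_value=2 assumes fp16/bf16 cache entries.
        return (2 * self.num_layers * self.num_kv_heads
                * self.head_dim * seq_len * bytes_per_value)

    def block_params(self, gated_mlp: bool = True) -> int:
        # Weight matrices in one transformer block, ignoring norms and biases.
        kv_dim = self.num_kv_heads * self.head_dim
        attention = (2 * self.hidden_size * self.hidden_size   # Q and output projections
                     + 2 * self.hidden_size * kv_dim)          # K and V projections
        # Assumption: a Llama-style gated MLP has three matrices, a plain MLP two.
        mlp_matrices = 3 if gated_mlp else 2
        mlp = mlp_matrices * self.hidden_size * self.intermediate_size
        return attention + mlp


if __name__ == "__main__":
    cfg = OnnerConfigSketch()
    print(cfg.head_dim)                              # 64
    print(cfg.gqa_group_size)                        # 6
    print(cfg.kv_cache_bytes(cfg.max_sequence) / 2**20)  # 24.0 (MiB)
    print(cfg.block_params() * cfg.num_layers)       # 101449728
```

Under these assumptions, a full 4,096-token context needs about 24 MiB of KV cache per sequence. With 12 KV heads instead of 2 the same cache would be six times larger, which is the main inference-side benefit of the 6:1 grouping.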