---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
library_name: transformers
tags:
- pytorch
- causal-lm
- text-generation
- onner
---
RessAI Onner-300m
Onner-300m (internally RessAI-Ultra-300M) is a compact language model designed for educational reasoning and lightweight deployment. With approximately 200 million parameters, it follows a "Dense & Deep" philosophy scaled down for speed and accessibility.
It is trained on the FineWeb-Edu dataset and uses a custom architecture (`RessAiForCausalLM`) optimized for efficient inference.
Model Details
- Model Name: RessAI Onner-300m
- Organization: RessAI
- Architecture: `RessAiForCausalLM`
- Model Type: `onner`
- Parameters: ~199.9 million (0.20B)
- Context Window: 4,096 tokens
- Vocabulary: 128,256
- Training Precision: Bfloat16
- License: Apache 2.0
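
As a rough sizing check, the parameter count and precision listed above give an estimate of the memory needed just to hold the weights. The sketch below assumes the weights are also stored in bfloat16 (2 bytes per parameter) at inference time, which this card does not state explicitly. Activations and the KV cache are not included.

```python
# Rough weight-memory estimate from the figures in Model Details.
# ASSUMPTION: weights are stored in bfloat16 (2 bytes each) for inference;
# the card only states bfloat16 as the training precision.
PARAMS = 199.9e6       # ~199.9 million parameters, as listed above
BYTES_PER_PARAM = 2    # bfloat16


def weight_memory_mib(params: float, bytes_per_param: int) -> float:
    """Return the memory needed for the weights alone, in MiB."""
    return params * bytes_per_param / (1024 ** 2)


if __name__ == "__main__":
    print(f"{weight_memory_mib(PARAMS, BYTES_PER_PARAM):.0f} MiB")
```

Under that assumption the weights alone need roughly 381 MiB.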
Technical Specifications
The model uses a custom configuration that follows BERT-base sizing (hidden size 768, 12 layers, 12 attention heads) but uses Llama-style causal attention:
| Hyperparameter | Value | Description |
|---|---|---|
| Hidden Size | 768 | Embedding dimension (Compact) |
| Layers | 12 | Network depth |
| Attention Heads | 12 | Query heads |
| KV Heads | 2 | Grouped Query Attention (GQA 6:1) |
| Intermediate Size | 3,072 | MLP Width |
| RoPE Theta | 500,000 | Rotary Embeddings Base |
| Max Sequence | 4,096 | Context Length |
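
The hyperparameters in the table can be sanity-checked against the stated parameter count, and they also show what the 6:1 GQA ratio saves in KV-cache memory. The sketch below is not the `RessAiForCausalLM` implementation. It assumes a Llama-style layout that this card does not confirm: bias-free linear layers, a gated MLP with three projections, two RMSNorm weights per layer plus a final norm, and input/output embeddings that share weights. The vocabulary size is taken from Model Details, and the 2-byte cache values assume bfloat16.

```python
# Hyperparameters copied from the table above (vocabulary from Model Details).
HIDDEN = 768
LAYERS = 12
HEADS = 12
KV_HEADS = 2
INTERMEDIATE = 3072
VOCAB = 128_256
MAX_SEQ = 4096

HEAD_DIM = HIDDEN // HEADS      # 64 dimensions per head
KV_DIM = KV_HEADS * HEAD_DIM    # 128: total width of the K (or V) projection


def count_parameters(tie_embeddings: bool = True) -> int:
    """Count parameters for an ASSUMED Llama-style layout (see lead-in)."""
    embed = VOCAB * HIDDEN
    # Q and O are HIDDEN x HIDDEN; K and V are HIDDEN x KV_DIM under GQA.
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM
    # ASSUMPTION: gated MLP with gate, up, and down projections.
    mlp = 3 * HIDDEN * INTERMEDIATE
    # ASSUMPTION: two norm weight vectors per layer, one final norm.
    norms = 2 * HIDDEN
    total = embed + LAYERS * (attn + mlp + norms) + HIDDEN
    if not tie_embeddings:
        total += VOCAB * HIDDEN  # separate output projection
    return total


def kv_cache_bytes(seq_len: int, kv_heads: int, bytes_per_value: int = 2) -> int:
    """Bytes of K and V cached for one sequence across all layers."""
    return 2 * LAYERS * kv_heads * HEAD_DIM * bytes_per_value * seq_len


if __name__ == "__main__":
    print("parameters (tied embeddings):", count_parameters())
    print("KV cache at full context, GQA:", kv_cache_bytes(MAX_SEQ, KV_HEADS))
    print("KV cache at full context, one KV head per query head:",
          kv_cache_bytes(MAX_SEQ, HEADS))
```

Under these assumptions the count comes to 199,969,536, which is consistent with the ~199.9 million figure listed in Model Details. The KV cache at the full 4,096-token context is 24 MiB with 2 KV heads, compared with 144 MiB if every query head had its own KV head.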