---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
library_name: transformers
tags:
- pytorch
- causal-lm
- text-generation
- onner
---

# RessAI Onner-300m

**Onner-300m** (internally `RessAI-Ultra-300M`) is a compact, efficiency-focused language model designed for educational reasoning and lightweight deployment. With approximately **200 million parameters**, it follows a "Dense & Deep" design philosophy scaled down for speed and accessibility.

It was trained on the high-quality [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) dataset and uses a custom architecture (`RessAiForCausalLM`) optimized for efficient inference.

<div align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers_logo_name.png" width="200"/>
</div>

## Model Details

- **Model Name:** RessAI Onner-300m
- **Organization:** RessAI
- **Architecture:** `RessAiForCausalLM`
- **Model Type:** `onner`
- **Parameters:** ~199.9 million (0.20B)
- **Context Window:** 4,096 tokens
- **Vocabulary Size:** 128,256 tokens
- **Training Precision:** bfloat16
- **License:** Apache 2.0

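As a rough cross-check, the ~199.9M figure can be reproduced from the hyperparameters in the Technical Specifications table. The sketch below assumes a Llama-style decoder layout (bias-free linear layers, RMSNorm, a SwiGLU MLP with gate/up/down projections) and tied input/output embeddings; these layout details are assumptions, not confirmed from the `RessAiForCausalLM` source.

```python
# Sketch only: assumes a Llama-style decoder layout (bias-free linear layers,
# RMSNorm weights only, SwiGLU MLP with gate/up/down projections). These layout
# details are assumptions, not confirmed from the RessAiForCausalLM source.


def count_parameters(
    vocab_size: int = 128_256,
    hidden_size: int = 768,
    num_layers: int = 12,
    num_heads: int = 12,
    num_kv_heads: int = 2,
    intermediate_size: int = 3_072,
    tie_embeddings: bool = True,
) -> int:
    head_dim = hidden_size // num_heads  # 768 / 12 = 64
    kv_dim = num_kv_heads * head_dim  # 2 * 64 = 128 under GQA

    # Attention: full-width Q and O projections, narrower K and V projections.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # SwiGLU MLP: gate and up (hidden -> intermediate), down (intermediate -> hidden).
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    final_norm = hidden_size
    # A tied output head reuses the embedding matrix and adds no parameters.
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    print(f"tied embeddings:   {count_parameters():,}")
    print(f"untied embeddings: {count_parameters(tie_embeddings=False):,}")
```

Under these assumptions the tied total is 199,969,536, consistent with the ~199.9 million (0.20B) listed above; a separate, untied output head would add another 98,500,608 parameters (about 298.5M in total). The embedding matrix alone accounts for roughly half of the model, a consequence of the 128,256-token vocabulary at a hidden size of 768.
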
## Technical Specifications

This model uses a custom configuration that borrows BERT-base sizing (hidden size 768, 12 layers, 12 attention heads, intermediate size 3,072) but follows a Llama-style decoder-only design with causal attention:

| Hyperparameter | Value | Description |
| :--- | :--- | :--- |
| **Hidden Size** | 768 | Embedding / hidden-state dimension |
| **Layers** | 12 | Number of decoder layers (network depth) |
| **Attention Heads** | 12 | Number of query heads |
| **KV Heads** | 2 | Key/value heads for grouped-query attention (GQA), 6 query heads per KV head |
| **Intermediate Size** | 3,072 | MLP (feed-forward) width |
| **RoPE Theta** | 500,000 | Base frequency of the rotary position embeddings |
| **Max Sequence Length** | 4,096 | Maximum context length in tokens |
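
To make the 6:1 GQA ratio concrete, the sketch below shows which KV head each query head would read from and estimates KV-cache memory for one sequence at the full 4,096-token context. It assumes keys and values are cached in bfloat16 (2 bytes per value), a head dimension of 768 / 12 = 64, and Llama-style grouping of consecutive query heads; these are assumptions, not confirmed details of the `onner` implementation.

```python
# Sketch only: assumes bf16 KV caching, head_dim = hidden_size // num_heads,
# and Llama-style GQA where consecutive query heads share one KV head.


def kv_cache_bytes(
    seq_len: int = 4_096,
    num_layers: int = 12,
    num_kv_heads: int = 2,
    head_dim: int = 768 // 12,
    bytes_per_value: int = 2,  # bfloat16
) -> int:
    # Factor 2 = one cached key tensor plus one cached value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


def query_to_kv_head(query_head: int, num_heads: int = 12, num_kv_heads: int = 2) -> int:
    # Each group of num_heads // num_kv_heads consecutive query heads
    # attends using the same key/value head.
    group_size = num_heads // num_kv_heads  # 12 // 2 = 6
    return query_head // group_size


if __name__ == "__main__":
    mib = 1024 ** 2
    print("GQA (2 KV heads):", kv_cache_bytes() / mib, "MiB")
    print("MHA (12 KV heads):", kv_cache_bytes(num_kv_heads=12) / mib, "MiB")
    print("query head -> KV head:", [query_to_kv_head(h) for h in range(12)])
```

Under these assumptions a full-context cache needs 24 MiB per sequence, versus 144 MiB if each of the 12 heads kept its own keys and values. Query heads 0 to 5 share KV head 0 and heads 6 to 11 share KV head 1; this sixfold reduction in cached state is the main inference-memory benefit of using only 2 KV heads.
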