---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
library_name: transformers
tags:
- pytorch
- causal-lm
- text-generation
- onner
---
# 🚀 RessAI Onner-300m

**Onner-300m** (internally `RessAI-Ultra-300M`) is a compact, high-efficiency causal language model designed for educational reasoning and lightweight deployment. With approximately **200 million parameters**, it follows a "Dense & Deep" design philosophy, scaled down for speed and accessibility.

It was trained on the high-quality [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) dataset and uses a custom architecture (`RessAiForCausalLM`) optimized for efficient inference.

<div align="center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers_logo_name.png" width="200"/>
</div>

## 🔍 Model Details

- **Model Name:** RessAI Onner-300m
- **Organization:** RessAI
- **Architecture:** `RessAiForCausalLM`
- **Model Type:** `onner`
- **Parameters:** ~199.9 Million (0.20B)
- **Context Window:** 4,096 tokens
- **Vocabulary Size:** 128,256 tokens
- **Training Precision:** bfloat16 (BF16)
- **License:** Apache 2.0
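As a sanity check on the ~199.9M figure, the parameter count can be reconstructed from the hyperparameters in the Technical Specifications table. The sketch below is a back-of-the-envelope estimate, not the model's own code. It assumes a Llama-style layout (SwiGLU MLP with gate/up/down projections, no bias terms, RMSNorm) and input embeddings tied to the LM head, none of which this card states explicitly. The function name `count_params` is hypothetical.

```python
# Rough parameter count for a Llama-style decoder using the hyperparameters
# from the Technical Specifications table. ASSUMPTIONS (not confirmed by the
# card): SwiGLU MLP with three projections, no bias terms, RMSNorm with one
# weight vector per norm, and input embeddings tied to the LM head.

def count_params(vocab=128_256, hidden=768, layers=12, heads=12,
                 kv_heads=2, intermediate=3_072, tied_embeddings=True):
    head_dim = hidden // heads                      # 768 / 12 = 64
    embed = vocab * hidden                          # token embedding table
    attn = (hidden * heads * head_dim               # q_proj
            + 2 * hidden * kv_heads * head_dim      # k_proj + v_proj (GQA)
            + heads * head_dim * hidden)            # o_proj
    mlp = 3 * hidden * intermediate                 # gate, up, down projections
    norms = 2 * hidden                              # two RMSNorms per layer
    total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
    if not tied_embeddings:
        total += vocab * hidden                     # separate LM head matrix
    return total

tied = count_params()
untied = count_params(tied_embeddings=False)
# bfloat16 stores each parameter in 2 bytes
print(f"tied:   {tied:,} params, ~{tied * 2 / 1e6:.0f} MB in bfloat16")
print(f"untied: {untied:,} params, ~{untied * 2 / 1e6:.0f} MB in bfloat16")
# tied:   199,969,536 params, ~400 MB in bfloat16
# untied: 298,470,144 params, ~597 MB in bfloat16
```

Under these assumptions the tied total comes to about 200M, in line with the ~199.9M (0.20B) listed above and roughly 400 MB of bfloat16 weights; an untied LM head would raise the total to roughly 298M. Activations, KV cache, and optimizer state are not included.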

## 🧠 Technical Specifications

This model uses a custom configuration that borrows BERT-base sizing (hidden size 768, 12 layers, 12 attention heads, MLP width 3,072) but pairs it with Llama-style causal self-attention:

| Hyperparameter | Value | Description |
| :--- | :--- | :--- |
| **Hidden Size** | 768 | Embedding and hidden-state dimension |
| **Layers** | 12 | Number of transformer decoder layers |
| **Attention Heads** | 12 | Number of query heads |
| **KV Heads** | 2 | Key/value heads for grouped-query attention (GQA); 6 query heads share each KV head (6:1) |
| **Intermediate Size** | 3,072 | MLP (feed-forward) width |
| **RoPE Theta** | 500,000 | Base frequency of the rotary position embeddings |
| **Max Sequence Length** | 4,096 | Context length in tokens |
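The 6:1 GQA ratio means each key/value head is shared by a group of six query heads, which shrinks the KV cache kept during generation. The sketch below illustrates this with the table's values. It assumes the contiguous grouping used by Llama-style implementations (query head `h` reads KV head `h // 6`), a head dimension of 768 / 12 = 64, and a bfloat16 cache for a single sequence. The helper names `kv_head_for` and `kv_cache_bytes` are hypothetical and not part of `RessAiForCausalLM`.

```python
# Sketch of how grouped-query attention (GQA) shares key/value heads, and what
# that saves in KV-cache memory. Values come from the Technical Specifications
# table; ASSUMPTIONS: head_dim = hidden / heads, contiguous head grouping,
# K and V cached in bfloat16 (2 bytes per element), batch size 1.

HIDDEN, LAYERS, Q_HEADS, KV_HEADS, MAX_SEQ = 768, 12, 12, 2, 4_096
HEAD_DIM = HIDDEN // Q_HEADS           # 64
GROUP = Q_HEADS // KV_HEADS            # 6 query heads per KV head (6:1)

def kv_head_for(q_head: int) -> int:
    """Index of the KV head that query head `q_head` attends with."""
    return q_head // GROUP

def kv_cache_bytes(seq_len: int, kv_heads: int, bytes_per_elem: int = 2) -> int:
    # 2 tensors (K and V) per layer, each shaped [kv_heads, seq_len, head_dim]
    return 2 * LAYERS * kv_heads * seq_len * HEAD_DIM * bytes_per_elem

mapping = [kv_head_for(h) for h in range(Q_HEADS)]
print(mapping)   # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

gqa = kv_cache_bytes(MAX_SEQ, KV_HEADS)
mha = kv_cache_bytes(MAX_SEQ, Q_HEADS)   # hypothetical: one KV head per query head
print(f"GQA cache: {gqa / 2**20:.0f} MiB, MHA cache: {mha / 2**20:.0f} MiB")
# GQA cache: 24 MiB, MHA cache: 144 MiB
```

At the full 4,096-token context this gives 24 MiB of cache per sequence versus 144 MiB if every query head had its own KV head, a 6x reduction that matches the head ratio.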