	---
	license: apache-2.0
	datasets:
	- HuggingFaceFW/fineweb-edu
	language:
	- en
	library_name: transformers
	tags:
	- pytorch
	- causal-lm
	- text-generation
	- onner
	---
	# 🚀 RessAI Onner-300m

	Onner-300m (internally `RessAI-Ultra-300M`) is a compact language model designed for educational reasoning and lightweight deployment. With approximately 200 million parameters, it scales the "Dense & Deep" design philosophy down for speed and accessibility.

	It is trained on the [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) dataset and uses a custom architecture (`RessAiForCausalLM`) optimized for efficient inference.

	<div align="center">
	<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers_logo_name.png" width="200"/>
	</div>
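
A minimal loading sketch with `transformers` follows. It is not taken from the original card: the repository id `RessAI/Onner-300m` is assumed from the organization and model name, and `trust_remote_code=True` is assumed to be required because `RessAiForCausalLM` is a custom class. The helper name `generate` is hypothetical.

```python
MODEL_ID = "RessAI/Onner-300m"  # assumption: Hub repo id built from org and model name


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return a greedy continuation of `prompt`."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the stated training precision
        trust_remote_code=True,      # assumption: the custom class ships with the repo
    )
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Photosynthesis is the process by which")` downloads the weights on first use. Review any remote code before enabling `trust_remote_code`.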

	## 🔍 Model Details

	- Model Name: RessAI Onner-300m
	- Organization: RessAI
	- Architecture: `RessAiForCausalLM`
	- Model Type: `onner`
	- Parameters: ~199.9 Million (0.20B)
	- Context Window: 4,096 tokens
	- Vocabulary Size: 128,256 tokens
	- Training Precision: bfloat16
	- License: Apache 2.0
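
As a rough guide for lightweight deployment, the parameter count above can be converted into weight memory. This is a back-of-the-envelope sketch: it assumes the weights are served in the listed dtype and ignores activations, the KV cache, and framework overhead. The function name `weight_memory_mib` is hypothetical.

```python
NUM_PARAMS = 199.9e6  # "~199.9 Million" from the list above

# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_mib(num_params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_mib(NUM_PARAMS, dtype):6.1f} MiB")
```

In bfloat16 this comes to roughly 381 MiB, about half the float32 figure.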

	## 🧠 Technical Specifications

	This model uses a custom configuration that combines BERT-base sizing (768 hidden size, 12 layers, 12 attention heads) with Llama-style causal attention:

	| Hyperparameter | Value | Description |
	| :--- | :--- | :--- |
	| Hidden Size | 768 | Embedding dimension (compact) |
	| Layers | 12 | Network depth |
	| Attention Heads | 12 | Query heads |
	| KV Heads | 2 | Grouped Query Attention (GQA 6:1) |
	| Intermediate Size | 3,072 | MLP width |
	| RoPE Theta | 500,000 | Rotary embeddings base |
	| Max Sequence | 4,096 | Context length |
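
The hyperparameters in the table can be cross-checked against the stated parameter count, and they also show what GQA saves in KV-cache memory. The sketch below assumes a Llama-style layer layout that the card does not confirm: bias-free projections, a gated MLP with three projections, two norm weights per layer plus a final norm, and tied input/output embeddings. `OnnerConfig`, `count_parameters`, and `kv_cache_bytes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OnnerConfig:
    """Values copied from the hyperparameter table."""
    vocab_size: int = 128_256
    hidden_size: int = 768
    num_layers: int = 12
    num_heads: int = 12
    num_kv_heads: int = 2
    intermediate_size: int = 3_072
    max_seq_len: int = 4_096

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_heads  # 768 / 12 = 64


def count_parameters(cfg: OnnerConfig, tied_embeddings: bool = True) -> int:
    """Estimate the parameter count under the assumed Llama-style layout."""
    h = cfg.hidden_size
    kv_dim = cfg.num_kv_heads * cfg.head_dim   # 2 * 64 = 128
    attention = 2 * h * h + 2 * h * kv_dim     # Q and O are h x h; K and V are h x kv_dim
    mlp = 3 * h * cfg.intermediate_size        # assumed gate, up, and down projections
    norms = 2 * h                              # assumed two norm weights per layer
    embeddings = cfg.vocab_size * h
    if not tied_embeddings:
        embeddings *= 2                        # separate output head
    return embeddings + cfg.num_layers * (attention + mlp + norms) + h  # + final norm


def kv_cache_bytes(cfg: OnnerConfig, seq_len: int, bytes_per_value: int = 2,
                   kv_heads: Optional[int] = None) -> int:
    """KV-cache size for one sequence; the factor 2 covers keys and values."""
    heads = cfg.num_kv_heads if kv_heads is None else kv_heads
    return 2 * cfg.num_layers * heads * cfg.head_dim * seq_len * bytes_per_value


cfg = OnnerConfig()
print(f"parameters (tied embeddings): {count_parameters(cfg):,}")
print(f"KV cache at 4,096 tokens, GQA: {kv_cache_bytes(cfg, 4096) / 2**20:.0f} MiB")
print(f"KV cache at 4,096 tokens, 12 KV heads: "
      f"{kv_cache_bytes(cfg, 4096, kv_heads=12) / 2**20:.0f} MiB")
```

Under these assumptions the estimate is 199,969,536 parameters, in line with the ~0.20B stated above. With 2 KV heads, a full 4,096-token context needs 24 MiB of bfloat16 KV cache per sequence, compared with 144 MiB if all 12 heads kept their own keys and values.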