It is trained on the high-quality FineWeb-Edu dataset.

- **Model Name:** RessAI Onner-300m
- **Organization:** RessAI
- **Architecture:** `RessAiForCausalLM`
- **Model Type:** `onner`
- **Parameters:** ~199.9 Million (0.20B)
- **Context Window:** 4,096 tokens
- **Vocabulary:** 128,256
- **Training Precision:** Bfloat16
- **License:** Apache 2.0

This model uses a custom configuration inspired by BERT-base sizing but with Llama-style components.

| Parameter | Value | Description |
|---|---|---|
| **KV Heads** | 2 | Grouped Query Attention (GQA 6:1) |
| **Intermediate Size** | 3,072 | MLP Width |
| **RoPE Theta** | 500,000 | Rotary Embeddings Base |
| **Max Sequence** | 4,096 | Context Length |

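As a sanity check on the stated parameter count, the table values can be combined with a few assumed dimensions in a back-of-the-envelope sketch. The hidden size (768), layer count (12), head count (12), and tied input/output embeddings below are assumptions drawn from the "BERT-base sizing" comparison, not confirmed by the card:

```python
# Back-of-the-envelope parameter and KV-cache estimate for Onner-300m.
# Hidden size (768), 12 layers, 12 heads, and tied embeddings are ASSUMED
# ("BERT-base sizing"); vocab, KV heads, intermediate size, and context
# length come from the card itself.
vocab, hidden, layers = 128_256, 768, 12
n_heads, kv_heads, head_dim = 12, 2, 64           # GQA 6:1
intermediate, max_seq = 3_072, 4_096

embed = vocab * hidden                            # shared with the LM head if tied
attn = 2 * hidden * (n_heads * head_dim)          # q_proj + o_proj
attn += 2 * hidden * (kv_heads * head_dim)        # k_proj + v_proj
mlp = 3 * hidden * intermediate                   # gate, up, down (Llama-style MLP)
total = embed + layers * (attn + mlp)             # norm weights omitted (negligible)
print(f"params ~ {total / 1e6:.0f}M")             # ~200M, consistent with "~199.9 Million"

# KV cache at full context in bf16 (2 bytes per value): K and V per layer.
kv_bytes = 2 * layers * kv_heads * head_dim * max_seq * 2
print(f"kv cache ~ {kv_bytes / 2**20:.0f} MiB")   # small, thanks to only 2 KV heads
```

Under these assumptions the count lands within rounding of the stated 0.20B, and the 2 KV heads keep the full-context KV cache in the tens of megabytes.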
## 💻 Usage

### Python Code (Transformers)

Since this model uses a custom architecture (`model_type: onner`), make sure you have a recent version of `transformers` installed and pass `trust_remote_code=True` when loading.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "RessAI/Onner-300m"

# 1. Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 2. Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Use float16 if bfloat16 is not supported
    device_map="auto",
    trust_remote_code=True,
)

# 3. Inference
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
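The `temperature` and `top_p` arguments above shape the sampling distribution at each decoding step. A minimal, self-contained sketch of what nucleus (top-p) sampling does to a toy logit vector (not the `transformers` implementation, which also handles batching and edge cases):

```python
import math

def nucleus_filter(logits, temperature=0.7, top_p=0.9):
    """Temperature-scale logits, then keep the smallest set of tokens
    whose cumulative probability reaches top_p; renormalize over them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return {i: probs[i] / cum for i in kept}      # sample the next token from this subset

dist = nucleus_filter([2.0, 1.0, 0.5, -1.0])
print(dist)  # only the two most likely tokens survive the 0.9 cutoff here
```

Lower `temperature` sharpens the distribution before the cutoff, and lower `top_p` shrinks the candidate set; `do_sample=False` bypasses all of this and decodes greedily.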