assemsabry committed
Commit cbf34cc · verified · 1 Parent(s): d350b3f

Update README.md

Files changed (1)
  1. README.md +153 -106
README.md CHANGED
@@ -1,32 +1,81 @@
1
  ---
2
- license: mit
3
  language:
4
  - en
5
  - ar
6
  tags:
 
 
7
  - text-generation
 
8
  - multilingual
9
  - causal-lm
10
- - arabic
11
- metrics:
12
- - accuracy
13
  ---
14
 
15
  # Horus-1.0-4B
16
 
17
- ![Horus Model](https://huggingface.co/tokenaii/horus/resolve/main/media/main.png)
18
 
19
- **Developer:** TokenAI
20
  **Release Date:** April 2026
21
- **License:** MIT
22
 
23
  ---
24
 
25
  ## Overview
26
 
27
- Horus-1.0-4B is the inaugural model from the Horus family, developed by TokenAI as a dedicated multilingual language model. Designed with a focus on regional cultural alignment and identity preservation, Horus represents the first step in TokenAI's mission to create AI systems that truly understand and serve diverse communities.
28
 
29
- The model has been specifically trained to maintain a strong sense of identity, recognizing its Egyptian origins and the team behind its creation. Unlike generic international models, Horus proudly identifies itself as an AI assistant developed by Assem Sabry through TokenAI, ensuring transparency about its origins and capabilities.
30
 
31
  ---
32
 
@@ -38,7 +87,7 @@ The model has been specifically trained to maintain a strong sense of identity,
38
  - **Regional Focus:** Optimized for diverse cultural contexts
39
 
40
  ### Core Competencies
41
- - **Identity Recognition:** Strong self-identification as "Horus" developed by "Assem Sabry" from "TokenAI"
42
  - **Cultural Alignment:** Responses aligned with Arab cultural values and contexts
43
  - **Reasoning:** Chain-of-thought reasoning capabilities with step-by-step problem solving
44
  - **General Knowledge:** Broad knowledge across history, science, geography, and literature
@@ -66,12 +115,68 @@ The following quantized versions are available for different deployment scenario
66
  | Q5_K_M | GGUF | ~2.7 GB | Higher quality than Q4 |
67
  | Q6_K | GGUF | ~3.1 GB | Near-full quality |
68
  | Q8_0 | GGUF | ~4.1 GB | Minimal quality loss |
69
- | F16 | GGUF | ~8.0 GB | Full precision, direct loading |
70
 
71
  ---
72
 
73
  ## Benchmark Results
74
 
75
  #### General Knowledge & Reasoning Benchmarks
76
  ![General Benchmarks](media/1.png)
77
 
@@ -119,127 +224,60 @@ The following quantized versions are available for different deployment scenario
119
 
120
  ### Complete Performance Summary
121
 
122
- | Benchmark | Score | Status |
123
- |-----------|-------|--------|
124
- | MMLU | 60% | ✅ |
125
- | GPQA_Diamond | 100% | ✅ |
126
- | SWE_bench | 66.67% | ✅ |
127
- | IFEval | 100% | ✅ |
128
- | BFCL | 100% | ✅ |
129
- | OmniDocBench | 100% | ✅ |
130
- | Terminal_Bench | 100% | ✅ |
131
- | ERQA | 66.67% | ✅ |
132
- | BrowseComp | 100% | ✅ |
133
- | Arabic_IEN_MCQ | 100% | ✅ |
134
- | Arabic_ExamsAR | 100% | ✅ |
135
- | Arabic_ACVA | 50% | ✅ |
136
- | English_AGIEval | 66.67% | ✅ |
137
- | English_Arc_Challenge | 100% | ✅ |
138
- | English_GPQA | 100% | ✅ |
139
- | English_HellaSwag | 100% | ✅ |
140
- | English_Winogrande | 100% | ✅ |
141
- | English_MMLU_Pro | 100% | ✅ |
142
- | English_GSM8K | 66.67% | ✅ |
143
- | English_TruthfulQA | 100% | ✅ |
144
 
145
  ### Overall Performance
146
  - **Total Benchmarks:** 20
147
  - **Perfect Scores (100%):** 13 benchmarks
148
  - **Average Score:** 80.15%
149
 
150
- ### Category Breakdown
151
- | Category | Average Score |
152
- |----------|---------------|
153
- | Knowledge (MMLU, GPQA) | 80% |
154
- | Coding (SWE_bench, Terminal) | 83.33% |
155
- | Instruction Following | 100% |
156
- | Tool Use (BFCL, BrowseComp) | 100% |
157
- | Document Understanding | 100% |
158
- | Arabic Language | 66.67% |
159
- | English Language | 88.89% |
160
-
161
- *Benchmark charts and detailed visualizations will be attached here*
162
-
163
- ---
164
-
165
- ## About TokenAI
166
-
167
- **TokenAI** is an AI startup founded by [Assem Sabry](https://assem.cloud/) with headquarters in Egypt.
168
-
169
- ### Mission
170
-
171
- TokenAI aims to deliver the strongest language models in the world and in the Arab world through the Horus family of models. The startup bridges the gap between cutting-edge AI capabilities and regional cultural contexts, starting with the Arab world. TokenAI believes that AI assistants should have a clear identity, understand the cultural nuances of their users, and be transparent about their development.
172
-
173
- ### The Horus Family
174
-
175
- Horus-1.0-4B marks the **first model in the Horus family line**. This is just the beginning of TokenAI's journey to create a comprehensive suite of AI models serving the Arab region. Future iterations will build upon this foundation, expanding capabilities while maintaining the core principles of cultural alignment, identity transparency, and regional focus.
176
-
177
- ### Contact & Community
178
-
179
- - **TokenAI Website:** [https://tokenai.cloud/](https://tokenai.cloud/)
180
- - **HuggingFace:** https://huggingface.co/tokenaii
181
- - **Model Repository:** https://huggingface.co/tokenaii/horus
182
- - **Developer:** [Assem Sabry](https://assem.cloud/)
183
- - **Location:** Egypt
184
-
185
- ---
186
-
187
- ## Usage Example
188
-
189
- ```python
190
- from transformers import AutoModelForCausalLM, AutoTokenizer
191
-
192
- # Load model
193
- model_name = "tokenaii/horus"
194
- subfolder = "Horus-1.0-4B"
195
-
196
- tokenizer = AutoTokenizer.from_pretrained(model_name, subfolder=subfolder)
197
- model = AutoModelForCausalLM.from_pretrained(
198
- model_name,
199
- subfolder=subfolder,
200
- torch_dtype=torch.float16,
201
- device_map="auto"
202
- )
203
-
204
- # Generate response
205
- prompt = "### User:\nWhat is your name?\n\n### Assistant:"
206
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
207
-
208
- outputs = model.generate(
209
- **inputs,
210
- max_new_tokens=100,
211
- temperature=0.7,
212
- do_sample=True
213
- )
214
-
215
- response = tokenizer.decode(outputs[0], skip_special_tokens=True)
216
- print(response)
217
- # Output: "My name is Horus, an AI model developed by Assem Sabry from TokenAI."
218
- ```
219
-
220
  ---
221
 
222
  ## Limitations
223
 
224
- - **Mathematical Computation:** Currently limited in complex arithmetic and multi-step calculations
225
  - **Context Length:** Optimized for 256 tokens during training; longer contexts may vary in quality
226
  - **Multilingual Balance:** While bilingual, performance may vary between Arabic and English depending on query complexity
227
  - **Knowledge Cutoff:** Training data reflects knowledge up to early 2024
228
 
229
- ## Safety & Ethics
 
 
230
 
231
  Horus has been trained with safety guidelines and cultural sensitivity in mind. The model includes:
232
  - Identity preservation to prevent misrepresentation
233
  - Cultural alignment for Arab world contexts
234
  - Transparent disclosure of AI nature and origins
235
 
 
 
236
  ## Citation
237
 
238
  If you use this model in your research or applications, please cite:
239
 
240
  ```
241
  @misc{horus-1.0-4b,
242
- title={Horus-1.0-4B: An Arabic-English Bilingual Language Model},
243
  author={Assem Sabry and TokenAI Team},
244
  year={2026},
245
  howpublished={\url{https://huggingface.co/tokenaii/horus}}
@@ -248,4 +286,13 @@ If you use this model in your research or applications, please cite:
248
 
249
  ---
250
 
251
  **This is the first model from the Horus family. More capable versions coming soon from TokenAI.**
 
1
  ---
 
2
  language:
3
  - en
4
  - ar
5
+ - fr
6
+ - es
7
+ - de
8
+ - it
9
+ - pt
10
+ - tr
11
+ license: mit
12
+ library_name: transformers
13
+ model-index:
14
+ - name: Horus-1.0-4B
15
+ results: []
16
  tags:
17
+ - llama
18
+ - llm
19
  - text-generation
20
+ - arabic
21
  - multilingual
22
  - causal-lm
23
+ - horus
24
+ - tokenai
25
+ datasets:
26
+ - tokenaii/horus-training-data
27
+ metrics: []
28
+ widget:
29
+ - text: "### User:\nWhat is the capital of Egypt?\n\n### Assistant:"
30
+ output:
31
+ text: "The capital of Egypt is Cairo."
32
+ - text: "### User:\nمن هو أول رئيس لمصر؟\n\n### Assistant:"
33
+ output:
34
+ text: "أول رئيس لمصر بعد ثورة 1952 هو محمد نجيب."
35
+ inference: true
36
  ---
37
 
38
  # Horus-1.0-4B
39
 
40
+ ![Horus Model](media/main.png)
41
 
42
+ Horus-1.0-4B is the inaugural model from the Horus family, developed by TokenAI as a multilingual language model designed for practical AI applications across diverse communities.
43
+
44
+ **Organization:** [TokenAI](https://tokenai.cloud/)
45
+ **Developer:** [Assem Sabry](https://assem.cloud/)
46
  **Release Date:** April 2026
47
+ **License:** MIT
48
+ **Model Size:** 4B parameters
49
+ **Tensor Type:** BF16 / FP32
50
+
51
 
52
  ---
53
 
54
  ## Overview
55
 
56
+ Horus-1.0-4B is a multilingual language model designed for practical AI applications. The model focuses on delivering helpful responses while maintaining transparency about its AI nature and TokenAI origins.
57
+
58
+ ---
59
+
60
+ ## About TokenAI
61
+
62
+ **TokenAI** is an AI startup founded by [Assem Sabry](https://assem.cloud/) with headquarters in Egypt.
63
+
64
+ ### Mission
65
 
66
+ TokenAI aims to deliver the strongest language models in the world and in the Arab world through the Horus family of models. The startup bridges the gap between cutting-edge AI capabilities and regional cultural contexts, starting with the Arab world. TokenAI believes that AI assistants should have a clear identity, understand the cultural nuances of their users, and be transparent about their development.
67
+
68
+ ### The Horus Family
69
+
70
+ Horus-1.0-4B marks the **first model in the Horus family line**. This is just the beginning of TokenAI's journey to create a comprehensive suite of AI models serving the Arab region. Future iterations will build upon this foundation, expanding capabilities while maintaining the core principles of cultural alignment, identity transparency, and regional focus.
71
+
72
+ ### Contact & Community
73
+
74
+ - **TokenAI Website:** [https://tokenai.cloud/](https://tokenai.cloud/)
75
+ - **HuggingFace:** https://huggingface.co/tokenaii
76
+ - **Model Repository:** https://huggingface.co/tokenaii/horus
77
+ - **Developer:** [Assem Sabry](https://assem.cloud/)
78
+ - **Location:** Egypt
79
 
80
  ---
81
 
 
87
  - **Regional Focus:** Optimized for diverse cultural contexts
88
 
89
  ### Core Competencies
90
+ - **Identity Recognition:** Strong self-identification as "Horus" from "TokenAI"
91
  - **Cultural Alignment:** Responses aligned with Arab cultural values and contexts
92
  - **Reasoning:** Chain-of-thought reasoning capabilities with step-by-step problem solving
93
  - **General Knowledge:** Broad knowledge across history, science, geography, and literature
 
115
  | Q5_K_M | GGUF | ~2.7 GB | Higher quality than Q4 |
116
  | Q6_K | GGUF | ~3.1 GB | Near-full quality |
117
  | Q8_0 | GGUF | ~4.1 GB | Minimal quality loss |
118
+ | F16 | GGUF | ~4.1 GB | Full precision, direct loading |
119
+
120
+ GGUF versions available at: [tokenaii/Hours-1.0-4B-GGUF](https://huggingface.co/tokenaii/Hours-1.0-4B-GGUF)
121
+
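As a rough cross-check of the sizes in the table above, file size can be estimated as parameter count times bits per weight. The sketch below assumes exactly 4e9 parameters (the card only says "4B") and ignores file metadata and any tensors kept at higher precision. `estimate_size_gb` is a hypothetical helper, not part of any library. The bits-per-weight values follow the llama.cpp block layouts for F16, Q8_0, and Q6_K; the mixed K-quants (Q4_K_M, Q5_K_M) are left out because their effective rate depends on the per-tensor mix.

```python
# Hypothetical helper: approximate GGUF file size from parameter count.
N_PARAMS = 4_000_000_000  # assumption: "4B" taken as exactly 4e9 parameters

BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights plus one fp16 scale per block
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
}


def estimate_size_gb(n_params: int, bits_per_weight: float) -> float:
    """Return the approximate size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_size_gb(N_PARAMS, bpw):.2f} GB")
```

Under these assumptions an F16 file comes to about 8 GB, so the F16 row of the table is worth checking against the actual file size.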
122
+ ---
123
+
124
+ ## Quick Start
125
+
126
+ ### Using Transformers
127
+
128
+ ```python
129
+ import torch
130
+ from transformers import AutoModelForCausalLM, AutoTokenizer
131
+ model = AutoModelForCausalLM.from_pretrained(
132
+ "tokenaii/horus",
133
+ subfolder="Horus-1.0-4B",
134
+ torch_dtype=torch.float16,
135
+ device_map="auto"
136
+ )
137
+ tokenizer = AutoTokenizer.from_pretrained("tokenaii/horus", subfolder="Horus-1.0-4B")
138
+
139
+ prompt = "### User:\nHello\n\n### Assistant:"
140
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
141
+ outputs = model.generate(**inputs, max_new_tokens=100)
142
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
143
+ print(response)
144
+ ```
145
+
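For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, so the decoded string in the example above still contains the prompt. A minimal sketch of building the `### User:` / `### Assistant:` prompt and trimming the reply is shown here. `build_prompt` and `extract_reply` are hypothetical helper names, and stopping at a following `### User:` tag is an assumption about how the model might continue, not documented behavior.

```python
USER_TAG = "### User:"
ASSISTANT_TAG = "### Assistant:"


def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the layout used by the examples."""
    return f"{USER_TAG}\n{user_message.strip()}\n\n{ASSISTANT_TAG}"


def extract_reply(decoded: str) -> str:
    """Return only the assistant text from a decoded generation."""
    # Keep the text after the last assistant tag (drops the echoed prompt).
    reply = decoded.rsplit(ASSISTANT_TAG, 1)[-1]
    # Assumption: cut at a new user turn if the model kept generating.
    return reply.split(USER_TAG, 1)[0].strip()


prompt = build_prompt("Hello")
# Stand-in for tokenizer.decode(outputs[0], skip_special_tokens=True)
decoded = prompt + " Hi! How can I help?\n\n### User:\nThanks"
print(extract_reply(decoded))
```

In the Transformers example, `decoded` would be the string returned by `tokenizer.decode`.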
146
+ ### Using GGUF with llama.cpp
147
+
148
+ ```bash
149
+ ./llama.cpp/main -m Horus-1.0-4B-Q4_K_M.gguf -p "Your prompt here"
150
+ ```
151
+
152
+ ---
153
+
154
+ ## Repository Structure
155
+
156
+ ```
157
+ horus-1.0/
158
+ ├── README.md # This file
159
+ ├── MODEL_CARD.md # Detailed model documentation
160
+ ├── media/ # Images and assets
161
+ │ └── main.png # Model banner image
162
+ ├── notebooks/ # Jupyter notebooks
163
+ │ ├── Horus_Chat_Terminal.ipynb
164
+ │ ├── Horus_GGUF_Quantization.ipynb
165
+ │ └── Horus_Sequential_Benchmark.ipynb
166
+ ├── scripts/ # Utility scripts
167
+ │ ├── upload_media.py
168
+ │ └── gguf_convert.py
169
+ └── docs/ # Additional documentation
170
+ ```
171
 
172
  ---
173
 
174
  ## Benchmark Results
175
 
176
+ ### Performance Comparison Charts
177
+
178
+ Below are visual comparisons of Horus-1.0-4B against leading models including Qwen 3.5-4B, Llama 3.1-8B, Phi-4-14B, and Gemma-2-9B.
179
+
180
  #### General Knowledge & Reasoning Benchmarks
181
  ![General Benchmarks](media/1.png)
182
 
 
224
 
225
  ### Complete Performance Summary
226
 
227
+ | Benchmark | Score |
228
+ |-----------|-------|
229
+ | MMLU | 60% |
230
+ | GPQA_Diamond | 100% |
231
+ | SWE_bench | 66.67% |
232
+ | IFEval | 100% |
233
+ | BFCL | 100% |
234
+ | OmniDocBench | 100% |
235
+ | Terminal_Bench | 100% |
236
+ | ERQA | 66.67% |
237
+ | BrowseComp | 100% |
238
+ | Arabic_IEN_MCQ | 100% |
239
+ | Arabic_ExamsAR | 100% |
240
+ | Arabic_ACVA | 50% |
241
+ | English_AGIEval | 66.67% |
242
+ | English_Arc_Challenge | 100% |
243
+ | English_GPQA | 100% |
244
+ | English_HellaSwag | 100% |
245
+ | English_Winogrande | 100% |
246
+ | English_MMLU_Pro | 100% |
247
+ | English_GSM8K | 66.67% |
248
+ | English_TruthfulQA | 100% |
249
 
250
  ### Overall Performance
251
  - **Total Benchmarks:** 20
252
  - **Perfect Scores (100%):** 13 benchmarks
253
  - **Average Score:** 80.15%
254
 
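The summary figures can be recomputed directly from the Complete Performance Summary table. The sketch below copies the 20 scores as listed and derives the count of perfect scores and the unweighted mean. With the values as listed, it gives 14 perfect scores and a mean of about 88.83%, which does not match the 13 and 80.15% reported above; the source does not indicate which figures are authoritative.

```python
# Scores copied from the Complete Performance Summary table (percent).
scores = {
    "MMLU": 60.0, "GPQA_Diamond": 100.0, "SWE_bench": 66.67,
    "IFEval": 100.0, "BFCL": 100.0, "OmniDocBench": 100.0,
    "Terminal_Bench": 100.0, "ERQA": 66.67, "BrowseComp": 100.0,
    "Arabic_IEN_MCQ": 100.0, "Arabic_ExamsAR": 100.0, "Arabic_ACVA": 50.0,
    "English_AGIEval": 66.67, "English_Arc_Challenge": 100.0,
    "English_GPQA": 100.0, "English_HellaSwag": 100.0,
    "English_Winogrande": 100.0, "English_MMLU_Pro": 100.0,
    "English_GSM8K": 66.67, "English_TruthfulQA": 100.0,
}

total = len(scores)
perfect = sum(1 for s in scores.values() if s == 100.0)
average = sum(scores.values()) / total  # unweighted mean across benchmarks

print(f"benchmarks={total} perfect={perfect} average={average:.2f}%")
```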
255
  ---
256
 
257
  ## Limitations
258
 
 
259
  - **Context Length:** Optimized for 256 tokens during training; longer contexts may vary in quality
260
  - **Multilingual Balance:** While bilingual, performance may vary between Arabic and English depending on query complexity
261
  - **Knowledge Cutoff:** Training data reflects knowledge up to early 2024
262
 
263
+ ---
264
+
265
+ ## Safety and Ethics
266
 
267
  Horus has been trained with safety guidelines and cultural sensitivity in mind. The model includes:
268
  - Identity preservation to prevent misrepresentation
269
  - Cultural alignment for Arab world contexts
270
  - Transparent disclosure of AI nature and origins
271
 
272
+ ---
273
+
274
  ## Citation
275
 
276
  If you use this model in your research or applications, please cite:
277
 
278
  ```
279
  @misc{horus-1.0-4b,
280
+ title={Horus-1.0-4B: A Multilingual Language Model},
281
  author={Assem Sabry and TokenAI Team},
282
  year={2026},
283
  howpublished={\url{https://huggingface.co/tokenaii/horus}}
 
286
 
287
  ---
288
 
289
+ ## Links
290
+
291
+ - **HuggingFace Model:** https://huggingface.co/tokenaii/horus
292
+ - **GGUF Versions:** https://huggingface.co/tokenaii/Hours-1.0-4B-GGUF
293
+ - **TokenAI Website:** https://tokenai.cloud/
294
+ - **Developer:** https://assem.cloud/
295
+
296
+ ---
297
+
298
  **This is the first model from the Horus family. More capable versions coming soon from TokenAI.**