Update README.md
README.md

@@ -16,6 +16,8 @@ base_model: nvidia/nemotron-3-nano-30b-base
 pipeline_tag: text-generation
 ---
 
+
+
 # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
 
 HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language. Developed through a collaboration between **PwC Israel**, **MAFAT**, and **AWS**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**.
@@ -33,7 +35,7 @@ HEBATRON is designed to handle the structural and morphological complexities of
 | **Architecture** | Hybrid Mamba2 (SSM) + Sparse MoE |
 | **Total Parameters** | 31.6B |
 | **Active Parameters** | ~3B per token |
-| **Context Window** |
+| **Context Window** | 8096 tokens |
 | **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs |
 | **Precision** | FP8 Mixed-Precision |
 
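
The spec table describes a sparse-MoE checkpoint (only ~3B of the 31.6B parameters are routed per token) with an 8096-token context window. A minimal loading sketch, assuming the checkpoint is published under a repo id like `pwc-israel/HEBATRON` (hypothetical; the diff never names the repo) and that the custom Mamba2-MoE modeling code ships inside the checkpoint via `trust_remote_code`, as is common for hybrid architectures:

```python
# Minimal sketch: loading the checkpoint with the standard
# Hugging Face causal-LM API. The repo id is hypothetical, and
# trust_remote_code assumes the Mamba2-MoE code is bundled with
# the checkpoint; FP8 is a training detail, so bf16 is used as a
# safe inference default.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pwc-israel/HEBATRON"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Hebrew prompt: "The capital of Israel is"
prompt = "בירת ישראל היא"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```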