Update README.md
README.md

@@ -16,6 +16,8 @@ base_model: nvidia/nemotron-3-nano-30b-base
 pipeline_tag: text-generation
 ---
 
+
+
 # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
 
 HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language. Developed through a collaboration between **PwC Israel**, **MAFAT**, and **AWS**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**.
@@ -33,7 +35,7 @@ HEBATRON is designed to handle the structural and morphological complexities of
 | **Architecture** | Hybrid Mamba2 (SSM) + Sparse MoE |
 | **Total Parameters** | 31.6B |
 | **Active Parameters** | ~3B per token |
-| **Context Window** |
+| **Context Window** | 8096 tokens |
 | **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs |
 | **Precision** | FP8 Mixed-Precision |
 
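
The spec table describes a sparse-MoE checkpoint (only ~3B of the 31.6B parameters are routed per token) with an 8096-token context window. A minimal loading sketch, assuming the checkpoint is published under a repo id like `pwc-israel/HEBATRON` (hypothetical; the diff never names the repo) and that the custom Mamba2-MoE modeling code ships inside the checkpoint via `trust_remote_code`, as is common for hybrid architectures:

```python
# Minimal sketch: loading the checkpoint with the standard
# Hugging Face causal-LM API. The repo id is hypothetical, and
# trust_remote_code assumes the Mamba2-MoE code is bundled with
# the checkpoint; FP8 is a training detail, so bf16 is used as a
# safe inference default.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pwc-israel/HEBATRON"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Hebrew prompt: "The capital of Israel is"
prompt = "בירת ישראל היא"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```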