File size: 4,006 Bytes
4857cb1 5d9abcc 4857cb1 5d9abcc 1ffdfdd 5d9abcc 4857cb1 5d9abcc a998cc2 5d9abcc c6b88ea 5d9abcc 31fbe01 5d9abcc 31fbe01 c3a2013 31fbe01 5d9abcc f1c51e4 5d9abcc f1c51e4 5d9abcc 31fbe01 5d9abcc 31fbe01 5d9abcc 31fbe01 5d9abcc 31fbe01 de0ec4d 5d9abcc 31fbe01 5d9abcc 31fbe01 5d9abcc 31fbe01 5d9abcc 31fbe01 c1ec29a 71e1f72 31fbe01 c472949 3e93f51 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 | ---
language:
- he
- en
license: apache-2.0
library_name: mamba
tags:
- mamba2
- moe
- hebrew
- finance
- legal
- ssm
model_name: HEBATRON
base_model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
pipeline_tag: text-generation
---

# 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language. Developed through a collaboration between **PwC Israel** and **MAFAT** and **AWS**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**.
## 🚀 Model Summary
HEBATRON is designed to handle the structural and morphological complexities of Hebrew while providing linear scaling for long-context tasks. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for native-level reasoning in Hebrew and English.
---
## 📂 Technical Specifications
| Feature | Specification |
| :--- | :--- |
| **Model Name** | HEBATRON |
| **Architecture** | Hybrid **Mamba2** (SSM) + **Sparse MoE** |
| **Total Parameters** | 31.6B |
| **Active Parameters** | ~3B per token |
| **Context Window** | 65,536 (64k) tokens |
| **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs |
| **Precision** | FP8 Mixed-Precision |
---
## ⚙️ Deployment Configuration
To ensure optimal performance in production, the following environment variables and parameters are recommended for the **vLLM** backend:
### **Inference Engine (vLLM)**
* **Port:** `8002` (Default for Model B slot)
* **Max Model Length:** `65536` tokens
* **GPU Memory Utilization:** Recommended `0.90` - `0.95` for Blackwell/H200.
### **Model Parameters**
* **Max New Tokens:** `65536`
* **Temperature:** `0.7` (Balanced creativity and precision)
* **Top-P:** `0.9`
### **Server Settings**
* **Max Simultaneous Comparisons:** `1` (Recommended for 30B+ MoE on single node to maintain latency)
* **Chat Context Max Turns:** `10`
* **Max Prompt Characters:** `10000`
---
## 🧬 Training Curriculum
The model was trained using a three-phase **Curriculum Learning** strategy:
1. **Phase 1: Formal Foundation (75.5B tokens)**
Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
2. **Phase 2: Colloquial Expansion (3.36B tokens)**
Integration of social media, forums, and informal web data to handle slang and modern registers.
3. **Phase 3: Long-Context Extension (20.4B tokens)**
Fine-tuning on dense, long-form documents to stabilize the 64k context window.
4. **Alignment:** Supervised Fine-Tuning (SFT) was performed on **2 million samples**, including localized knowledge distillation and the **"Hebrew IFEval"** dataset.
---
## 📊 Performance Evaluation
### Hebrew Reasoning Benchmarks
* **SNLI (Semantic Reasoning):** 91.2% accuracy
* **Israeli Trivia:** 72.1% (+14pt vs base)
* **Hebrew Average Reasoning:** 73.8% (Surpassing DictaLM-3.0-Thinking)
* **GSM8K (Math):** 83.3% accuracy in native Hebrew
### English Reasoning Benchmarks
* **Psychometric Psi (EN):** 91.6%
* **English Reasoning Average:** 86.0%
---
## 🎯 Intended Use & Limitations
* **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning.
* **Limitations:** Users should verify outputs for factual accuracy as with any Large Language Model.
---
## 🤝 Credits
* **Developed by:** PwC Israel & MAFAT
* **MAFAT Lead:** Tal Geva [project Lead], Matan Frank
* **Technical Lead:** Sarel Weinberger (PwC Next)
* **PwC Israel Team:** Noam Kayzer, Dan Revital, Ori Bar Joseph, Smadar Arvatz, Or Levi, Kate Zinkovskaia, Zevi Apini, Omer Baruch (PwC Next)
* **MAFAT Team:** Noam Ordan, Nadav Cordova
* **Partners:** Amir Nissan Hacohen (Origin.ai)
* **Research Collaborators:** Shaltiel Shmidman (Dicta), Mike Erlihson
* **AWS Infrastructures:** Ilouz Netanel |