sarel committed on
Commit 555d8d2 · verified · 1 Parent(s): cd8fd90

Update README.md

Files changed (1)
  1. README.md +80 -53
README.md CHANGED
@@ -1,60 +1,87 @@
 
  language:
- - he
- - en
  license: apache-2.0
  library_name: mamba
  tags:
- - mamba2
- - moe
- - hebrew
- - finance
- - legal
- - ssm
  model_name: HEBATRON
  base_model: nvidia/nemotron-3-nano-30b-base
  pipeline_tag: text-generation
- image
-
- 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
- HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language. Developed through a collaboration between PwC Israel and MAFAT and AWS, it introduces a unique hybrid architecture combining Mamba2 and Mixture-of-Experts (MoE).
-
- 🚀 Model Summary
- HEBATRON is designed to handle the structural and morphological complexities of Hebrew while providing linear scaling for long-context tasks. It is a localized and enhanced version of the Nemotron-3-Nano-30B framework, optimized for native-level reasoning in Hebrew and English.
-
- 📂 Technical Specifications
- Feature Specification
- Model Name HEBATRON
- Architecture Hybrid Mamba2 (SSM) + Sparse MoE
- Total Parameters 31.6B
- Active Parameters ~3B per token
- Context Window 65,536 (64k) tokens
- Hardware NVIDIA Blackwell (B300) & H200 GPUs
- Precision FP8 Mixed-Precision
- 🧬 Training Curriculum
- The model was trained using a three-phase Curriculum Learning strategy:
-
- Phase 1: Formal Foundation (75.5B tokens) Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
- Phase 2: Colloquial Expansion (3.36B tokens) Integration of social media, forums, and informal web data to handle slang and modern registers.
- Phase 3: Long-Context Extension (20.4B tokens) Fine-tuning on dense, long-form documents to stabilize the 64k context window.
-
- 📊 Performance Evaluation
- Hebrew Reasoning Benchmarks
- SNLI (Semantic Reasoning): 91.2% accuracy
- Israeli Trivia: 72.1% (+14pt vs base)
- Hebrew Average Reasoning: 73.8% (Surpassing DictaLM-3.0-Thinking)
- GSM8K (Math): 83.3% accuracy in native Hebrew
- English Reasoning Benchmarks
- Psychometric Psi (EN): 91.6%
- English Reasoning Average: 86.0%
- 🎯 Intended Use & Limitations
- Intended Use: Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning.
- Limitations: Users should verify outputs for factual accuracy as with any Large Language Model.
- 🤝 Credits
- Developed by: PwC Israel & MAFAT
- MAFAT Lead: Tal Geva [project Lead], Matan Frank
- Technical Lead: Sarel Weinberger (PwC Next)
- PwC Israel Team: Noam Kayzer, Dan Revital, Ori Bar Joseph, Smadar Arbatz, Or Levi, Kate Zinkovskaia, Zevi Apini, Omer Baruch (PwC Next)
- MAFAT Team: Noam Ordan, Nadav Cordova
- Partners: Amir Nissan Hacohen (Origin.ai)
- Research Collaborators: Shaltiel Shmidman (Dicta), Mike Erlihson
- AWS Infrastructures: Ilouz Netanel
 
+ ---
  language:
+ - he
+ - en
  license: apache-2.0
  library_name: mamba
  tags:
+ - mamba2
+ - moe
+ - hebrew
+ - finance
+ - legal
+ - ssm
  model_name: HEBATRON
  base_model: nvidia/nemotron-3-nano-30b-base
  pipeline_tag: text-generation
+ ---
+
+ # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
+
+ HEBATRON is a state-of-the-art, high-performance language model specialized for the Hebrew language. Developed through a collaboration between **PwC Israel**, **MAFAT**, and **AWS**, it introduces a unique hybrid architecture combining **Mamba2** and **Mixture-of-Experts (MoE)**.
+
+ ## 🚀 Model Summary
+ HEBATRON is designed to handle the structural and morphological complexities of Hebrew while providing linear scaling for long-context tasks. It is a localized and enhanced version of the **Nemotron-3-Nano-30B** framework, optimized for native-level reasoning in Hebrew and English.
+
+ ---
+
+ ## 📂 Technical Specifications
+
+ | Feature | Specification |
+ | :--- | :--- |
+ | **Model Name** | HEBATRON |
+ | **Architecture** | Hybrid Mamba2 (SSM) + Sparse MoE |
+ | **Total Parameters** | 31.6B |
+ | **Active Parameters** | ~3B per token |
+ | **Context Window** | 65,536 (64k) tokens |
+ | **Hardware** | NVIDIA Blackwell (B300) & H200 GPUs |
+ | **Precision** | FP8 Mixed-Precision |
+
+ ---
+
+ ## 🧬 Training Curriculum
+ The model was trained using a three-phase **Curriculum Learning** strategy:
+
+ 1. **Phase 1: Formal Foundation (75.5B tokens)**
+ Focused on high-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
+ 2. **Phase 2: Colloquial Expansion (3.36B tokens)**
+ Integration of social media, forums, and informal web data to handle slang and modern registers.
+ 3. **Phase 3: Long-Context Extension (20.4B tokens)**
+ Fine-tuning on dense, long-form documents to stabilize the 64k context window.
+
+ ---
+
+ ## 📊 Performance Evaluation
+
+ ### Hebrew Reasoning Benchmarks
+ * **SNLI (Semantic Reasoning):** 91.2% accuracy
+ * **Israeli Trivia:** 72.1% (+14pt vs base)
+ * **Hebrew Average Reasoning:** 73.8% (Surpassing DictaLM-3.0-Thinking)
+ * **GSM8K (Math):** 83.3% accuracy in native Hebrew
+
+ ### English Reasoning Benchmarks
+ * **Psychometric Psi (EN):** 91.6%
+ * **English Reasoning Average:** 86.0%
+
+ ---
+
+ ## 🎯 Intended Use & Limitations
+ * **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning.
+ * **Limitations:** Users should verify outputs for factual accuracy as with any Large Language Model.
+
+ ---
+
+ ## 🤝 Credits
+
+ ### **Project Leadership**
+ * **MAFAT Lead:** Tal Geva (Project Lead), Matan Frank
+ * **Technical Lead:** Sarel Weinberger (PwC Next)
+
+ ### **Core Teams**
+ * **PwC Israel Team:** Noam Kayzer, Dan Revital, Ori Bar Joseph, Smadar Arbatz, Or Levi, Kate Zinkovskaia, Zevi Apini, Omer Baruch (PwC Next)
+ * **MAFAT Team:** Noam Ordan, Nadav Cordova
+
+ ### **Partners & Collaborators**
+ * **Partners:** Amir Nissan Hacohen (Origin.ai)
+ * **Research Collaborators:** Shaltiel Shmidman (Dicta), Mike Erlihson
+ * **Infrastructure:** Netanel Ilouz (AWS)
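
Reviewer note, not part of the commit: the figures quoted in the README's Training Curriculum and Technical Specifications sections can be cross-checked with a little arithmetic. The Python sketch below is only illustrative. It assumes the three phase token budgets are disjoint and simply add up, which the README does not state, and it treats "~3B" active parameters as exactly 3.0B. All variable names are hypothetical.

```python
# Figures copied from the README diff above. The names and the derived
# ratios are illustrative assumptions, not values the model card publishes.
PHASE_TOKENS_B = {
    "Phase 1: Formal Foundation": 75.5,
    "Phase 2: Colloquial Expansion": 3.36,
    "Phase 3: Long-Context Extension": 20.4,
}
TOTAL_PARAMS_B = 31.6
ACTIVE_PARAMS_B = 3.0    # README says "~3B per token"; treated as exact here
CONTEXT_TOKENS = 65_536  # README: "65,536 (64k) tokens"

# Assumption: the phases do not overlap, so the budgets can be summed.
total_tokens_b = sum(PHASE_TOKENS_B.values())

# Share of the overall token budget spent in each phase.
phase_share = {
    name: tokens / total_tokens_b for name, tokens in PHASE_TOKENS_B.items()
}

# Fraction of parameters active per token under the sparse MoE routing.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Total curriculum tokens: {total_tokens_b:.2f}B")
for name, share in phase_share.items():
    print(f"  {name}: {share:.1%}")
print(f"Active parameter fraction: {active_fraction:.1%}")
print(f"Context window is 64 * 1024 tokens: {CONTEXT_TOKENS == 64 * 1024}")
```

Under these assumptions the curriculum totals about 99.26B tokens, split roughly 76% / 3% / 21% across the three phases, and about 9.5% of the parameters are active for any given token.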