sarel committed on
Commit 009ee7a · verified · 1 Parent(s): 1721368

Update README.md

Files changed (1):
  1. README.md +60 -3
README.md CHANGED
@@ -1,3 +1,60 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - he
+ - en
+ license: apache-2.0
+ library_name: mamba
+ tags:
+ - mamba2
+ - moe
+ - hebrew
+ - finance
+ - legal
+ - ssm
+ model_name: HEBATRON
+ base_model: nvidia/nemotron-3-nano-30b-base
+ pipeline_tag: text-generation
+ ---
+
+ # 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE
+
+ HEBATRON is a high-performance language model specialized for the Hebrew language. Developed through a collaboration between PwC Israel, MAFAT, and AWS, it introduces a hybrid architecture combining Mamba2 and a sparse Mixture-of-Experts (MoE).
+
+ ## 🚀 Model Summary
+
+ HEBATRON is designed to handle the structural and morphological complexity of Hebrew while scaling linearly with sequence length on long-context tasks. It is a localized, enhanced adaptation of the Nemotron-3-Nano-30B base model, optimized for native-level reasoning in Hebrew and English.
+
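+ A minimal inference sketch follows. The repo id `PwC-Israel/HEBATRON` is a placeholder (the published id is not stated in this card), and a recent `transformers` build with support for hybrid Mamba2 architectures is assumed:
+
+ ```python
+ # Minimal text-generation sketch; model id and generation settings are illustrative.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "PwC-Israel/HEBATRON"  # placeholder repo id, not confirmed by this card
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype="auto",      # keep the checkpoint's native precision
+     device_map="auto",       # shard the 31.6B weights across available GPUs
+     trust_remote_code=True,  # hybrid Mamba2-MoE blocks may ship custom modeling code
+ )
+
+ prompt = "סכם את המסמך הבא בשלוש נקודות:"  # "Summarize the following document in three points:"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+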
+ ## 📂 Technical Specifications
+
+ | Feature | Specification |
+ |---|---|
+ | Model Name | HEBATRON |
+ | Architecture | Hybrid Mamba2 (SSM) + Sparse MoE |
+ | Total Parameters | 31.6B |
+ | Active Parameters | ~3B per token |
+ | Context Window | 65,536 (64k) tokens |
+ | Hardware | NVIDIA Blackwell (B300) & H200 GPUs |
+ | Precision | FP8 mixed precision |
+
+ ## 🧬 Training Curriculum
+
+ The model was trained using a three-phase curriculum learning strategy (a sketch of the phase schedule follows this list):
+
+ 1. **Phase 1: Formal Foundation (75.5B tokens).** High-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
+ 2. **Phase 2: Colloquial Expansion (3.36B tokens).** Social media, forums, and informal web data to handle slang and modern registers.
+ 3. **Phase 3: Long-Context Extension (20.4B tokens).** Fine-tuning on dense, long-form documents to stabilize the 64k context window.
+
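+ Illustratively, the phase schedule can be written down as data. The sketch below is hypothetical: only the phase names and token budgets come from this card; the source labels and any use in a training loop are assumptions.
+
+ ```python
+ # Hypothetical encoding of the three-phase curriculum described above.
+ # Token budgets (in billions) are from the model card; source mixes are illustrative.
+ from dataclasses import dataclass
+
+ @dataclass(frozen=True)
+ class CurriculumPhase:
+     name: str
+     token_budget_b: float     # training tokens for this phase, in billions
+     sources: tuple[str, ...]  # assumed data-source labels
+
+ PHASES = (
+     CurriculumPhase("formal_foundation", 75.5, ("legal", "academic", "literary")),
+     CurriculumPhase("colloquial_expansion", 3.36, ("social_media", "forums", "informal_web")),
+     CurriculumPhase("long_context_extension", 20.4, ("dense_long_form_documents",)),
+ )
+
+ total = sum(p.token_budget_b for p in PHASES)
+ for p in PHASES:
+     print(f"{p.name}: {p.token_budget_b}B tokens ({p.token_budget_b / total:.1%} of {total:.2f}B)")
+ ```
+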
+ ## 📊 Performance Evaluation
+
+ ### Hebrew Reasoning Benchmarks
+
+ - SNLI (Semantic Reasoning): 91.2% accuracy
+ - Israeli Trivia: 72.1% (+14 pts vs. base)
+ - Hebrew Reasoning Average: 73.8% (surpassing DictaLM-3.0-Thinking)
+ - GSM8K (Math): 83.3% accuracy in native Hebrew
+
+ ### English Reasoning Benchmarks
+
+ - Psychometric Psi (EN): 91.6%
+ - English Reasoning Average: 86.0%
+
+ ## 🎯 Intended Use & Limitations
+
+ - **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal/technical), and complex bilingual reasoning; a long-context sketch follows this list.
+ - **Limitations:** As with any large language model, users should verify outputs for factual accuracy.
+
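+ As a sketch of the long-context summarization use case, the snippet below reuses the placeholder repo id from above and the 64k window from the spec table; the file name and Hebrew prompt are illustrative:
+
+ ```python
+ # Hypothetical long-document summarization within the 65,536-token window.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "PwC-Israel/HEBATRON"  # placeholder repo id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
+ )
+
+ with open("contract_he.txt", encoding="utf-8") as f:  # a long Hebrew legal document
+     document = f.read()
+ prompt = f"סכם את החוזה הבא:\n\n{document}\n\nסיכום:"  # "Summarize the following contract: ... Summary:"
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ n_prompt = inputs["input_ids"].shape[1]
+ assert n_prompt < 65_536, f"prompt is {n_prompt} tokens; it must fit the 64k context window"
+
+ out = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(out[0][n_prompt:], skip_special_tokens=True))  # decode only the summary
+ ```
+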
+ ## 🤝 Credits
+
+ - **Developed by:** PwC Israel & MAFAT
+ - **MAFAT Leads:** Tal Geva (Project Lead), Matan Frank
+ - **Technical Lead:** Sarel Weinberger (PwC Next)
+ - **PwC Israel Team:** Noam Kayzer, Dan Revital, Ori Bar Joseph, Smadar Arbatz, Or Levi, Kate Zinkovskaia, Zevi Apini, Omer Baruch (PwC Next)
+ - **MAFAT Team:** Noam Ordan, Nadav Cordova
+ - **Partners:** Amir Nissan Hacohen (Origin.ai)
+ - **Research Collaborators:** Shaltiel Shmidman (Dicta), Mike Erlihson
+ - **AWS Infrastructure:** Ilouz Netanel