---
language:
- he
- en
license: apache-2.0
library_name: mamba
tags:
- mamba2
- moe
- hebrew
- finance
- legal
- ssm
model_name: HEBATRON
base_model: nvidia/nemotron-3-nano-30b-base
pipeline_tag: text-generation
---
# 🛡️ HEBATRON: Hebrew-Specialized Mamba2-MoE

HEBATRON is a high-performance language model specialized for Hebrew. Developed through a collaboration between PwC Israel, MAFAT, and AWS, it introduces a hybrid architecture that combines Mamba2 with a sparse Mixture-of-Experts (MoE).

## 🚀 Model Summary

HEBATRON is designed to handle the structural and morphological complexities of Hebrew while scaling linearly with sequence length on long-context tasks. It is a localized and enhanced version of the Nemotron-3-Nano-30B framework, optimized for native-level reasoning in both Hebrew and English.
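The snippet below is a minimal inference sketch. It assumes the checkpoint is published on the Hugging Face Hub with `transformers`-compatible configuration files; the repository id `pwc-israel/HEBATRON` is a placeholder, and flags such as `trust_remote_code` may or may not be needed depending on how the hybrid Mamba2/MoE layers are packaged.

```python
# Minimal text-generation sketch for HEBATRON with Hugging Face transformers.
# Assumptions: the model loads through the standard AutoModelForCausalLM path,
# and "pwc-israel/HEBATRON" is a placeholder repository id, not a confirmed path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "pwc-israel/HEBATRON"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference weights are available
    device_map="auto",
    trust_remote_code=True,      # hybrid SSM/MoE models often ship custom modeling code
)

# Hebrew prompt: "Summarize the main points of the following contract:"
prompt = "סכם את הנקודות המרכזיות בחוזה הבא:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```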
## 📂 Technical Specifications

| Feature | Specification |
| --- | --- |
| Model Name | HEBATRON |
| Architecture | Hybrid Mamba2 (SSM) + Sparse MoE |
| Total Parameters | 31.6B |
| Active Parameters | ~3B per token |
| Context Window | 65,536 tokens (64k) |
| Hardware | NVIDIA Blackwell (B300) & H200 GPUs |
| Precision | FP8 Mixed-Precision |
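The gap between total and active parameters comes from sparse routing: each token passes through the shared (dense) layers plus only a few experts. The numbers below are purely illustrative, since the real expert count, top-k, and dense/expert split are not stated here; they only show the arithmetic that lets a ~31.6B-parameter model activate roughly 3B parameters per token.

```python
# Illustrative sparse-MoE parameter accounting. All layout numbers are hypothetical,
# NOT the published HEBATRON configuration; the point is why total >> active.
dense_params = 2.0e9                             # always-active shared parameters (hypothetical)
num_experts = 64                                 # experts per MoE layer (hypothetical)
params_per_expert = 29.6e9 / num_experts         # expert parameters split evenly (hypothetical)
top_k = 2                                        # experts routed per token (hypothetical)

total_params = dense_params + num_experts * params_per_expert
active_params = dense_params + top_k * params_per_expert

print(f"total:  {total_params / 1e9:.1f}B")      # ~31.6B
print(f"active: {active_params / 1e9:.2f}B")     # ~2.9B per token
```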
## 🧬 Training Curriculum

The model was trained with a three-phase curriculum learning strategy (summarized as data in the sketch after this list):

- **Phase 1: Formal Foundation (75.5B tokens).** High-quality, structured Hebrew (legal, academic, and literary texts) to establish core grammatical rules.
- **Phase 2: Colloquial Expansion (3.36B tokens).** Social media, forums, and informal web data to handle slang and modern registers.
- **Phase 3: Long-Context Extension (20.4B tokens).** Fine-tuning on dense, long-form documents to stabilize the 64k context window.
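A compact way to view the schedule is as an ordered list of phase configurations. This is only a sketch of the mixture described above; the field names are assumptions, not the released training configuration.

```python
# Sketch of the three-phase curriculum as plain data. Field names are illustrative
# assumptions; token counts and data sources are taken from the description above.
CURRICULUM = [
    {"phase": "Formal Foundation", "tokens": 75.5e9,
     "sources": ["legal", "academic", "literary"]},
    {"phase": "Colloquial Expansion", "tokens": 3.36e9,
     "sources": ["social media", "forums", "informal web"]},
    {"phase": "Long-Context Extension", "tokens": 20.4e9,
     "sources": ["dense long-form documents"]},
]

total_tokens = sum(phase["tokens"] for phase in CURRICULUM)
print(f"total curriculum tokens: {total_tokens / 1e9:.2f}B")  # ~99.26B
```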
## 📊 Performance Evaluation

### Hebrew Reasoning Benchmarks

- SNLI (Semantic Reasoning): 91.2% accuracy
- Israeli Trivia: 72.1% (+14 pt vs. base)
- Hebrew Reasoning Average: 73.8% (surpassing DictaLM-3.0-Thinking)
- GSM8K (Math): 83.3% accuracy in native Hebrew

### English Reasoning Benchmarks

- Psychometric Psi (EN): 91.6%
- English Reasoning Average: 86.0%
## 🎯 Intended Use & Limitations

- **Intended Use:** Advanced Hebrew document analysis, long-context summarization (legal and technical), and complex bilingual reasoning.
- **Limitations:** As with any large language model, users should verify outputs for factual accuracy.
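For long-document workloads, it can help to check an input against the 65,536-token window before prompting. The sketch below reuses the placeholder repository id from the loading example and only counts tokens; any truncation or chunking strategy is left to the application.

```python
# Check that a long Hebrew document fits the 64k-token context window before
# building a summarization prompt. MODEL_ID is the same placeholder id as above.
from transformers import AutoTokenizer

MODEL_ID = "pwc-israel/HEBATRON"  # hypothetical repository id
CONTEXT_WINDOW = 65_536

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def build_summary_prompt(document: str, reserve_for_output: int = 1_024) -> str:
    """Prepend a summarization instruction and verify the prompt fits the window."""
    prompt = "סכם את המסמך הבא:\n\n" + document  # "Summarize the following document:"
    n_tokens = len(tokenizer(prompt)["input_ids"])
    if n_tokens + reserve_for_output > CONTEXT_WINDOW:
        raise ValueError(f"Prompt is {n_tokens} tokens; it will not fit the 64k window.")
    return prompt
```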
## 🤝 Credits

- **Developed by:** PwC Israel & MAFAT
- **MAFAT Leads:** Tal Geva (Project Lead), Matan Frank
- **Technical Lead:** Sarel Weinberger (PwC Next)
- **PwC Israel Team:** Noam Kayzer, Dan Revital, Ori Bar Joseph, Smadar Arbatz, Or Levi, Kate Zinkovskaia, Zevi Apini, Omer Baruch (PwC Next)
- **MAFAT Team:** Noam Ordan, Nadav Cordova
- **Partners:** Amir Nissan Hacohen (Origin.ai)
- **Research Collaborators:** Shaltiel Shmidman (Dicta), Mike Erlihson
- **AWS Infrastructure:** Ilouz Netanel