---
language:
- he
- en
license: apache-2.0
tags:
- mistral
- nemo
- hebrew
- llm
- text-generation
- instruction-tuned
- chat
pipeline_tag: text-generation
base_model: mistralai/Mistral-Nemo-Base-2407
library_name: transformers
widget:
- text: "Hebrew_Nemo"
  output:
    url: https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo/resolve/main/Images/Hebrew_Nemo.png
---

# Hebrew_Nemo: State-of-the-Art Hebrew Language Model

---
Hebrew_Nemo (12B)

---

**Developed by:** [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)

---

**Hebrew_Nemo** is a state-of-the-art (SOTA) **Hebrew large language model**, optimized for Hebrew language understanding and generation. Built on the Mistral Nemo architecture, it represents a significant advancement in Hebrew NLP capabilities, combining the robust multilingual foundation of Mistral Nemo with extensive Hebrew-specific fine-tuning and optimization.

As part of [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s efforts to truly democratize AI, [Hebrew_Nemo](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo) is released under the permissive **Apache 2.0** license.

The model demonstrates performance competitive with **Gemma3-27B**, one of the world's leading open-source models in multilingual capabilities, despite Gemma3-27B being **more than twice its size**. This result highlights Hebrew_Nemo's efficiency and effectiveness, making SOTA Hebrew capabilities widely available to consumers as well as corporations.

Unfortunately, Gemma-3-27b-it doesn't benchmark well on this suite, but I still believe Gemma-3-27b-it is by far the best multilingual model:

| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
|-------|---------|----------|----------|------------------|----------------|------------|
| google/gemma-3-27b-pt | 69.5 | 85.24 | 78.27 | 36.45 | 70.43 | 27 |
| google/gemma-3-27b-it | 13.41 | 0 | 80.31 | 0.17 | 0 | 27 |

---

# Benchmarks

---

**Hebrew_Nemo** demonstrates SOTA performance for its size, with particularly **outstanding results in Hebrew translation**. At only **12B parameters**, it achieves a **BLEU score of 30.83**, outperforming significantly larger models such as DeepSeek-R1-Distill-Qwen-14B and AI21 Jamba-1.5-Mini (52B), a model more than four times its size.

The model maintains **high competence across reasoning and QA**, with an **SNLI accuracy of 79.76** and a **HeQ score of 70.51**, indicating solid sentence-level understanding and contextual reasoning in Hebrew. Its **Israeli Trivia score (50.83)** demonstrates exceptional knowledge for its size, coming very close to a model more than four times its size while vastly outperforming models of similar or slightly larger size.

| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) |
| ---------------------------------------- | --------: | --------: | --------: | ---------------: | -------------: | ---------: |
| **Hebrew_Nemo** | **57.98** | 79.76 | 70.51 | **30.83** | 50.83 | 12 |
| ai21labs/AI21-Jamba-1.5-Mini | 54.68 | 69.52 | 69.38 | 22.00 | **57.81** | 52 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 53.19 | **85.48** | 71.38 | 22.99 | 32.89 | 14 |
| SicariusSicariiStuff/Zion_Alpha | 53.55 | 84.05 | 67.67 | 27.93 | 34.55 | 7 |
| Qwen/Qwen3-8B | 53.54 | 80.00 | **78.53** | 25.73 | 29.90 | 8 |
| Mistral-Nemo-Base-2407 | 51.24 | 65.95 | 68.48 | 28.99 | 41.53 | 12 |
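The exact harness behind the Translation BLEU column is not published here; for readers who want to score their own Hebrew-English outputs in a comparable, standard way, below is a minimal corpus-BLEU sketch using `sacrebleu` (the sentence pairs are placeholders, not the benchmark data):

```python
# pip install sacrebleu
import sacrebleu

# Placeholder system outputs and references -- NOT the benchmark data.
hypotheses = [
    "Artificial intelligence is transforming the world.",
    "Jerusalem is one of the oldest cities in the world.",
]
references = [[  # one reference stream, aligned with the hypotheses
    "Artificial intelligence is changing the world.",
    "Jerusalem is one of the world's oldest cities.",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")  # corpus-level BLEU on a 0-100 scale
```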
---

**Hebrew_Nemo** also **vastly improves** upon the original Mistral Nemo, adding a large amount of new knowledge while refining existing capabilities:

| Metric | Hebrew_Nemo | Mistral-Nemo-Base | Relative Improvement |
| :------------------- | ----------: | ----------------: | -------------------: |
| **Average** | **57.98** | 51.24 | **+13.2%** |
| **SNLI Accuracy** | **79.76** | 65.95 | **+20.9%** |
| **QA (HeQ)** | **70.51** | 68.48 | **+3.0%** |
| **Translation BLEU** | **30.83** | 28.99 | **+6.3%** |
| **Israeli Trivia** | **50.83** | 41.53 | **+22.4%** |
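The improvement column is relative, not percentage points: each entry is (Hebrew_Nemo − base) / base × 100. A quick check:

```python
# Relative improvement over the base model: (new - old) / old * 100
scores = {
    "Average":          (57.98, 51.24),
    "SNLI Accuracy":    (79.76, 65.95),
    "QA (HeQ)":         (70.51, 68.48),
    "Translation BLEU": (30.83, 28.99),
    "Israeli Trivia":   (50.83, 41.53),
}
for metric, (new, old) in scores.items():
    print(f"{metric}: +{(new - old) / old * 100:.1f}%")
# -> +13.2%, +20.9%, +3.0%, +6.3%, +22.4%
```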
---

### Technical Overview

- **Model Type:** Causal Language Model (Decoder-only Transformer)
- **Base Architecture:** Mistral Nemo
- **Language Focus:** Hebrew (עברית) with maintained multilingual capabilities
- **License:** Apache 2.0
- **Parameters:** 12B
- **Context Length:** 128K tokens
- **Layers:** 40
- **Model dim:** 5,120
- **Head dim:** 128
- **Hidden dim:** 14,336
- **Activation Function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** 2^17 ≈ 128K
- **Rotary embeddings:** theta = 1M
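These values are inherited from the Mistral-Nemo-Base-2407 configuration and can be sanity-checked directly against the hosted config, without downloading the weights. A minimal sketch, assuming the standard `transformers` Mistral config field names:

```python
from transformers import AutoConfig

# Reads config.json from the Hub; no weights are downloaded.
config = AutoConfig.from_pretrained("SicariusSicariiStuff/Hebrew_Nemo")

print(config.num_hidden_layers)    # expected: 40
print(config.hidden_size)          # expected: 5120
print(config.intermediate_size)    # expected: 14336
print(config.num_attention_heads)  # expected: 32
print(config.num_key_value_heads)  # expected: 8 (GQA)
print(config.vocab_size)           # expected: 131072 (2**17)
print(config.rope_theta)           # expected: 1000000.0

# head_dim may be absent on older transformers versions, so read it defensively.
print(getattr(config, "head_dim", None))  # expected: 128
```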
### Primary Use Cases

- **Hebrew Text Generation:** High-quality content creation in modern Hebrew
- **Translation:** Bidirectional translation between Hebrew and other languages
- **Question Answering:** Advanced reasoning and comprehension in Hebrew contexts
- **Dialogue Systems:** Conversational AI applications for Hebrew speakers
- **Text Classification:** Sentiment analysis, topic modeling, and categorization of Hebrew content
- **Named Entity Recognition:** Extraction of entities from Hebrew text
- **Summarization:** Concise summaries of Hebrew documents and articles

### Out-of-Scope Uses

- Real-time critical decision-making systems (medical, legal, financial) without human oversight
- Generation of content intended to deceive or manipulate
- Applications requiring 100% factual accuracy without verification

## Training Data and Training Methodology

Hebrew_Nemo was trained on a diverse corpus including:

| Source Type | Description | Language Coverage |
|--------------|--------------|------------------|
| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew |
| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew |
| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English |
| Synthetic Data | Instruction-following & reasoning | Mixed |

Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects.

Additional training data included:

- Modern Hebrew web text and news articles
- Hebrew literature and academic publications
- Biblical and Rabbinic Hebrew texts for cultural depth
- Hebrew social media and conversational data
- Technical documentation in Hebrew
- Parallel corpora for translation capabilities

---

**The training process involved:**

1. Continued pre-training on Hebrew-rich datasets
2. Instruction fine-tuning on Hebrew task-specific data
3. Alignment through RLHF/DPO for Hebrew linguistic preferences

---

## 🚀 Key Features

- **Native Hebrew Understanding:** Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains.
- **Contextual Mastery:** Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity.
- **Instruction-Tuned:** Aligned for chat, Q&A, summarization, and reasoning use cases.
- **Cultural Awareness:** Sensitive to Hebrew cultural, religious, and social nuances.
- **Optimized Inference:** Enhanced performance with Mistral's memory-efficient attention and dynamic context window.

---

# Out-of-scope usage

* Generating disinformation or biased political content
* Automated decision-making without human oversight

---

## ⚙️ Limitations

* May reflect **training corpus biases** (e.g., urban dialect prevalence, widespread opinions on Israeli social media)
* Limited performance on **rare biblical or archaic Hebrew**
* Occasionally mixes Hebrew and English when the context is ambiguous
* Does not include alignment for safety moderation out of the box

---

# Model instruction template: ChatML

```
<|im_start|>system
You answer the questions in Hebrew.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

---

## 🗣️ Example Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SicariusSicariiStuff/Hebrew_Nemo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Raw-prompt generation; for chat-style use, apply the chat template
# as shown in the next example.
prompt = "מהי בינה מלאכותית?"  # "What is artificial intelligence?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

### Chat Format

```python
messages = [
    # "Tell me about the history of Jerusalem"
    {"role": "user", "content": "ספר לי על ההיסטוריה של ירושלים"}
]
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Quantization (for lower VRAM)

```python
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)
```

---

## Available quantizations

- Original: [FP16](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo)
- GGUF: [Static Quants](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_GGUF) (see the usage sketch below)
- Specialized: [FP8](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_FP8)
- Mobile (ARM): [Q4_0](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_ARM)
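For CPU or low-VRAM inference with the GGUF quants, something like the following works with `llama-cpp-python`. The filename pattern is an assumption; check the GGUF repository for the actual quant file names:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# NOTE: the filename pattern below is an assumption -- check the GGUF repo
# for the actual quant file names before downloading.
llm = Llama.from_pretrained(
    repo_id="SicariusSicariiStuff/Hebrew_Nemo_GGUF",
    filename="*Q4_K_M.gguf",  # glob pattern matching one quant file
    n_ctx=8192,
)

response = llm.create_chat_completion(
    messages=[
        # "What is artificial intelligence?"
        {"role": "user", "content": "מהי בינה מלאכותית?"}
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```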
---

## Citation

```bibtex
@misc{hebrew_nemo_2025,
  author = {SicariusSicariiStuff},
  title = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo}
}
```

## 🧰 Acknowledgements

* [Mistral](https://mistral.ai/) for the base architecture
* [NVIDIA NeMo](https://developer.nvidia.com/nemo) framework inspiration
* Employee#11 for her unwavering support

## Contact

For questions, issues, or collaboration opportunities:

- **HuggingFace:** [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)
- **Issues:** Report technical issues on the model repository

### Model Card Authors

- [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)