+---
+language:
+  - he
+  - en
+license: apache-2.0
+tags:
+  - mistral
+  - nemo
+  - hebrew
+  - llm
+  - text-generation
+  - instruction-tuned
+  - chat
+pipeline_tag: text-generation
+base_model: mistralai/Mistral-Nemo-Base-2407
+library_name: transformers
+---
+# Hebrew_Nemo: State-of-the-Art Hebrew Language Model
+---
+<div align="center">
+  <b style="font-size: 50px;">Hebrew_Nemo</b>
+</div>
+<div align="center">
+  <b style="font-size: 80px;">12B</b>
+</div>
+---
+<div align="center" style="font-size: 18px; margin-top: 20px;">
+  <b>Developed by:</b> <a href="https://huggingface.co/SicariusSicariiStuff">SicariusSicariiStuff</a>
+</div>
+---
+**Hebrew_Nemo** is a state-of-the-art (SOTA) **Hebrew large language model** optimized for Hebrew language understanding and generation. Built on the Mistral Nemo architecture, it combines the multilingual foundation of Mistral Nemo with extensive Hebrew-specific fine-tuning and optimization, a significant step forward for Hebrew NLP.
+As part of [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s effort to truly democratize AI, [Hebrew_Nemo](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo) is released under the permissive **Apache 2.0** license. The model performs competitively with **Gemma3-27B**, one of the world's leading open-weight models in multilingual capability, even though Gemma3-27B is **more than twice its size**. This efficiency makes SOTA Hebrew capabilities widely available to consumers and corporations alike.
+### Technical Overview
+- **Model Type:** Causal Language Model (Decoder-only Transformer)
+- **Base Architecture:** Mistral Nemo
+- **Language Focus:** Hebrew (עברית) with maintained multilingual capabilities
+- **License:** Apache 2.0
+- **Parameters:** 12B
+- **Context Length:** 128K tokens
+- **Layers:** 40
+- **Model dim (hidden size):** 5,120
+- **Head dim:** 128
+- **FFN hidden dim:** 14,336
+- **Activation Function:** SwiGLU
+- **Number of heads:** 32
+- **Number of kv-heads:** 8 (GQA)
+- **Vocabulary size:** 2^17 = 131,072 (~128K)
+- **Positional encoding:** Rotary embeddings (RoPE, theta = 1M)
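+
+As a rough sanity check on the 12B figure, the parameter count can be reconstructed from the values above. This back-of-the-envelope estimate assumes several details this card does not state:
+- no bias terms;
+- untied input and output embeddings;
+- two RMSNorm weight vectors per layer plus one final norm;
+- a SwiGLU feed-forward block with three 5,120 × 14,336 projections.
+
+The query projection is 32 × 128 = 4,096 wide, and each key/value projection is 8 × 128 = 1,024 wide:
+$$P_{\text{attn}} = 5120 \cdot 4096 + 2 \cdot (5120 \cdot 1024) + 4096 \cdot 5120 = 52{,}428{,}800$$
+$$P_{\text{FFN}} = 3 \cdot 5120 \cdot 14336 = 220{,}200{,}960$$
+$$P_{\text{layer}} = P_{\text{attn}} + P_{\text{FFN}} + 2 \cdot 5120 = 272{,}640{,}000$$
+$$P \approx 40 \cdot P_{\text{layer}} + 2 \cdot (131072 \cdot 5120) + 5120 = 12{,}247{,}782{,}400 \approx 12.2\text{B}$$
+Under these assumptions, about 89% of the weights sit in the 40 transformer layers. The embedding and output matrices account for roughly 1.34B.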
+### Primary Use Cases
+- **Hebrew Text Generation:** High-quality content creation in modern Hebrew
+- **Translation:** Bidirectional translation between Hebrew and other languages
+- **Question Answering:** Advanced reasoning and comprehension in Hebrew contexts
+- **Dialogue Systems:** Conversational AI applications for Hebrew speakers
+- **Text Classification:** Sentiment analysis, topic modeling, and categorization of Hebrew content
+- **Named Entity Recognition:** Extraction of entities from Hebrew text
+- **Summarization:** Concise summaries of Hebrew documents and articles
+### Out-of-Scope Uses
+- Real-time critical decision-making systems (medical, legal, financial) without human oversight
+- Generation of content intended to deceive or manipulate
+- Applications requiring 100% factual accuracy without verification
+## Training Data and Training Methodology
+Hebrew_Nemo was trained on a diverse corpus including:
+| Source Type | Description | Language Coverage |
+|--------------|--------------|------------------|
+| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew |
+| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew |
+| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English |
+| Synthetic Data | Instruction-following & reasoning | Mixed |
+Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects.
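+
+The card does not say how token balancing was implemented. One common approach in multilingual training is temperature-based sampling, shown here only as an illustration and not as this model's confirmed recipe. A source $i$ containing $n_i$ tokens is drawn with probability
+$$p_i = \frac{n_i^{\alpha}}{\sum_j n_j^{\alpha}}, \qquad 0 < \alpha \le 1$$
+With $\alpha = 1$, sources are sampled in proportion to their size. A smaller $\alpha$ flattens the distribution and up-weights small sources. For example, take two hypothetical sources whose token counts are in a 9:1 ratio: $\alpha = 0.5$ gives $p = 3/4$ and $1/4$, versus $9/10$ and $1/10$ at $\alpha = 1$.
+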
+Additional training data included:
+- Modern Hebrew web text and news articles
+- Hebrew literature and academic publications
+- Biblical and Rabbinic Hebrew texts for cultural depth
+- Hebrew social media and conversational data
+- Technical documentation in Hebrew
+- Parallel corpora for translation capabilities
+---
+**The training process involved:**
+1. Continued pre-training on Hebrew-rich datasets
+2. Instruction fine-tuning on Hebrew task-specific data
+3. Alignment through RLHF/DPO for Hebrew linguistic preferences
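+
+The card does not specify whether RLHF, DPO, or both were used, nor any hyperparameters. For reference, the standard DPO objective trains the policy $\pi_\theta$ against a frozen reference model $\pi_{\text{ref}}$, typically the instruction-tuned checkpoint. It uses preference triples: a prompt $x$, a preferred response $y_w$, and a rejected response $y_l$:
+$$\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$$
+Here $\sigma$ is the logistic function, and $\beta$ sets how strongly the policy is kept close to the reference. At initialization, when $\pi_\theta = \pi_{\text{ref}}$, both log-ratios are zero, so each pair contributes $\log 2 \approx 0.693$ to the loss.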
+---
+## 🚀 Key Features
+- **Native Hebrew Understanding:** Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains.
+- **Contextual Mastery:** Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity.
+- **Instruction-Tuned:** Aligned for chat, Q&A, summarization, and reasoning use cases.
+- **Cultural Awareness:** Sensitive to Hebrew cultural, religious, and social nuances.
+- **Optimized Inference:** Grouped-query attention (8 KV heads shared across 32 query heads) reduces KV-cache memory, making inference over the full 128K-token context window more practical.
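+
+Here is a rough KV-cache estimate that puts the attention savings in numbers, using the Technical Overview values. It assumes a 16-bit (2-byte) cache, a single sequence, and 128K = 131,072 tokens. Each of the 40 layers stores one key vector and one value vector of size 128 per KV head:
+$$\text{KV bytes per token} = \underbrace{2}_{\text{K, V}} \times 40 \times 8 \times 128 \times 2\ \text{B} = 163{,}840\ \text{B} = 160\ \text{KiB}$$
+$$160\ \text{KiB} \times 131{,}072\ \text{tokens} = 20\ \text{GiB}$$
+With 32 KV heads (standard multi-head attention), the same cache would be 4× larger, about 80 GiB. Actual memory use depends on the inference engine and on whether the cache is quantized.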
+---
+## Out-of-Scope Usage
+* Generating disinformation or biased political content
+* Automated decision-making without human oversight
+---
+## ⚙️ Limitations
+* May reflect **training corpus biases** (e.g., urban dialect prevalence, widespread opinions in Israeli social media)
+* Limited performance on **rare biblical or archaic Hebrew**
+* Occasionally mixes Hebrew and English when the context is ambiguous
+* Does not include alignment for safety moderation out of the box
+---
+## 🗣️ Example Usage
+### Basic Inference
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "SicariusSicariiStuff/Hebrew_Nemo"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
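+    torch_dtype="auto",  # load in the checkpoint's native precision
+    device_map="auto"    # place layers across available GPUs/CPU
+)
+prompt = "מהי בינה מלאכותית?"  # "What is artificial intelligence?"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+# temperature only takes effect when sampling is enabled
+outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)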
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+---
+### Chat Format
+```python
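+messages = [
+    {"role": "user", "content": "ספר לי על ההיסטוריה של ירושלים"}  # "Tell me about the history of Jerusalem"
+]
+# Reuses `tokenizer` and `model` from the Basic Inference example above
+formatted_prompt = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+# Mistral-style chat templates usually insert BOS themselves, so don't add special tokens twice
+inputs = tokenizer(formatted_prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=512)
+# Decode only the newly generated tokens, skipping the echoed prompt
+print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))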
+```
+### Quantization (for lower VRAM)
+```python
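+import torch
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+model_name = "SicariusSicariiStuff/Hebrew_Nemo"
+# 4-bit loading requires the bitsandbytes package (typically with a CUDA GPU)
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,                     # store weights in 4 bits
+    bnb_4bit_compute_dtype=torch.bfloat16  # run the matmuls in bf16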
+)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    quantization_config=quantization_config,
+    device_map="auto"
+)
+```
+---
+## Available Quantizations
+- Original: [FP16](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo)
+- GGUF: [Static Quants](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_GGUF)
+- Specialized: [FP8](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_FP8)
+- Mobile (ARM): [Q4_0](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_ARM)
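+
+As a rough guide when choosing among these formats, weight memory scales with bits per weight $b$. The estimates below take $N \approx 12.2 \times 10^9$ parameters and ignore the KV cache and runtime overhead. Real files differ somewhat, because some tensors are often kept at higher precision:
+$$M_{\text{weights}} \approx N \cdot \frac{b}{8}\ \text{bytes}$$
+- FP16 ($b = 16$): about 24.4 GB
+- FP8 ($b = 8$): about 12.2 GB
+- Q4_0 ($b \approx 4.5$, since each block of 32 4-bit weights also stores one 16-bit scale): about 6.9 GB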
+---
+## Citation
+```bibtex
+@misc{hebrew_nemo_2025,
+  author = {SicariusSicariiStuff},
+  title = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model},
+  year = {2025},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo}
+}
+```
+## 🧰 Acknowledgements
+* [Mistral](https://mistral.ai/) for the base architecture
+* [NVIDIA NeMo](https://developer.nvidia.com/nemo) for framework inspiration
+* Employee#11 for her unwavering support
+## Contact
+For questions, issues, or collaboration opportunities:
+- **HuggingFace:** [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)
+- **Issues:** Report technical issues on the model repository
+### Model Card Authors
+- [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)