|
|
--- |
|
|
language: |
|
|
- he |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- mistral |
|
|
- nemo |
|
|
- hebrew |
|
|
- llm |
|
|
- text-generation |
|
|
- instruction-tuned |
|
|
- chat |
|
|
pipeline_tag: text-generation |
|
|
base_model: mistralai/Mistral-Nemo-Base-2407 |
|
|
library_name: transformers |
|
|
widget: |
|
|
- text: "Hebrew_Nemo"
  output:
    url: https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo/resolve/main/Images/Hebrew_Nemo.png
|
|
--- |
|
|
|
|
|
# Hebrew_Nemo: State-of-the-Art Hebrew Language Model |
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
<b style="font-size: 50px;">Hebrew_Nemo</b> |
|
|
|
|
|
|
|
|
</div> |
|
|
|
|
|
|
|
|
<div align="center"> |
|
|
<b style="font-size: 80px;">12B</b> |
|
|
|
|
|
|
|
|
</div> |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center" style="font-size: 18px; margin-top: 20px;"> |
|
|
<b>Developed by:</b> <a href="https://huggingface.co/SicariusSicariiStuff">SicariusSicariiStuff</a> |
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
**Hebrew_Nemo** is a state-of-the-art (SOTA) **Hebrew large language model** optimized for Hebrew-language understanding and generation. Built upon the Mistral Nemo architecture, it represents a significant advancement in Hebrew NLP capabilities, combining the robust multilingual foundations of Mistral Nemo with extensive Hebrew-specific fine-tuning and optimization.
|
|
|
|
|
As part of [SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff)'s efforts to truly democratize AI, [Hebrew_Nemo](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo) is released under the permissive **Apache 2.0** license. The model demonstrates competitive performance with **Gemma3-27B**, one of the world's leading open-source models in multilingual capabilities, despite Gemma3-27B being **more than twice its size**. This result highlights Hebrew_Nemo's efficiency and effectiveness, making SOTA capabilities widely available to consumers as well as corporations.
|
|
|
|
|
Unfortunately, Gemma-3-27b-it doesn't benchmark well, but I still believe it is by far the best multilingual model:
|
|
|
|
|
| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) | |
|
|
|-------|---------|----------|----------|------------------|----------------|------------| |
|
|
| google/gemma-3-27b-pt | 69.5 | 85.24 | 78.27 | 36.45 | 70.43 | 27 | |
|
|
| google/gemma-3-27b-it | 13.41 | 0 | 80.31 | 0.17 | 0 | 27 | |
|
|
|
|
|
--- |
|
|
|
|
|
# Benchmarks |
|
|
|
|
|
--- |
|
|
|
|
|
**Hebrew_Nemo** demonstrates SOTA performance for its size, with particularly **outstanding results in Hebrew translation**. At only **12B parameters**, it achieves a **BLEU score of 30.83**, outperforming significantly larger models such as DeepSeek-14B and AI21 Jamba-Mini (52B), a model more than four times its size.
|
|
|
|
|
The model maintains **high competence across reasoning and QA**, with an **SNLI accuracy of 79.76** and a **HeQ score of 70.51**, indicating solid sentence-level understanding and contextual reasoning in Hebrew. Its **Israeli Trivia score (50.83)** demonstrates exceptional knowledge for its size, coming very close to a model more than four times its size while vastly outperforming models of similar or slightly larger size.
|
|
|
|
|
|
|
|
| Model | Average | SNLI Acc | QA (HeQ) | Translation BLEU | Israeli Trivia | Params (B) | |
|
|
| ---------------------------------------- | --------: | --------: | --------: | ---------------: | -------------: | ---------: | |
|
|
| **Hebrew_Nemo** | **57.98** | 79.76 | 70.51 | **30.83** | 50.83 | 12 | |
|
|
| ai21labs/AI21-Jamba-1.5-Mini | 54.68 | 69.52 | 69.38 | 22.00 | **57.81** | 52 | |
|
|
| deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 53.19 | **85.48** | 71.38 | 22.99 | 32.89 | 14 | |
|
|
| SicariusSicariiStuff/Zion_Alpha | 53.55 | 84.05 | 67.67 | 27.93 | 34.55 | 7 | |
|
|
| Qwen/Qwen3-8B | 53.54 | 80.00 | **78.53** | 25.73 | 29.90 | 8 | |
|
|
| Mistral-Nemo-Base-2407                   | 51.24     | 65.95     | 68.48     | 28.99            | 41.53          | 12         |
|
|
|
|
|
--- |
|
|
|
|
|
**Hebrew_Nemo** also **vastly improves** upon the original Mistral Nemo by adding massive amounts of new knowledge while refining existing capabilities: |
|
|
|
|
|
| Metric | Hebrew_Nemo | Mistral-Nemo-Base | Improvement |
|
|
| :------------------- | ----------: | ----------------: | ----------------: | |
|
|
| **Average** | **57.98** | 51.24 | **+13.2%** | |
|
|
| **SNLI Accuracy** | **79.76** | 65.95 | **+20.9%** | |
|
|
| **QA (HeQ)** | **70.51** | 68.48 | **+3.0%** | |
|
|
| **Translation BLEU** | **30.83** | 28.99 | **+6.3%** | |
|
|
| **Israeli Trivia** | **50.83** | 41.53 | **+22.4%** | |
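
For reference, each improvement figure is the relative gain over the base model's score; e.g. for the average:

```python
# Relative improvement of Hebrew_Nemo over Mistral-Nemo-Base (average score)
base, tuned = 51.24, 57.98
print(f"{(tuned - base) / base:+.1%}")  # -> +13.2%
```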
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
### Technical Overview |
|
|
|
|
|
- **Model Type:** Causal Language Model (Decoder-only Transformer) |
|
|
- **Base Architecture:** Mistral Nemo |
|
|
- **Language Focus:** Hebrew (עברית) with maintained multilingual capabilities
|
|
- **License:** Apache 2.0 |
|
|
- **Parameters:** 12B |
|
|
- **Context Length:** 128K tokens |
|
|
- **Layers:** 40 |
|
|
- **Dim:** 5,120 |
|
|
- **Head dim:** 128 |
|
|
- **Hidden dim:** 14,336 |
|
|
- **Activation Function:** SwiGLU |
|
|
- **Number of heads:** 32 |
|
|
- **Number of kv-heads:** 8 (GQA) |
|
|
- **Vocabulary size:** 2^17 ≈ 128K (131,072)
|
|
- **Rotary embeddings (theta = 1M)** |
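
These values can be sanity-checked against the published config; a minimal sketch using `AutoConfig` (field names assume the standard Transformers Mistral config schema, and `head_dim` may be absent in older library versions):

```python
from transformers import AutoConfig

# Loads only the JSON config; no model weights are downloaded
config = AutoConfig.from_pretrained("SicariusSicariiStuff/Hebrew_Nemo")

print(config.num_hidden_layers)    # 40 layers
print(config.hidden_size)          # 5,120 model dim
print(config.intermediate_size)    # 14,336 hidden (MLP) dim
print(config.num_attention_heads)  # 32 heads
print(config.num_key_value_heads)  # 8 kv-heads (GQA)
print(config.head_dim)             # 128 per-head dim
print(config.vocab_size)           # 131,072 (~128K)
print(config.rope_theta)           # 1,000,000
```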
|
|
|
|
|
### Primary Use Cases |
|
|
|
|
|
- **Hebrew Text Generation:** High-quality content creation in modern Hebrew |
|
|
- **Translation:** Bidirectional translation between Hebrew and other languages (see the sketch after this list)
|
|
- **Question Answering:** Advanced reasoning and comprehension in Hebrew contexts |
|
|
- **Dialogue Systems:** Conversational AI applications for Hebrew speakers |
|
|
- **Text Classification:** Sentiment analysis, topic modeling, and categorization of Hebrew content |
|
|
- **Named Entity Recognition:** Extraction of entities from Hebrew text |
|
|
- **Summarization:** Concise summaries of Hebrew documents and articles |
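
As an illustration of the translation use case, a minimal sketch using the Transformers `pipeline` API (the prompt wording is illustrative, not a prescribed format):

```python
from transformers import pipeline

# Downloads the model on first use; device_map="auto" places it on available GPUs
generator = pipeline(
    "text-generation",
    model="SicariusSicariiStuff/Hebrew_Nemo",
    device_map="auto",
)

prompt = "Translate to Hebrew: The library opens at nine in the morning."
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```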
|
|
|
|
|
### Out-of-Scope Uses |
|
|
|
|
|
- Real-time critical decision-making systems (medical, legal, financial) without human oversight |
|
|
- Generation of disinformation, biased political content, or material intended to deceive or manipulate
|
|
- Applications requiring 100% factual accuracy without verification |
|
|
|
|
|
|
|
|
## Training Data and Training Methodology |
|
|
|
|
|
Hebrew_Nemo was trained on a diverse corpus including: |
|
|
|
|
|
| Source Type | Description | Language Coverage | |
|
|
|--------------|--------------|------------------| |
|
|
| Hebrew Wikipedia | Encyclopedia-style text | 100% Hebrew | |
|
|
| Hebrew Literature & Proverbs | Classic and modern | 100% Hebrew | |
|
|
| Hebrew-English Code-Mix | Social media & dialogue | 70% Hebrew / 30% English | |
|
|
| Synthetic Data | Instruction-following & reasoning | Mixed | |
|
|
|
|
|
Data was filtered, normalized, and token-balanced to reduce bias and improve generalization across dialects. |
|
|
|
|
|
Additional training data included:
|
|
|
|
|
- Modern Hebrew web text and news articles |
|
|
- Hebrew literature and academic publications |
|
|
- Biblical and Rabbinic Hebrew texts for cultural depth |
|
|
- Hebrew social media and conversational data |
|
|
- Technical documentation in Hebrew |
|
|
- Parallel corpora for translation capabilities |
|
|
|
|
|
--- |
|
|
|
|
|
**The training process involved:** |
|
|
|
|
|
1. Continued pre-training on Hebrew-rich datasets |
|
|
2. Instruction fine-tuning on Hebrew task-specific data |
|
|
3. Alignment through RLHF/DPO for Hebrew linguistic preferences |
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 Key Features
|
|
|
|
|
- **Native Hebrew Understanding:** Trained on millions of high-quality Hebrew documents spanning literature, news, Wikipedia, academic, and colloquial domains. |
|
|
- **Contextual Mastery:** Handles complex anaphora, idiomatic expressions, and mixed Hebrew-English text with high fidelity. |
|
|
- **Instruction-Tuned:** Aligned for chat, Q&A, summarization, and reasoning use cases. |
|
|
- **Cultural Awareness:** Sensitive to Hebrew cultural, religious, and social nuances. |
|
|
- **Optimized Inference:** Enhanced performance with Mistral's memory-efficient attention and dynamic context window.
|
|
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## ⚠️ Limitations
|
|
|
|
|
* May reflect **training corpus biases** (e.g., urban dialect prevalence, widespread opinions in Israeli social media) |
|
|
* Limited performance on **rare biblical or archaic Hebrew** |
|
|
* Occasionally mixes Hebrew and English when the context is ambiguous |
|
|
* Does not include alignment for safety moderation out of the box |
|
|
|
|
|
--- |
|
|
|
|
|
# Model instruction template: ChatML |
|
|
|
|
|
``` |
|
|
<|im_start|>system |
|
|
You answer the questions in Hebrew.<|im_end|> |
|
|
<|im_start|>user
|
|
{prompt}<|im_end|> |
|
|
<|im_start|>assistant
|
|
``` |
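
If `apply_chat_template` is unavailable in your setup, the template can also be assembled by hand; a minimal sketch (the system and user strings are illustrative placeholders):

```python
system_msg = "You answer the questions in Hebrew."
user_msg = "Your question here"

# Manual ChatML assembly, mirroring the template above
formatted = (
    f"<|im_start|>system\n{system_msg}<|im_end|>\n"
    f"<|im_start|>user\n{user_msg}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
```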
|
|
|
|
|
--- |
|
|
|
|
|
## 🗣️ Example Usage
|
|
|
|
|
### Basic Inference |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
model_name = "SicariusSicariiStuff/Hebrew_Nemo" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
torch_dtype="auto", |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
prompt = "מהי בינה מלאכותית?"
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7) |
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
### Chat Format |
|
|
|
|
|
```python |
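# Continues from the Basic Inference example above; model and tokenizer are already loaded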
|
|
messages = [ |
|
|
{"role": "user", "content": "ืกืคืจ ืื ืขื ืืืืกืืืจืื ืฉื ืืจืืฉืืื"} |
|
|
] |
|
|
|
|
|
formatted_prompt = tokenizer.apply_chat_template( |
|
|
messages, |
|
|
tokenize=False, |
|
|
add_generation_prompt=True |
|
|
) |
|
|
|
|
|
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device) |
|
|
outputs = model.generate(**inputs, max_new_tokens=512) |
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
|
|
|
|
|
### Quantization (for lower VRAM) |
|
|
|
|
|
```python |
|
|
import torch
from transformers import BitsAndBytesConfig
|
|
|
|
|
quantization_config = BitsAndBytesConfig( |
|
|
load_in_4bit=True, |
|
|
bnb_4bit_compute_dtype=torch.bfloat16 |
|
|
) |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_name, |
|
|
quantization_config=quantization_config, |
|
|
device_map="auto" |
|
|
) |
|
|
``` |
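
In 4-bit precision the 12B weights occupy roughly 6-8 GB of VRAM (a rough estimate; actual usage grows with context length and activation memory), making the model practical on a single consumer GPU.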
|
|
|
|
|
--- |
|
|
|
|
|
## Available Quantizations
|
|
|
|
|
- Original: [FP16](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo) |
|
|
- GGUF: [Static Quants](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_GGUF) |
|
|
- Specialized: [FP8](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_FP8) |
|
|
- Mobile (ARM): [Q4_0](https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo_ARM) |
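
To fetch one of these variants programmatically, a minimal sketch with `huggingface_hub` (repo names are taken from the list above; the files inside the GGUF repo vary by quant level):

```python
from huggingface_hub import snapshot_download

# Downloads the quantized repo and returns the local directory path
local_dir = snapshot_download("SicariusSicariiStuff/Hebrew_Nemo_GGUF")
print(local_dir)
```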
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{hebrew_nemo_2025, |
|
|
author = {SicariusSicariiStuff}, |
|
|
title = {Hebrew_Nemo: State-of-the-Art Hebrew Language Model}, |
|
|
year = {2025}, |
|
|
publisher = {Hugging Face}, |
|
|
url = {https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo} |
|
|
} |
|
|
``` |
|
|
|
|
|
|
|
|
## 🧰 Acknowledgements
|
|
|
|
|
* [Mistral](https://mistral.ai/) for the base architecture |
|
|
* [NVIDIA NeMo](https://developer.nvidia.com/nemo) for framework inspiration
|
|
* Employee#11 for her unwavering support |
|
|
|
|
|
## Contact |
|
|
|
|
|
For questions, issues, or collaboration opportunities: |
|
|
- **HuggingFace:** [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff) |
|
|
- **Issues:** Report technical issues on the model repository |
|
|
|
|
|
|
|
|
### Model Card Authors |
|
|
- [@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff) |