---
language:
- he
- en
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- llama-3.2
- hebrew
- instruction-tuned
- sft
- safetensors
- nlp
model_name: Hebrew-GPT
model_type: causal-lm
precision: bfloat16
---
# Hebrew-GPT: Specialized 1B Hebrew Instruction Model
**Hebrew-GPT** is an instruction-tuned Small Language Model (SLM) built on the **Llama-3.2-1B** architecture. It is designed to close the gap in Hebrew performance among low-parameter models, offering a compact yet capable solution for Hebrew natural language understanding and generation.
---
## 💎 Model Highlights
* **Linguistic Specialization:** Specifically tuned to handle the Morphologically Rich Language (MRL) features of Hebrew, including prefix-suffix handling and correct right-to-left (RTL) context awareness.
* **16-bit Precision:** Unlike many quantized small models, this release ships **full merged BFloat16 weights**, avoiding the quality loss that quantization can introduce on top of fine-tuning.
* **Instruction Optimized:** Trained specifically to follow complex prompts, summarize documents, and engage in dialogue, rather than just basic text completion.
* **Efficiency:** At 1 billion parameters, it is optimized for edge deployment, providing high-speed inference on standard consumer hardware.
---
## 🛠️ Technical Specifications
### Architecture
- **Base Architecture:** Llama 3.2
- **Parameters:** 1.23 Billion
- **Context Length:** 128k tokens (native support)
- **Weight Format:** Safetensors (Standalone)
- **Precision:** BFloat16 (BF16)
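As a rough guide to deployment requirements, the on-disk/VRAM footprint of the BF16 weights can be estimated from the parameter count above (2 bytes per parameter). This is a back-of-envelope sketch, not a measured figure; activations, KV cache, and framework overhead add to it:

```python
# Back-of-envelope estimate for the BF16 weight footprint alone.
params = 1.23e9          # parameter count from the spec above
bytes_per_param = 2      # bfloat16 = 16 bits = 2 bytes
weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.2f} GB for weights in BF16")  # ~2.46 GB
```

This is why the model fits comfortably on consumer GPUs with 4 GB or more of VRAM, with headroom left for the KV cache.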
### Training Methodology
The model underwent **Supervised Fine-Tuning (SFT)** using a curated multi-source dataset strategy to ensure high-quality Hebrew output without compromising logical reasoning:
* **Hebrew Instruction Set (70%):** Extensive Alpaca-formatted datasets translated and corrected for Hebrew grammar.
* **Hebrew Contextual Knowledge (20%):** Fact-based data from Hebrew wikis and structured Q&A.
* **Logic Preservation (10%):** High-quality English instructional data to maintain cross-lingual reasoning and mathematical stability.
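The 70/20/10 mix above amounts to weighted sampling over three data streams during SFT. A minimal sketch of that sampling step, assuming the stream names are hypothetical labels (the actual dataset identifiers are not published here):

```python
import random

# Hypothetical stream labels matching the SFT mix described above.
streams = {
    "hebrew_instructions": 0.70,  # Alpaca-style Hebrew instruction data
    "hebrew_knowledge":    0.20,  # Hebrew wiki facts and structured Q&A
    "english_logic":       0.10,  # English data for reasoning stability
}

def sample_stream(rng: random.Random) -> str:
    """Pick which stream the next training example is drawn from."""
    names = list(streams)
    weights = [streams[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in streams}
for _ in range(10_000):
    counts[sample_stream(rng)] += 1
print(counts)  # roughly 7000 / 2000 / 1000
```

Per-example sampling like this interleaves the streams throughout training rather than concatenating them, which helps keep the English logic data acting as a regularizer across all of SFT.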
---
## 📈 Performance & Monitoring
During the development phase, the model was monitored via detailed telemetry to ensure stable convergence. Key metrics tracked included:
- **Gradient Norm Stability:** Monitored to prevent exploding gradients in RTL text generation.
- **VRAM Optimization:** Efficiently managed to maximize batch size and learning stability.
- **Loss Decay:** Consistent downward trend in cross-entropy loss across all three data streams.
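Gradient-norm monitoring of the kind listed above is typically paired with global-norm clipping. A framework-agnostic sketch of the idea, assuming an illustrative threshold of 1.0 (the actual value used in this training run is not published):

```python
import math

def clip_by_global_norm(grads: list[float], max_norm: float = 1.0) -> list[float]:
    """Scale gradients so their global L2 norm is at most max_norm.

    Illustrative only: real trainers apply this across parameter
    tensors (e.g. torch.nn.utils.clip_grad_norm_), not a flat list.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return grads
    scale = max_norm / total_norm
    return [g * scale for g in grads]

# [3, 4] has global norm 5.0, so it gets scaled down to norm 1.0.
print(clip_by_global_norm([3.0, 4.0]))
```

Tracking the pre-clip norm over training steps is what reveals instability: a healthy run shows the norm settling into a band, while repeated spikes signal the exploding gradients the telemetry was watching for.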
---
## 🚀 Quick Start Guide
### Installation
```bash
pip install transformers torch accelerate
```
### Basic Usage (Python)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XythicK/Hebrew-GPT"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Standard Llama-3.2 chat template
messages = [
    # "You are a smart and professional assistant in Hebrew."
    {"role": "system", "content": "אתה עוזר חכם ומקצועי בעברית."},
    # "Write me a short challah recipe for Shabbat."
    {"role": "user", "content": "כתוב לי מתכון קצר לחלה לשבת."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## ⚖️ Ethics and Limitations
While Hebrew-GPT is highly capable for its size, users should note:
* **Hallucination:** Like all LLMs, it can generate incorrect facts. Verify critical information.
* **Bias:** The model reflects the biases present in its training data.
* **Parameter Constraints:** As a 1B model, it may struggle with highly technical academic subjects compared to 70B+ models.