Update README.md
README.md
CHANGED
@@ -1,10 +1,115 @@
```diff
- ---
- tags:
- - text-generation-inference
- - transformers
- - llama
- license: apache-2.0
  language:
  - en
- ---
```
---
language:
- he
- en
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- llama-3.2
- hebrew
- instruction-tuned
- sft
- safetensors
- nlp
model_name: Hebrew-GPT
model_type: causal-lm
precision: bfloat16
---

# Hebrew-GPT: Specialized 1B Hebrew Instruction Model 🇮🇱

**Hebrew-GPT** is a state-of-the-art, instruction-tuned Small Language Model (SLM) based on the **Llama-3.2-1B** architecture. It is engineered to close the gap in low-parameter Hebrew linguistic performance, providing a compact yet powerful solution for Hebrew natural language understanding and generation.

---

## 💎 Model Highlights

* **Linguistic Specialization:** Specifically tuned to handle the Morphologically Rich Language (MRL) features of Hebrew, including prefix-suffix handling and correct right-to-left (RTL) context awareness (see the tokenizer sketch below).
* **16-bit Precision:** Unlike many quantized small models, this version ships full, merged **BFloat16** weights, avoiding the quality loss that quantization can introduce after fine-tuning.
* **Instruction Optimized:** Trained specifically to follow complex prompts, summarize documents, and engage in dialogue, rather than perform basic text completion only.
* **Efficiency:** At 1 billion parameters, it is optimized for edge deployment, providing high-speed inference on standard consumer hardware.
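To make the morphology point concrete, here is a minimal sketch that inspects how the tokenizer segments a Hebrew word carrying the common ו ("and") and כש ("when") prefixes. The example word is arbitrary, and the printed split depends entirely on the tokenizer shipped with this repository:

```python
from transformers import AutoTokenizer

# Minimal sketch: see how a prefixed Hebrew word is segmented.
# The word and the resulting split are illustrative only; actual output
# depends on the tokenizer published with this model.
tokenizer = AutoTokenizer.from_pretrained("XythicK/Hebrew-GPT")

word = "וכשהלכתי"  # "and when I walked": prefixes attached to the verb
print(tokenizer.tokenize(word))
print(tokenizer(word)["input_ids"])
```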
---

## 🛠️ Technical Specifications

### Architecture
- **Base Architecture:** Llama 3.2
- **Parameters:** 1.23 Billion
- **Context Length:** 128k tokens (native support)
- **Weight Format:** Safetensors (Standalone)
- **Precision:** BFloat16 (BF16)
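These figures can be cross-checked against the published configuration. A small sketch, assuming the standard Llama config fields exposed by `transformers`:

```python
from transformers import AutoConfig

# Sketch: confirm the architecture numbers listed above from the model config.
config = AutoConfig.from_pretrained("XythicK/Hebrew-GPT")

print(config.model_type)               # Llama architecture
print(config.max_position_embeddings)  # context window (128k per the card)
print(config.torch_dtype)              # bfloat16 per the card
```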
### Training Methodology
The model underwent **Supervised Fine-Tuning (SFT)** using a curated multi-source dataset strategy to ensure high-quality Hebrew output without compromising logical reasoning (a mixing sketch follows the list below):
* **Hebrew Instruction Set (70%):** Extensive Alpaca-formatted datasets translated and corrected for Hebrew grammar.
* **Hebrew Contextual Knowledge (20%):** Fact-based data from Hebrew wikis and structured Q&A.
* **Logic Preservation (10%):** High-quality English instructional data to maintain cross-lingual reasoning and mathematical stability.
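Conceptually, this kind of 70/20/10 mixture can be expressed with `datasets.interleave_datasets`. The sketch below uses placeholder dataset names; it illustrates the sampling ratio rather than the actual training corpora:

```python
from datasets import load_dataset, interleave_datasets

# Hypothetical illustration of the 70/20/10 mixture described above.
# The dataset names are placeholders, not the real training data.
hebrew_instructions = load_dataset("your-org/hebrew-alpaca", split="train")          # placeholder
hebrew_knowledge = load_dataset("your-org/hebrew-wiki-qa", split="train")            # placeholder
english_logic = load_dataset("your-org/english-instructions", split="train")         # placeholder

sft_mix = interleave_datasets(
    [hebrew_instructions, hebrew_knowledge, english_logic],
    probabilities=[0.7, 0.2, 0.1],  # sampling ratio per source
    seed=42,
    stopping_strategy="all_exhausted",
)
```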

---

## 📈 Performance & Monitoring

During the development phase, the model was monitored via detailed telemetry to ensure stable convergence. Key metrics tracked (a brief monitoring sketch follows the list) included:
- **Gradient Norm Stability:** Monitored to prevent exploding gradients while fine-tuning on RTL text.
- **VRAM Optimization:** GPU memory use was managed to maximize batch size and training stability.
- **Loss Decay:** A consistent downward trend in cross-entropy loss across all three data streams.
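As a rough sketch of the kind of gradient-norm telemetry described above (illustrative only, not the project's actual monitoring code):

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    """Compute the global L2 norm of all parameter gradients."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().float().norm(2).item() ** 2
    return total ** 0.5

# Inside a training loop, after loss.backward() and before optimizer.step():
#     grad_norm = total_grad_norm(model)
#     if grad_norm > 10.0:  # threshold chosen for illustration only
#         torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```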

---

## 🚀 Quick Start Guide

### Installation
```bash
pip install transformers torch accelerate
```

### Basic Usage (Python)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XythicK/Hebrew-GPT"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Standard Llama-3.2 chat template
messages = [
    {"role": "system", "content": "אתה עוזר חכם ומקצועי בעברית."},  # "You are a smart, professional assistant in Hebrew."
    {"role": "user", "content": "כתוב לי מתכון קצר לחלה לשבת."},  # "Write me a short recipe for challah for Shabbat."
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
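For interactive use, the reply can also be streamed token by token. A small sketch using the `TextStreamer` utility from `transformers`, reusing `model`, `tokenizer`, and `input_ids` from the snippet above:

```python
from transformers import TextStreamer

# Stream the reply as it is generated instead of waiting for the full output.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    streamer=streamer,
)
```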

---

## ⚖️ Ethics and Limitations

While Hebrew-GPT is highly capable for its size, users should note:

* **Hallucination:** Like all LLMs, it can generate incorrect facts. Verify critical information.
* **Bias:** The model reflects the biases present in its training data.
* **Parameter Constraints:** As a 1B model, it may struggle with highly technical academic subjects compared to 70B+ models.