---
language:
- he
- en
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- llama-3.2
- hebrew
- instruction-tuned
- sft
- safetensors
- nlp
model_name: Hebrew-GPT
model_type: causal-lm
precision: bfloat16
---

# Hebrew-GPT: Specialized 1B Hebrew Instruction Model

**Hebrew-GPT** is an instruction-tuned Small Language Model (SLM) based on the **Llama-3.2-1B** architecture. It is engineered to close the gap in low-parameter Hebrew linguistic performance, providing a compact yet capable model for Hebrew natural language understanding and generation.

---

## 💎 Model Highlights

* **Linguistic Specialization:** Tuned for the features of Hebrew as a Morphologically Rich Language (MRL), including prefix/suffix handling and correct right-to-left (RTL) context awareness.
* **16-bit Precision:** Unlike many quantized small models, this release ships **fully merged BFloat16 weights**, avoiding the quality loss that quantization can introduce after fine-tuning.
* **Instruction Optimized:** Trained to follow complex prompts, summarize documents, and hold dialogue, rather than perform basic text completion only.
* **Efficiency:** At 1 billion parameters, the model is suited to edge deployment and delivers fast inference on standard consumer hardware.

---

## 🛠 Technical Specifications

### Architecture

- **Base Architecture:** Llama 3.2
- **Parameters:** 1.23 billion
- **Context Length:** 128k tokens (native support)
- **Weight Format:** Safetensors (standalone)
- **Precision:** BFloat16 (`BF16`)

### Training Methodology

The model underwent **Supervised Fine-Tuning (SFT)** on a curated multi-source dataset designed to produce high-quality Hebrew output without compromising logical reasoning:

* **Hebrew Instruction Set (70%):** Alpaca-formatted datasets translated into Hebrew and corrected for grammar.
* **Hebrew Contextual Knowledge (20%):** Fact-based data from Hebrew wikis and structured Q&A.
* **Logic Preservation (10%):** High-quality English instructional data to maintain cross-lingual reasoning and mathematical stability.

---

## 📈 Performance & Monitoring

During development, training was monitored via detailed telemetry to ensure stable convergence. Key metrics tracked included:

- **Gradient Norm Stability:** Monitored to prevent exploding gradients in RTL text generation.
- **VRAM Optimization:** Memory managed efficiently to maximize batch size and learning stability.
- **Loss Decay:** A consistent downward trend in cross-entropy loss across all three data streams.

---

## 🚀 Quick Start Guide

### Installation

```bash
pip install transformers torch accelerate
```

### Basic Usage (Python)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XythicK/Hebrew-GPT"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Standard Llama-3.2 chat template
messages = [
    # "You are a smart and professional assistant in Hebrew."
    {"role": "system", "content": "אתה עוזר חכם ומקצועי בעברית."},
    # "Write me a short challah recipe for Shabbat."
    {"role": "user", "content": "כתוב לי מתכון קצר לחלה לשבת."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

### ⚖️ Ethics and Limitations

While Hebrew-GPT is highly capable for its size, users should note:

- **Hallucination:** Like all LLMs, it can generate incorrect facts. Verify critical information.
- **Bias:** The model reflects the biases present in its training data.
- **Parameter Constraints:** As a 1B model, it may struggle with highly technical academic subjects compared to 70B+ models.
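For reference, `apply_chat_template` in the Quick Start renders the message list into the standard Llama 3 prompt format before tokenization. The sketch below approximates that rendering as a plain string so you can see what the model actually receives; `render_llama3_prompt` is a hypothetical helper written for illustration, and the template bundled with the tokenizer remains authoritative.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the Llama 3 chat template as a plain string (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant header so the model continues from this point
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_llama3_prompt(messages))
```

Because the prompt ends with an open `assistant` header, everything the model generates up to the next `<|eot_id|>` is the assistant's reply, which is why the Quick Start decodes only the tokens after the input.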