---
language:
- he
license: apache-2.0
tags:
- hebrew
- instruction-tuning
- sft
- language-model
- text-generation
- mamba
- transformer
pipeline_tag: text-generation
model-index:
- name: HebrewGPT-1B-Instruct
  results: []
---

# HebrewGPT-1B-Instruct

A **1.08 billion parameter** Hebrew instruction-tuned language model, fine-tuned from [HebrewGPT-1B](https://huggingface.co/Slasky/HebrewGPT-1B) on 61K balanced Hebrew instruction examples.

## Model Details

| Property | Value |
|----------|-------|
| **Parameters** | 1.08B |
| **Architecture** | Custom Mamba-Transformer hybrid (interleaved RoPE attention + Mamba SSM, SwiGLU MLP) |
| **Base Model** | HebrewGPT-1B (pretrained with Muon optimizer + SWA) |
| **Context Length** | 2,048 tokens |
| **Tokenizer** | SentencePiece BPE, 8,192-token vocabulary, Hebrew morphology-aware with prefix splitting |
| **License** | Apache 2.0 |
| **Language** | Hebrew (he) |

## Architecture

HebrewGPT-1B-Instruct uses the same hybrid architecture as the base model (a structural sketch follows the list):

- **Width:** 1024, **Depth:** 8 layers, **Heads:** 8 (head_dim=128)
- **Interleaved blocks:** Alternating RoPE multi-head attention and Mamba SSM layers
- **MLP:** SwiGLU activation
- **Positional encoding:** Rotary Position Embeddings (RoPE)

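As a rough orientation, the PyTorch snippet below wires up the interleaving pattern and the SwiGLU MLP described above. It is illustrative only: `AttentionBlock` and `MambaBlock` are placeholders for the real implementations in the HebrewGPT-1B repo, and the attention-first ordering and the MLP hidden width are assumptions not stated on this card.

```python
from dataclasses import dataclass

import torch.nn as nn
import torch.nn.functional as F


@dataclass
class HybridConfig:
    d_model: int = 1024   # width
    n_layers: int = 8     # depth
    n_heads: int = 8      # 1024 / 8 heads = head_dim 128
    d_hidden: int = 4096  # MLP hidden width -- an assumption, not on the card


class SwiGLU(nn.Module):
    """SwiGLU MLP: down(silu(gate(x)) * up(x))."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))


class AttentionBlock(nn.Identity):
    """Placeholder for the RoPE multi-head attention mixer."""


class MambaBlock(nn.Identity):
    """Placeholder for the Mamba SSM mixer."""


def build_stack(cfg: HybridConfig) -> nn.ModuleList:
    """Alternate attention and Mamba mixers layer by layer, each paired
    with a SwiGLU MLP (attention-first ordering is assumed)."""
    layers = nn.ModuleList()
    for i in range(cfg.n_layers):
        mixer = AttentionBlock() if i % 2 == 0 else MambaBlock()
        layers.append(nn.ModuleList([mixer, SwiGLU(cfg.d_model, cfg.d_hidden)]))
    return layers
```
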
## Training

### SFT Configuration

- **Method:** Full Supervised Fine-Tuning (SFT)
- **Training steps:** 3,000
- **Best validation loss:** 2.9598
- **Hardware:** Single NVIDIA A10G GPU (AWS g5.2xlarge)
- **Training time:** ~6.5 hours
- **Total training tokens:** ~20.3M (roughly 6.8K tokens per step)

### Instruction Dataset (61K examples)

The model was fine-tuned on a balanced mix of Hebrew instruction-following tasks:

| Category | Examples | Description |
|----------|----------|-------------|
| QA (HeQ) | 15,000 | Hebrew question answering |
| Sentiment | 10,000 | Hebrew sentiment analysis |
| NLI | 2,938 | Natural language inference |
| Summarization (HeSum) | 10,000 | Hebrew text summarization |
| Translation | 15,000 | Hebrew-English translation |
| Alpaca | 5,000 | General instruction following (translated) |
| Dolly | 2,000 | Open-domain instruction following |
| Chat | 1,000 | Conversational Hebrew |
| Winograd | 278 | Coreference resolution |

## Usage

```python
import torch
import sentencepiece as spm

# Load the SentencePiece tokenizer shipped with this repo
sp = spm.SentencePieceProcessor()
sp.Load("tokenizer.model")

# Load the fine-tuned weights as a plain state dict
state_dict = torch.load("model.pt", map_location="cpu")

# Instantiate the architecture first (see the HebrewGPT-1B repo for the
# model class definition), then load the weights into it:
# model.load_state_dict(state_dict)
```

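Since the model class itself lives in the base repo, generation is not shown above. Below is a minimal greedy-decoding sketch under one assumption: that the instantiated `model` is callable on a `(batch, seq_len)` tensor of token ids and returns `(batch, seq_len, vocab)` logits. Adapt it to the actual forward signature.

```python
import torch


@torch.no_grad()
def generate(model, sp, prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy decoding sketch; assumes `model(ids)` returns per-position logits."""
    ids = sp.EncodeAsIds(prompt)
    for _ in range(max_new_tokens):
        x = torch.tensor([ids[-2048:]])   # stay within the 2,048-token context
        logits = model(x)                 # assumed shape: (1, T, vocab)
        next_id = int(logits[0, -1].argmax())
        ids.append(next_id)
        if next_id == sp.eos_id():        # stop at end-of-sequence, if produced
            break
    return sp.DecodeIds(ids)
```
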
### Prompt Format

The model was trained with a structured instruction format (the Hebrew section headers mean "Instruction", "Input", and "Response"):

```
### הוראה:
{instruction}

### קלט:
{input}

### תשובה:
{response}
```

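A small helper for assembling prompts in this format. The function is ours, not part of the repo; in particular, dropping the input section when a task has no input is an assumption worth verifying against the training data. At inference time the response header is left open so the model completes it, e.g. `generate(model, sp, build_prompt(...))` with the sketch above.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an inference prompt in the training format; the response
    header is left open for the model to complete."""
    prompt = f"### הוראה:\n{instruction}\n\n"
    if input_text:  # assumption: omit the input section when there is none
        prompt += f"### קלט:\n{input_text}\n\n"
    prompt += "### תשובה:\n"
    return prompt
```
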
## Evaluation

Evaluation results for the instruction-tuned model are not yet available. For reference, the base model (HebrewGPT-1B) scores:

| Task | Base Model |
|------|-----------|
| SNLI | 50% |
| Sentiment | 33% |
| QA | 20% |
| Trivia | 13% |
| **Average** | **29.2%** |

## Infrastructure

- **Research orchestration:** Amazon Bedrock (Claude) via OpenClaw
- **Training compute:** AWS EC2 g5.2xlarge (NVIDIA A10G)
- **Data pipeline:** Automated dataset collection, translation, and balancing

## Files

- `model.pt`: SFT fine-tuned model state dict (2.1 GB)
- `tokenizer.model`: SentencePiece BPE tokenizer (8,192-token vocabulary)

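Both files can be fetched programmatically with `huggingface_hub`, for example:

```python
from huggingface_hub import hf_hub_download

# Download into the local Hugging Face cache and return the file paths
repo = "Slasky/HebrewGPT-1B-Instruct"
model_path = hf_hub_download(repo_id=repo, filename="model.pt")
tokenizer_path = hf_hub_download(repo_id=repo, filename="tokenizer.model")
```
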
## Citation

```bibtex
@misc{hebrewgpt1b-instruct-2026,
  title={HebrewGPT-1B-Instruct: A Hebrew Instruction-Tuned Language Model},
  author={Slasky, Ronnen},
  year={2026},
  url={https://huggingface.co/Slasky/HebrewGPT-1B-Instruct}
}
```

## Limitations

- The small vocabulary (8,192 tokens) may limit performance on rare words.
- The 2,048-token context window limits long-document tasks.
- Training focused on structured instruction tasks, so open-ended generation quality may vary.
- The model is Hebrew-specific, with limited multilingual capability beyond Hebrew-English translation.

## License

Apache 2.0