HebrewGPT-1B-Instruct

A 1.08-billion-parameter Hebrew instruction-tuned language model, fine-tuned from HebrewGPT-1B on 61K balanced Hebrew instruction examples.

Model Details

| Property | Value |
|---|---|
| Parameters | 1.08B |
| Architecture | Custom Mamba-Transformer hybrid (interleaved RoPE attention + Mamba SSM, SwiGLU MLP) |
| Base Model | HebrewGPT-1B (pretrained with Muon optimizer + SWA) |
| Context Length | 2,048 tokens |
| Tokenizer | SentencePiece BPE, 8,192 vocab, Hebrew morphology-aware with prefix splitting |
| License | Apache 2.0 |
| Language | Hebrew (he) |
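
The "prefix splitting" in the tokenizer row refers to Hebrew proclitics: single-letter function words (and, the, in, to, ...) written attached to the following word. The toy sketch below illustrates the idea only; the names are hypothetical and the real tokenizer's morphology handling is certainly more involved (e.g. it must avoid splitting letters that are part of the stem).

```python
# Toy illustration of Hebrew proclitic prefix splitting (hypothetical code,
# not the actual tokenizer logic). Common proclitics: ו- (and), ה- (the),
# ב- (in), כ- (as), ל- (to), מ- (from), ש- (that).
PREFIXES = ("ו", "ה", "ב", "כ", "ל", "מ", "ש")

def split_prefix(word, min_stem_len=2):
    """Split at most one proclitic prefix off a Hebrew word."""
    if len(word) > min_stem_len and word[0] in PREFIXES:
        return [word[0], word[1:]]
    return [word]
```

For example, `split_prefix("הספר")` ("the book") yields `["ה", "ספר"]`, so the article and the stem get their own subword boundaries before BPE runs.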

Architecture

HebrewGPT-1B-Instruct uses the same hybrid architecture as the base model:

  • Width: 1024, Depth: 8 layers, Heads: 8 (head_dim=128)
  • Interleaved blocks: Alternating RoPE multi-head attention and Mamba SSM layers
  • MLP: SwiGLU activation
  • Positional encoding: Rotary Position Embeddings (RoPE)
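
The rotary encoding used in the attention layers can be sketched as follows. This is a minimal NumPy illustration using the half-split (GPT-NeoX style) convention; the actual model applies it per attention head with head_dim=128, and its exact convention is an assumption here.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, head_dim).

    Channel pairs are rotated by an angle that grows with position and
    shrinks with channel index, so dot products between rotated queries
    and keys depend only on relative position.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-np.arange(half) / half)       # per-pair frequencies
    angles = np.outer(np.arange(seq_len), inv_freq)    # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because RoPE is a pure rotation, position 0 is left unchanged and vector norms are preserved at every position.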

Base Model: HebrewGPT-1B

Built on HebrewGPT-1B, a 1.08B parameter model trained from scratch on Hebrew text.

Pre-Training Data (12 Hebrew Datasets, 9.8B tokens)

| Dataset | Share | Description |
|---|---|---|
| Hebrew Wikipedia | 12% | Encyclopedia articles |
| Supreme Court Rulings | 22% | Israeli legal corpus |
| Ben Yehuda Project | 23% | Classic Hebrew literature |
| C4 Hebrew | 20% | Web-crawled text (cleaned) |
| CC100 Hebrew | 19% | CommonCrawl, filtered |
| Task-specific | 4% | QA, NLI, sentiment prompts |

Pre-Training Details

  • Tokens: 9.8B (3.9 epochs over 2.48B unique)
  • Hardware: 8×H100 80GB (AWS p5.48xlarge), 8 hours
  • Optimizer: Muon + stochastic weight averaging (SWA); 12.3% better bits-per-byte (BPB) than AdamW at the 1B scale
  • Perplexity: 29.75 (with SWA)
  • Research: 200 autonomous experiments across 4 versions, 100% hit rate in v4
  • Paper: Autonomous AI-Driven Hebrew Language Model Research
  • Ablation: HebrewGPT-1B-AdamW (same architecture, AdamW optimizer)
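
Muon's core step replaces the momentum-smoothed gradient of each weight matrix with an approximately semi-orthogonal matrix before applying the update. The NumPy sketch below shows only that orthogonalization, using the quintic Newton-Schulz coefficients from the public Muon reference implementation; it is an illustration, not HebrewGPT's actual training code.

```python
import numpy as np

def newton_schulz_orth(G, steps=5):
    """Approximately orthogonalize G via a quintic Newton-Schulz iteration.

    Drives the singular values of G toward 1 without an explicit SVD,
    using only matrix multiplications (cheap and GPU-friendly).
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)   # Frobenius norm bounds spectral norm
    transposed = X.shape[0] > X.shape[1]
    if transposed:                        # iterate on the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

After a few iterations the singular values of the output cluster near 1, which is what makes the resulting update direction well-conditioned regardless of the raw gradient's scale.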

Training

SFT Configuration

  • Method: Full Supervised Fine-Tuning (SFT)
  • Training steps: 3,000
  • Best validation loss: 2.9598
  • Hardware: Single NVIDIA A10G GPU (AWS g5.2xlarge)
  • Training time: ~6.5 hours
  • SFT fine-tuning tokens: ~20.3M
  • Base model pre-training: 9.8B tokens (12 diverse Hebrew datasets including Wikipedia, Supreme Court, Ben Yehuda, C4, CC100)
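
Some back-of-envelope numbers implied by the configuration above (averages only; the card does not state the batch size or sequence-packing scheme, so per-step figures are effective rates, not literal batch shapes):

```python
# Rough throughput figures implied by the SFT configuration.
sft_tokens = 20.3e6     # ~20.3M fine-tuning tokens
steps = 3_000           # training steps
examples = 61_216       # instruction examples (sum of the dataset table)
hours = 6.5             # ~6.5 h on one A10G

tokens_per_step = sft_tokens / steps             # ~6,767 tokens per step
tokens_per_example = sft_tokens / examples       # ~332 tokens per example
tokens_per_second = sft_tokens / (hours * 3600)  # ~868 tokens/s end to end
```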

Instruction Dataset (61K examples)

The model was fine-tuned on a balanced mix of Hebrew instruction-following tasks:

| Category | Examples | Description |
|---|---|---|
| QA (HeQ) | 15,000 | Hebrew question answering |
| Sentiment | 10,000 | Hebrew sentiment analysis |
| NLI | 2,938 | Natural language inference |
| Summarization (HeSum) | 10,000 | Hebrew text summarization |
| Translation | 15,000 | Hebrew-English translation |
| Alpaca | 5,000 | General instruction following (translated) |
| Dolly | 2,000 | Open-domain instruction following |
| Chat | 1,000 | Conversational Hebrew |
| Winograd | 278 | Coreference resolution |

Usage

```python
import torch
import sentencepiece as spm

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.Load("tokenizer.model")

# Load model weights
state_dict = torch.load("model.pt", map_location="cpu")

# Initialize the model architecture (see the HebrewGPT-1B repository for the
# model class definition), then load the weights:
# model.load_state_dict(state_dict)
```

Prompt Format

The model was trained with a structured instruction format. The Hebrew headers mean "Instruction" (הוראה), "Input" (קלט), and "Response" (תשובה):

```
### הוראה:
{instruction}

### קלט:
{input}

### תשובה:
{response}
```
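
A small helper to assemble prompts in this template. This is hypothetical convenience code, not part of the released files; it also assumes that, as is common for SFT data in this style, the קלט (input) block is omitted entirely when an instruction has no input.

```python
def build_prompt(instruction, input_text=None):
    """Format an instruction (and optional input) in the SFT prompt template.

    Generation should be run on the returned string; the model is expected
    to continue after the final "### תשובה:" (response) header.
    """
    prompt = f"### הוראה:\n{instruction}\n\n"
    if input_text:  # assumed: omit the input block when there is no input
        prompt += f"### קלט:\n{input_text}\n\n"
    prompt += "### תשובה:\n"
    return prompt
```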

Evaluation

Evaluation on Hebrew benchmarks requires GPU inference. Base model (HebrewGPT-1B) results for comparison:

| Task | Base Model | Instruct (SFT) |
|---|---|---|
| SNLI | 50% | Pending |
| Sentiment | 33% | Pending |
| QA | 20% | Pending |
| Trivia | 13% | Pending |
| Average | 29.2% | Pending |

SFT evaluation will be run on GPU and updated here. The instruction-tuned model is expected to show significant improvements on structured tasks (QA, sentiment, NLI) that were part of the SFT training mix.

Infrastructure

  • Research Orchestration: Amazon Bedrock (Claude) via OpenClaw
  • Training Compute: AWS EC2 g5.2xlarge (NVIDIA A10G)
  • Data Pipeline: Automated dataset collection, translation, and balancing

Files

  • model.pt – SFT fine-tuned model state dict (2.1 GB)
  • tokenizer.model – SentencePiece BPE tokenizer (8,192 vocab)

Citation

@misc{hebrewgpt1b-instruct-2026,
  title={HebrewGPT-1B-Instruct: A Hebrew Instruction-Tuned Language Model},
  author={Slasky, Ronnen},
  year={2026},
  url={https://huggingface.co/Slasky/HebrewGPT-1B-Instruct}
}

Limitations

  • Small vocabulary (8,192 tokens) may limit performance on rare words
  • 2,048 context window limits long-document tasks
  • Trained primarily on structured instruction tasks; open-ended generation quality may vary
  • Hebrew-specific model โ€” limited multilingual capability beyond Hebrew-English translation

License

Apache 2.0
