---
language:
- he
license: apache-2.0
tags:
- hebrew
- continued-pretraining
- language-model
- text-generation
- mamba
- transformer
pipeline_tag: text-generation
---

# HebrewGPT-1B-v2

A **1.08 billion parameter** Hebrew language model. Version 2 continues pre-training of [HebrewGPT-1B](https://huggingface.co/Slasky/HebrewGPT-1B) on 7.8B additional Hebrew tokens from HeDC4, the Hebrew Digital Corpus for the 21st Century.

## Model Details

| Property | Value |
|----------|-------|
| **Parameters** | 1.08B |
| **Base Model** | [HebrewGPT-1B](https://huggingface.co/Slasky/HebrewGPT-1B) |
| **Architecture** | Custom Mamba-Transformer hybrid (interleaved RoPE attention and Mamba SSM layers, SwiGLU MLP) |
| **Total Training Tokens** | 17.6B (9.8B original + 7.8B HeDC4) |
| **Best Validation Loss** | 3.3937 (4.90 BPB) |
| **Training** | 15,000 steps, p4d.24xlarge (8× A100 40 GB), 14.3 hours |
| **Context Length** | 2,048 tokens |
| **Tokenizer** | SentencePiece BPE, 32K vocabulary |

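A minimal, untested generation sketch is below. The repo id and the need for `trust_remote_code=True` are assumptions: the id mirrors the v1 naming, and the custom Mamba-Transformer hybrid presumably loads through custom modeling code shipped with the repository.

```python
# Minimal generation sketch (untested). Assumptions: the repo id follows the
# v1 naming ("Slasky/HebrewGPT-1B-v2"), and the repo ships custom modeling and
# tokenizer code for the Mamba-Transformer hybrid, hence trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Slasky/HebrewGPT-1B-v2"  # assumed, mirrors the v1 repo name

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

prompt = "בירת ישראל היא"  # "The capital of Israel is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        top_p=0.9,
        temperature=0.7,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
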
## What is new in v2

HebrewGPT-1B-v2 extends the original model with continued pre-training on HeDC4, a diverse modern Hebrew web corpus. This nearly doubles the total training data from 9.8B to 17.6B tokens, adding:

- Modern Hebrew news and web content
- Broader vocabulary coverage
- More diverse writing styles and topics

## Evaluation (Base Model, no SFT)

| Task | v1 (9.8B tokens) | v2 (17.6B tokens) |
|------|------------------|-------------------|
| SNLI | 50.0% | 35.0% |
| QA | 20.0% | 60.0% |
| Sentiment | 33.3% | 33.3% |
| Trivia | 13.3% | 13.3% |
| **Average** | **29.2%** | **35.4%** |

Note: These are base-model results without instruction tuning. QA improved significantly (+40 pp), suggesting better reading comprehension gained from the additional pre-training data. An instruction-tuned version (SFT on v2) is in progress.

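The evaluation harness itself is not part of this card. As an illustration only, a common way to score a base (non-instruction-tuned) model on multiple-choice tasks such as SNLI and sentiment is length-normalized log-likelihood over the candidate answers; the sketch below assumes that approach and is not necessarily how the numbers above were produced.

```python
# Illustration only: length-normalized log-likelihood scoring for
# multiple-choice tasks. Assumed methodology, not necessarily the harness
# used for the table above.
import torch
import torch.nn.functional as F


def answer_logprob(model, tokenizer, prompt: str, answer: str) -> float:
    """Mean log-probability of the answer tokens, conditioned on the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    n_answer = full_ids.shape[1] - prompt_ids.shape[1]  # approximate answer length
    with torch.no_grad():
        logits = model(input_ids=full_ids).logits  # [1, seq_len, vocab]
    # Position i predicts token i+1, so shift logits and targets by one.
    logprobs = F.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[0, -n_answer:].mean().item()


def predict(model, tokenizer, prompt: str, choices: list[str]) -> str:
    """Return the candidate answer with the highest score."""
    return max(choices, key=lambda c: answer_logprob(model, tokenizer, prompt, c))
```
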
## Pre-Training Data

### Phase 1: Original (9.8B tokens)

Hebrew Wikipedia (12%), Supreme Court (22%), Ben Yehuda (23%), C4 Hebrew (20%), CC100 (19%), Task data (4%)

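The same mixture, written out as sampling weights (illustrative only; the source names and config format are hypothetical, not the actual training configuration):

```python
# Hypothetical rendering of the Phase 1 sampling mixture listed above.
# The keys and the config format are illustrative, not the real training config.
phase1_mixture = {
    "hebrew_wikipedia": 0.12,
    "supreme_court": 0.22,
    "ben_yehuda": 0.23,
    "c4_hebrew": 0.20,
    "cc100_hebrew": 0.19,
    "task_data": 0.04,
}
assert abs(sum(phase1_mixture.values()) - 1.0) < 1e-9  # weights cover 100% of tokens
```
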
### Phase 2: Continued Pre-Training (7.8B tokens)

HeDC4, the Hebrew Digital Corpus for the 21st Century

## Related Models

- [HebrewGPT-1B](https://huggingface.co/Slasky/HebrewGPT-1B) - Base model (v1)
- [HebrewGPT-1B-Instruct](https://huggingface.co/Slasky/HebrewGPT-1B-Instruct) - SFT on v1
- [HebrewGPT-1B-AdamW](https://huggingface.co/Slasky/HebrewGPT-1B-AdamW) - AdamW ablation
- [HebrewGPT-296M](https://huggingface.co/Slasky/HebrewGPT-296M) - Smaller model

## Infrastructure

Trained on AWS EC2 (p4d.24xlarge spot), with research orchestration via Amazon Bedrock. Cost: ~$73 (14.3 h × $5.12/hr spot).