---
language:
- he
license: apache-2.0
tags:
- hebrew
- instruction-tuning
- sft
- language-model
- text-generation
- mamba
- transformer
pipeline_tag: text-generation
model-index:
- name: HebrewGPT-1B-Instruct
  results: []
---

# HebrewGPT-1B-Instruct

A **1.08 billion parameter** Hebrew instruction-tuned language model, fine-tuned from [HebrewGPT-1B](https://huggingface.co/Slasky/HebrewGPT-1B) on 61K balanced Hebrew instruction examples.

## Model Details

| Property | Value |
|----------|-------|
| **Parameters** | 1.08B |
| **Architecture** | Custom Mamba-Transformer hybrid (interleaved RoPE attention + Mamba SSM, SwiGLU MLP) |
| **Base Model** | HebrewGPT-1B (pretrained with Muon optimizer + SWA) |
| **Context Length** | 2,048 tokens |
| **Tokenizer** | SentencePiece BPE, 8,192 vocab, Hebrew morphology-aware with prefix splitting |
| **License** | Apache 2.0 |
| **Language** | Hebrew (he) |

## Architecture

HebrewGPT-1B-Instruct uses the same hybrid architecture as the base model:

- **Width:** 1024, **Depth:** 8 layers, **Heads:** 8 (head_dim=128)
- **Interleaved blocks:** alternating RoPE multi-head attention and Mamba SSM layers
- **MLP:** SwiGLU activation
- **Positional encoding:** Rotary Position Embeddings (RoPE)

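The dimensions above can be collected into a small config object. This is an illustrative sketch only; the class and field names below are ours, not the actual model code (see the HebrewGPT-1B repo for the real class definition):

```python
from dataclasses import dataclass

@dataclass
class HebrewGPTConfig:
    # Values taken from the model card; names are hypothetical.
    vocab_size: int = 8192    # SentencePiece BPE vocab
    d_model: int = 1024       # width
    n_layers: int = 8         # interleaved attention / Mamba blocks
    n_heads: int = 8
    head_dim: int = 128       # n_heads * head_dim == d_model
    max_seq_len: int = 2048   # context length

cfg = HebrewGPTConfig()
# Sanity check: attention heads tile the model width exactly.
assert cfg.n_heads * cfg.head_dim == cfg.d_model
```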
## Training

### SFT Configuration

- **Method:** Full Supervised Fine-Tuning (SFT)
- **Training steps:** 3,000
- **Best validation loss:** 2.9598
- **Hardware:** Single NVIDIA A10G GPU (AWS g5.2xlarge)
- **Training time:** ~6.5 hours
- **Total training tokens:** ~20.3M

### Instruction Dataset (61K examples)

The model was fine-tuned on a balanced mix of Hebrew instruction-following tasks:

| Category | Examples | Description |
|----------|----------|-------------|
| QA (HeQ) | 15,000 | Hebrew question answering |
| Sentiment | 10,000 | Hebrew sentiment analysis |
| NLI | 2,938 | Natural language inference |
| Summarization (HeSum) | 10,000 | Hebrew text summarization |
| Translation | 15,000 | Hebrew-English translation |
| Alpaca | 5,000 | General instruction following (translated) |
| Dolly | 2,000 | Open-domain instruction following |
| Chat | 1,000 | Conversational Hebrew |
| Winograd | 278 | Coreference resolution |

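A quick tally of the category counts from the table confirms the advertised 61K total:

```python
# Instruction-mix sizes, copied from the dataset table above.
counts = {
    "QA (HeQ)": 15_000,
    "Sentiment": 10_000,
    "NLI": 2_938,
    "Summarization (HeSum)": 10_000,
    "Translation": 15_000,
    "Alpaca": 5_000,
    "Dolly": 2_000,
    "Chat": 1_000,
    "Winograd": 278,
}

total = sum(counts.values())
print(total)  # 61216, i.e. ~61K examples
```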
## Usage

```python
import torch
import sentencepiece as spm

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.Load("tokenizer.model")

# Load model weights
state_dict = torch.load("model.pt", map_location="cpu")
# Initialize model architecture (see HebrewGPT-1B for model class definition)
# model.load_state_dict(state_dict)
```

### Prompt Format

The model was trained with a structured instruction format (the Hebrew section headers are הוראה "instruction", קלט "input", and תשובה "response"):

```
### הוראה:
{instruction}

### קלט:
{input}

### תשובה:
{response}
```

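A small helper for assembling this template at inference time. This is a sketch, not part of the released code: the function name is ours, and we assume the קלט section is simply omitted for instructions with no input, with generation continuing after the תשובה header.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the SFT prompt; the response section is left open for generation."""
    parts = [f"### הוראה:\n{instruction}\n"]
    if input_text:  # assumption: drop the input section entirely when empty
        parts.append(f"### קלט:\n{input_text}\n")
    parts.append("### תשובה:\n")
    return "\n".join(parts)

# Example: a translation instruction with an input sentence.
print(build_prompt("תרגם לאנגלית", "שלום עולם"))
```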
## Evaluation

Evaluation results coming soon. Base model (HebrewGPT-1B) benchmarks:

| Task | Base Model |
|------|-----------|
| SNLI | 50% |
| Sentiment | 33% |
| QA | 20% |
| Trivia | 13% |
| **Average** | **29.2%** |

## Infrastructure

- **Research Orchestration:** Amazon Bedrock (Claude) via OpenClaw
- **Training Compute:** AWS EC2 g5.2xlarge (NVIDIA A10G)
- **Data Pipeline:** Automated dataset collection, translation, and balancing

## Files

- `model.pt` — SFT fine-tuned model state dict (2.1 GB)
- `tokenizer.model` — SentencePiece BPE tokenizer (8,192 vocab)

## Citation

```bibtex
@misc{hebrewgpt1b-instruct-2026,
  title={HebrewGPT-1B-Instruct: A Hebrew Instruction-Tuned Language Model},
  author={Slasky, Ronnen},
  year={2026},
  url={https://huggingface.co/Slasky/HebrewGPT-1B-Instruct}
}
```

## Limitations

- Small vocabulary (8,192 tokens) may limit performance on rare words
- The 2,048-token context window limits long-document tasks
- Trained primarily on structured instruction tasks; open-ended generation quality may vary
- Hebrew-specific model with limited multilingual capability beyond Hebrew-English translation

## License

Apache 2.0