ronnengmail committed
Commit fd08f10 · verified · 1 parent: cec001b

Add post-training models section, downstream eval results, link to instruct variant

Files changed (1): README.md (+19 −1)
README.md CHANGED

@@ -45,6 +45,14 @@ This model was developed as part of an autonomous AI research project exploring
 - 🔬 **Ablation model**: [HebrewGPT-1B-AdamW](https://huggingface.co/Slasky/HebrewGPT-1B-AdamW) (AdamW baseline)
 - 🧪 **Smaller model**: [HebrewGPT-296M](https://huggingface.co/Slasky/HebrewGPT-296M) (296M parameter variant)
 
+## Post-Training Models
+
+| Model | Method | Perplexity | Instruction Following | Notes |
+|-------|--------|------------|-----------------------|-------|
+| **[HebrewGPT-1B-Instruct](https://huggingface.co/Slasky/HebrewGPT-1B-Instruct)** | LoRA Phase 2 (rank=64) | **15.78** (↓47%) | **97.3%** | Best instruct variant — 65K curriculum distillation, ~$12 training cost |
+
+> 💡 The instruction-tuned variant achieves **PPL 15.78** (down from 29.75 base) with zero repetition and 97.3% instruction following, trained for just ~$12 on a single A10G.
+
 ## Model Description
 
 | Parameter | Value |
@@ -117,6 +125,16 @@ HebrewGPT uses a decoder-only transformer with several modern design choices:
 | Conversational | 29.79 |
 | Literature | 31.42 |
 
+### Downstream Task Evaluation
+
+| Task | Accuracy |
+|------|----------|
+| SNLI | 50% |
+| Sentiment | 33% |
+| QA | 20% |
+| Trivia | 13% |
+| **Average** | **29.2%** |
+
 ### Comparison with Other Hebrew Models
 
 | Model | Top-1 Accuracy | Top-5 Accuracy |
@@ -213,7 +231,7 @@ See [`generate.py`](generate.py) in this repository for a complete standalone script
 ## Limitations
 
 - **Hebrew-only**: The model was trained exclusively on Hebrew text. It has limited ability to handle other languages.
-- **No instruction tuning**: This is a base language model. It has not been fine-tuned for chat, instruction following, or safety alignment.
+- **No instruction tuning**: This is a base language model. It has not been fine-tuned for chat, instruction following, or safety alignment. See [HebrewGPT-1B-Instruct](https://huggingface.co/Slasky/HebrewGPT-1B-Instruct) for the instruction-tuned variant.
 - **Context length**: Limited to 2,048 tokens.
 - **Training data biases**: The model reflects biases present in its training data, which includes legal documents, literature, and web text.
 - **Custom architecture**: Requires the provided model class to load — not compatible with standard `AutoModelForCausalLM`.
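The headline numbers in this commit can be cross-checked with a few lines of arithmetic. A minimal standalone sketch (all figures are taken from the tables above; the note about the 29.2% average being computed from unrounded per-task scores is an assumption on my part, not stated in the card):

```python
import math

# Reported perplexities from the model card (this commit).
base_ppl = 29.75      # base HebrewGPT-1B
instruct_ppl = 15.78  # HebrewGPT-1B-Instruct

# Perplexity is exp(mean cross-entropy in nats), so the equivalent
# per-token loss improvement is log(base) - log(instruct).
loss_drop = math.log(base_ppl) - math.log(instruct_ppl)

# Relative perplexity reduction, quoted as "↓47%" in the table.
reduction = (base_ppl - instruct_ppl) / base_ppl
print(f"PPL reduction: {reduction:.0%}")        # → 47%
print(f"loss drop: {loss_drop:.3f} nats/token")

# Downstream macro-average over the four tasks. The rounded per-task
# scores give 29.0; the card's 29.2% presumably uses unrounded scores.
tasks = {"SNLI": 50, "Sentiment": 33, "QA": 20, "Trivia": 13}
print(sum(tasks.values()) / len(tasks))         # → 29.0
```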