Add post-training models section, downstream eval results, link to instruct variant
README.md CHANGED
@@ -45,6 +45,14 @@ This model was developed as part of an autonomous AI research project exploring
 - 🔬 **Ablation model**: [HebrewGPT-1B-AdamW](https://huggingface.co/Slasky/HebrewGPT-1B-AdamW) (AdamW baseline)
 - 🧪 **Smaller model**: [HebrewGPT-296M](https://huggingface.co/Slasky/HebrewGPT-296M) (296M parameter variant)
 
+## Post-Training Models
+
+| Model | Method | Perplexity | Instruction Following | Notes |
+|-------|--------|-----------|----------------------|-------|
+| **[HebrewGPT-1B-Instruct](https://huggingface.co/Slasky/HebrewGPT-1B-Instruct)** | LoRA Phase 2 (rank=64) | **15.78** (↓47%) | **97.3%** | Best instruct variant — 65K curriculum distillation, ~$12 training cost |
+
+> 💡 The instruction-tuned variant achieves **PPL 15.78** (down from 29.75 base) with zero repetition and 97.3% instruction following, trained for just ~$12 on a single A10G.
+
 ## Model Description
 
 | Parameter | Value |
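For readers unfamiliar with the "LoRA (rank=64)" method named in the Post-Training Models table, here is a minimal torch-only sketch of the low-rank adapter idea. This is not the repository's actual training code; all dimensions and names below are illustrative assumptions.

```python
import torch

# Hedged sketch of a LoRA adapter: the frozen pretrained weight W is
# augmented with a trainable low-rank update B @ A, scaled by alpha / r.
# Only ~2 * d * r parameters are trained instead of d * d.
d_in, d_out, r, alpha = 512, 512, 64, 128   # illustrative sizes; r=64 as in the table

W = torch.randn(d_out, d_in)                # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01             # trainable down-projection
B = torch.zeros(d_out, r)                   # trainable up-projection, zero-initialized

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Base path plus low-rank correction; because B starts at zero,
    # the adapted layer initially matches the frozen layer exactly.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = torch.randn(4, d_in)
assert torch.allclose(lora_forward(x), x @ W.T)  # identity at initialization
```

In practice a library such as `peft` wires such adapters into attention and MLP projections automatically; the sketch only shows why a rank-64 update is cheap to train.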
@@ -117,6 +125,16 @@ HebrewGPT uses a decoder-only transformer with several modern design choices:
 | Conversational | 29.79 |
 | Literature | 31.42 |
 
+### Downstream Task Evaluation
+
+| Task | Accuracy |
+|------|----------|
+| SNLI | 50% |
+| Sentiment | 33% |
+| QA | 20% |
+| Trivia | 13% |
+| **Average** | **29.2%** |
+
 ### Comparison with Other Hebrew Models
 
 | Model | Top-1 Accuracy | Top-5 Accuracy |
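The perplexity figures reported in this diff (e.g. 29.75 for the base model, 15.78 after instruction tuning) follow the standard definition: the exponential of the mean per-token negative log-likelihood. A minimal sketch of that computation, assuming a model that yields per-token log-probabilities:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    Lower is better: a model that assigned probability 1.0 to every
    token would reach the minimum perplexity of 1.0.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model giving each token probability 1/e has mean NLL 1.0, so PPL = e.
assert abs(perplexity([-1.0, -1.0, -1.0]) - math.e) < 1e-9
```

Domain-level numbers such as "Conversational 29.79" are simply this quantity computed over tokens from that domain's evaluation slice.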
@@ -213,7 +231,7 @@ See [`generate.py`](generate.py) in this repository for a complete standalone sc
 ## Limitations
 
 - **Hebrew-only**: The model was trained exclusively on Hebrew text. It has limited ability to handle other languages.
-- **No instruction tuning**: This is a base language model. It has not been fine-tuned for chat, instruction following, or safety alignment.
+- **No instruction tuning**: This is a base language model. It has not been fine-tuned for chat, instruction following, or safety alignment. See [HebrewGPT-1B-Instruct](https://huggingface.co/Slasky/HebrewGPT-1B-Instruct) for the instruction-tuned variant.
 - **Context length**: Limited to 2,048 tokens.
 - **Training data biases**: The model reflects biases present in its training data, which includes legal documents, literature, and web text.
 - **Custom architecture**: Requires the provided model class to load — not compatible with standard `AutoModelForCausalLM`.
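On the last limitation: models with custom architectures are generally loaded by importing the repository's model class directly and restoring a checkpoint via `load_state_dict`, rather than through `AutoModelForCausalLM`. The sketch below uses a hypothetical stand-in module to show the pattern; the real class, config, and checkpoint names are whatever this repo's `generate.py` defines, not the names used here.

```python
import io
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's custom model class.
class TinyLM(nn.Module):
    def __init__(self, vocab_size: int = 32, d_model: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(ids))

# General pattern: instantiate the class, then load the saved weights.
model = TinyLM()
buf = io.BytesIO()                 # in-memory stand-in for a checkpoint file
torch.save(model.state_dict(), buf)
buf.seek(0)

restored = TinyLM()
restored.load_state_dict(torch.load(buf))
restored.eval()
```

The key point is that the class definition must be importable before the checkpoint can be restored, which is why the repo ships its model code alongside the weights.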