Add post-training models section, downstream eval results, link to instruct variant
README.md CHANGED
@@ -45,6 +45,14 @@ This model was developed as part of an autonomous AI research project exploring
 - 🔬 **Ablation model**: [HebrewGPT-1B-AdamW](https://huggingface.co/Slasky/HebrewGPT-1B-AdamW) (AdamW baseline)
 - 🧪 **Smaller model**: [HebrewGPT-296M](https://huggingface.co/Slasky/HebrewGPT-296M) (296M parameter variant)
 
+## Post-Training Models
+
+| Model | Method | Perplexity | Instruction Following | Notes |
+|-------|--------|-----------|----------------------|-------|
+| **[HebrewGPT-1B-Instruct](https://huggingface.co/Slasky/HebrewGPT-1B-Instruct)** | LoRA Phase 2 (rank=64) | **15.78** (↓47%) | **97.3%** | Best instruct variant — 65K curriculum distillation, ~$12 training cost |
+
+> 💡 The instruction-tuned variant achieves **PPL 15.78** (down from 29.75 base) with zero repetition and 97.3% instruction following, trained for just ~$12 on a single A10G.
+
 ## Model Description
 
 | Parameter | Value |
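For readers unfamiliar with the "LoRA (rank=64)" method named in the Post-Training Models table, here is a minimal torch-only sketch of the low-rank adapter idea. This is not the repository's actual training code; all dimensions and names below are illustrative assumptions.

```python
import torch

# Hedged sketch of a LoRA adapter: the frozen pretrained weight W is
# augmented with a trainable low-rank update B @ A, scaled by alpha / r.
# Only ~2 * d * r parameters are trained instead of d * d.
d_in, d_out, r, alpha = 512, 512, 64, 128   # illustrative sizes; r=64 as in the table

W = torch.randn(d_out, d_in)                # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01             # trainable down-projection
B = torch.zeros(d_out, r)                   # trainable up-projection, zero-initialized

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Base path plus low-rank correction; because B starts at zero,
    # the adapted layer initially matches the frozen layer exactly.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = torch.randn(4, d_in)
assert torch.allclose(lora_forward(x), x @ W.T)  # identity at initialization
```

In practice a library such as `peft` wires such adapters into attention and MLP projections automatically; the sketch only shows why a rank-64 update is cheap to train.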
@@ -117,6 +125,16 @@ HebrewGPT uses a decoder-only transformer with several modern design choices:
 | Conversational | 29.79 |
 | Literature | 31.42 |
 
+### Downstream Task Evaluation
+
+| Task | Accuracy |
+|------|----------|
+| SNLI | 50% |
+| Sentiment | 33% |
+| QA | 20% |
+| Trivia | 13% |
+| **Average** | **29.2%** |
+
 ### Comparison with Other Hebrew Models
 
 | Model | Top-1 Accuracy | Top-5 Accuracy |
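The perplexity figures reported in this diff (e.g. 29.75 for the base model, 15.78 after instruction tuning) follow the standard definition: the exponential of the mean per-token negative log-likelihood. A minimal sketch of that computation, assuming a model that yields per-token log-probabilities:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    Lower is better: a model that assigned probability 1.0 to every
    token would reach the minimum perplexity of 1.0.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model giving each token probability 1/e has mean NLL 1.0, so PPL = e.
assert abs(perplexity([-1.0, -1.0, -1.0]) - math.e) < 1e-9
```

Domain-level numbers such as "Conversational 29.79" are simply this quantity computed over tokens from that domain's evaluation slice.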
@@ -213,7 +231,7 @@ See [`generate.py`](generate.py) in this repository for a complete standalone sc
 ## Limitations
 
 - **Hebrew-only**: The model was trained exclusively on Hebrew text. It has limited ability to handle other languages.
-- **No instruction tuning**: This is a base language model. It has not been fine-tuned for chat, instruction following, or safety alignment.
+- **No instruction tuning**: This is a base language model. It has not been fine-tuned for chat, instruction following, or safety alignment. See [HebrewGPT-1B-Instruct](https://huggingface.co/Slasky/HebrewGPT-1B-Instruct) for the instruction-tuned variant.
 - **Context length**: Limited to 2,048 tokens.
 - **Training data biases**: The model reflects biases present in its training data, which includes legal documents, literature, and web text.
 - **Custom architecture**: Requires the provided model class to load — not compatible with standard `AutoModelForCausalLM`.
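On the last limitation: models with custom architectures are generally loaded by importing the repository's model class directly and restoring a checkpoint via `load_state_dict`, rather than through `AutoModelForCausalLM`. The sketch below uses a hypothetical stand-in module to show the pattern; the real class, config, and checkpoint names are whatever this repo's `generate.py` defines, not the names used here.

```python
import io
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's custom model class.
class TinyLM(nn.Module):
    def __init__(self, vocab_size: int = 32, d_model: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(ids))

# General pattern: instantiate the class, then load the saved weights.
model = TinyLM()
buf = io.BytesIO()                 # in-memory stand-in for a checkpoint file
torch.save(model.state_dict(), buf)
buf.seek(0)

restored = TinyLM()
restored.load_state_dict(torch.load(buf))
restored.eval()
```

The key point is that the class definition must be importable before the checkpoint can be restored, which is why the repo ships its model code alongside the weights.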