Upload README.md with huggingface_hub
README.md
@@ -100,15 +100,17 @@ The model was trained with a structured instruction format:
 
 ## Evaluation
 
-Evaluation
-
-| Task | Base Model |
-|------|-----------|
-| SNLI | 50% |
-| Sentiment | 33% |
-| QA | 20% |
-| Trivia | 13% |
-| **Average** | **29.2%** |
+Evaluation on Hebrew benchmarks requires GPU inference. Base model (HebrewGPT-1B) results for comparison:
+
+| Task | Base Model | Instruct (SFT) |
+|------|-----------|----------------|
+| SNLI | 50% | *Pending* |
+| Sentiment | 33% | *Pending* |
+| QA | 20% | *Pending* |
+| Trivia | 13% | *Pending* |
+| **Average** | **29.2%** | *Pending* |
+
+SFT evaluation will be run on GPU and updated here. The instruction-tuned model is expected to show significant improvements on structured tasks (QA, sentiment, NLI) that were part of the SFT training mix.
 
 ## Infrastructure
 
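The **Average** row is the unweighted mean of the four task accuracies. A minimal sketch of that aggregation (note the table's 29.2% presumably comes from unrounded per-task scores, since the rounded values shown average to 29.0%):

```python
# Per-task base-model accuracies, as rounded in the table above.
scores = {"SNLI": 0.50, "Sentiment": 0.33, "QA": 0.20, "Trivia": 0.13}

# Unweighted mean across tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.1%}")  # 29.0% from the rounded values
```

The same aggregation would apply to the Instruct (SFT) column once its *Pending* cells are filled in.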