Update README.md
README.md
@@ -91,5 +91,12 @@ Keeping this in mind:
 
 I trained StableLM-3B-4e1t repeatedly on [TinyCoT](https://huggingface.co/datasets/euclaise/TinyCoT), along with 1000 examples from [reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated) and 1000 examples from [oasst2-curated](https://huggingface.co/datasets/sablo/oasst2_curated).
 
-I trained once with ReMask (ReMask-CoT for CoT examples), once with Masked Thought (w/ partial label-masking), and once with SFT.
+I trained once with ReMask (ReMask-CoT for CoT examples), once with Masked Thought (w/ partial label-masking for CoT), and once with SFT.
+
+Here are some benchmark results, computed using the LM Evaluation Harness with vLLM:
+
+| Model          | GSM8K (strict, 5-shot) | AGIEval (Nous subset, 0-shot) | ARC-C | BBH |
+|:--------------:|-----------------------:|------------------------------:|------:|----:|
+| SFT            | 23.81%                 |                               |       |     |
+| Masked Thought | 20.24%                 | 23.80%                        |       |     |
+| **ReMask**     | **24.03%**             | 24.71%                        |       |     |
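As a rough illustration of the partial label-masking mentioned above: the idea is to exclude some target tokens from the training loss. This is a minimal sketch, not the actual ReMask or Masked Thought implementation; the `mask_labels` helper and its `mask_prob` parameter are assumptions for illustration.

```python
import torch

IGNORE_INDEX = -100  # label value ignored by torch.nn.CrossEntropyLoss


def mask_labels(input_ids: torch.Tensor, prompt_len: int, mask_prob: float) -> torch.Tensor:
    """Hide the prompt from the loss entirely, and randomly drop a
    fraction of the completion tokens from the loss as well."""
    labels = input_ids.clone()
    labels[:prompt_len] = IGNORE_INDEX           # never train on prompt tokens
    drop = torch.rand(labels.shape) < mask_prob  # Bernoulli mask over positions
    drop[:prompt_len] = False                    # prompt is already masked
    labels[drop] = IGNORE_INDEX                  # mask a random subset of targets
    return labels


# Example: 4 prompt tokens, 6 completion tokens, mask ~30% of the completion
ids = torch.arange(10)
labels = mask_labels(ids, prompt_len=4, mask_prob=0.3)
```

Tokens labeled `-100` contribute nothing to the cross-entropy loss, so the model is only trained on the unmasked completion positions.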