Update README.md
Keeping this in mind:
- The exact answer is always important and is always only a few tokens. Hence, we do not mask the labels or input tokens for the answer value.
- Rarely, we ignore the rationale labels entirely, so that the model is pushed to learn only what leads to the best answer.
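Concretely, the masking scheme above could be sketched as follows. This is a minimal illustration, not the repo's actual code: the function name, index arguments, and probability values are all hypothetical placeholders.

```python
import random

IGNORE_INDEX = -100  # conventional "ignored" label id for cross-entropy loss


def mask_labels(input_ids, rationale_idx, p_token_mask=0.4,
                p_drop_rationale=0.1, rng=random):
    """Build per-token labels for one tokenized example.

    Individual rationale labels are dropped at random, and (rarely) the
    entire rationale is ignored so that only the answer drives learning.
    Answer tokens are never masked. Probabilities here are placeholders.
    """
    labels = list(input_ids)
    # Rarely, ignore the whole rationale in the loss
    drop_all = rng.random() < p_drop_rationale
    for i in rationale_idx:
        if drop_all or rng.random() < p_token_mask:
            labels[i] = IGNORE_INDEX
    # Positions outside rationale_idx (including the answer tokens) keep
    # their labels, so the exact answer is always supervised.
    return labels
```

Setting a label to `-100` is the standard way to exclude a token from the cross-entropy loss in PyTorch-style training loops.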
## Results
I trained StableLM-3B-4e1t repeatedly on [TinyCoT](https://huggingface.co/datasets/euclaise/TinyCoT), along with 1000 examples from [reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated) and 1000 examples from [oasst2-curated](https://huggingface.co/datasets/sablo/oasst2_curated).
I trained once with ReMask (ReMask-CoT for CoT examples), once with Masked Thought (with partial label-masking), and once with SFT.