Update README.md
README.md
CHANGED
@@ -33,8 +33,6 @@ In particular, it might be easy to predict a *reasonable* next token, but much m
 The correct prediction here might be "signs of life." However, the model might predict "and" rather than "signs", since "and" is *reasonable* in the immediate context - it's grammatically correct, but implies a strange ending to the sentence.
 As a result, the model might end up with something like "The astronomer pointed his telescope at the distant star, hoping to see and hear." - which makes little sense.
 
----
-
 SPIN's advantage over SFT likely comes from its partial mitigation of exposure bias.
 SPIN doesn't only train the model to predict the next token accurately; it also repeatedly trains the model to identify and fix discrepancies between its generations and the ground truth.
 In order to do this, the model must implicitly learn to think ahead, as exposure bias is likely what causes many of the discrepancies.