Update README.md
README.md CHANGED
@@ -65,12 +65,6 @@ Nearly every base model that isn't finetuned for a specific task was trained on
 
 ```
 
-"Instruct" models have these special tokens:
-
-```
-<prompt> your prompt goes here <output> the model outputs a result here.
-```
-
 Some applications where I can imagine these being useful are: warm-starting very small encoder-decoder models, fitting a new scaling law that takes smaller models into account, or serving as a "fuzzy wrapper" around an API. They could also be usable on their own (for classification or other tasks) when finetuned on more specific datasets. I don't expect the 3.3M models to be useful for any task whatsoever. Every model was trained on a single GPU: an RTX 2060, RTX 3060, or T4.
 
 I'd appreciate help evaluating all these models, probably with lm-evaluation-harness!
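The `<prompt>`/`<output>` template mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the markers are literal strings in the model's vocabulary as shown in the README; the function name is hypothetical:

```python
# Sketch: wrap user text in the instruct-style template described above.
# The literal "<prompt>"/"<output>" markers come from the README;
# the function name and spacing are illustrative assumptions.

def build_instruct_prompt(user_text: str) -> str:
    """Format text with the <prompt>/<output> markers so the model
    continues generation after <output>."""
    return f"<prompt> {user_text} <output>"

prompt = build_instruct_prompt("Summarize this paragraph.")
print(prompt)  # → <prompt> Summarize this paragraph. <output>
```

The model would then generate its answer as the continuation after `<output>`.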
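For anyone who wants to help with evaluation, a typical lm-evaluation-harness run looks something like the following. This is a sketch: the repo name is a placeholder, and the flags assume the v0.4+ CLI of EleutherAI's harness:

```shell
# Install EleutherAI's lm-evaluation-harness and run a couple of small tasks.
# "your-username/your-small-model" is a placeholder for one of the model repos.
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=your-username/your-small-model \
  --tasks hellaswag,arc_easy \
  --device cuda:0 \
  --batch_size 8
```

Results are printed as a table of per-task metrics, which would make comparing the different model sizes straightforward.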