Fixing the MD
#1
by FlameF0X - opened
README.md
CHANGED
```diff
@@ -27,7 +27,7 @@ tags:
 
 TMLM-Haiku-2 is a tiny autoregressive language model with approximately one million parameters. That is not a typo. In an era where models are measured in billions, we went the other direction, mostly because we could.
 
-It was trained on English text at a ratio of 100 tokens per parameter. The math is simple: 1M parameters × 100 tokens = ~100M total tokens. We split this budget deliberately: roughly two-thirds (~66.7M tokens) went into general pretraining, after which the resulting checkpoint was fine-tuned on instruction data using the remaining one-third (~33.3M tokens).
+It was trained on English text at a ratio of 100 tokens per parameter. The math is simple: 1M parameters × 100 tokens = \~100M total tokens. We split this budget deliberately: roughly two-thirds (\~66.7M tokens) went into general pretraining, after which the resulting checkpoint was fine-tuned on instruction data using the remaining one-third (~33.3M tokens).
 
 This approach lets us squeeze more signal out of every parameter. It does not make the model smart. It just makes it slightly less confused than it would have been otherwise.
```
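For reference, the token-budget arithmetic stated in the diffed paragraph works out as follows (a minimal sketch; the exact split used in training may differ from this integer approximation):

```python
# Token budget per the README: 100 tokens per parameter,
# split roughly 2/3 pretraining and 1/3 instruction tuning.
params = 1_000_000        # ~1M parameters
tokens_per_param = 100    # stated training ratio

total_tokens = params * tokens_per_param           # ~100M total tokens
pretrain_tokens = total_tokens * 2 // 3            # ~66.7M for pretraining
instruct_tokens = total_tokens - pretrain_tokens   # ~33.3M for instruction data

print(total_tokens, pretrain_tokens, instruct_tokens)
```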