Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ datasets:
 
 # Shisa V1 7B V2.1
 
-This release is a bit of a meme model to celebrate the 2-year anniversary of the release of [Shisa 7B V1](https://huggingface.co/augmxnt/shisa-7b-v1), but I was genuinely curious to see how much the original [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)-based model could be improved with our latest V2.1 training. How much of our improvements are due to better post-training vs better base models?
+This release is a bit of a meme model to celebrate the 2-year anniversary of the release of [Shisa 7B V1](https://huggingface.co/augmxnt/shisa-7b-v1), but I was genuinely curious to see how much the original [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)-based model could be improved with our latest [Shisa V2.1](https://huggingface.co/collections/shisa-ai/shisa-v21) training. How much of our improvements are due to better post-training vs better base models?
 
 Beyond that curiosity, there is also *some* practical utility, as our [shisa-v1 tokenizer](https://github.com/shisa-ai/shisa-v2/blob/main/eval/tokenizer-efficiency/tokenizer-eval-ja.md) remains one of the most efficient tokenizers for Japanese text. (We've since abandoned tokenizer extension, as the amount of continued pre-training required to recover performance and, crucially, to resolve token leakage is not a good trade-off for us.)
 
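The tokenizer-efficiency claim in the README text above is straightforward to sanity-check. The sketch below is not part of this commit; it is a minimal illustration that compares characters per token on a short Japanese sentence between the shisa-7b-v1 tokenizer and the base Mistral 7B v0.1 tokenizer. The repo IDs come from the links in the README; the sample sentence is an arbitrary placeholder.

```python
# Minimal sketch: compare Japanese tokenization efficiency (characters per token).
# Higher chars/token means fewer tokens are needed for the same text.
from transformers import AutoTokenizer

# Illustrative sample sentence (not from the linked eval).
sample_ja = "東京は日本の首都であり、世界有数の大都市圏を形成しています。"

for repo_id in ("augmxnt/shisa-7b-v1", "mistralai/Mistral-7B-v0.1"):
    tok = AutoTokenizer.from_pretrained(repo_id)
    ids = tok.encode(sample_ja, add_special_tokens=False)
    print(f"{repo_id}: {len(ids)} tokens, {len(sample_ja) / len(ids):.2f} chars/token")
```

For a proper comparison, the linked tokenizer-efficiency evaluation in the shisa-v2 repository measures this over a full Japanese corpus rather than a single sentence.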