Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ datasets:
 
 # Shisa V1 7B V2.1
 
-This release is a bit of a meme model to celebrate the 2-year anniversary of the release of [Shisa 7B V1](https://huggingface.co/augmxnt/shisa-7b-v1), but I was genuinely curious to see how much the original [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)-based model could be improved with our latest V2.1 training. How much of our improvements are due to better post-training vs better base models?
+This release is a bit of a meme model to celebrate the 2-year anniversary of the release of [Shisa 7B V1](https://huggingface.co/augmxnt/shisa-7b-v1), but I was genuinely curious to see how much the original [Mistral 7B v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)-based model could be improved with our latest [Shisa V2.1](https://huggingface.co/collections/shisa-ai/shisa-v21) training. How much of our improvements are due to better post-training vs better base models?
 
 Beyond that curiosity, there is also *some* practical utility, as our [shisa-v1 tokenizer](https://github.com/shisa-ai/shisa-v2/blob/main/eval/tokenizer-efficiency/tokenizer-eval-ja.md) remains one of the most efficient tokenizers for Japanese text. (We've since abandoned tokenizer extension, as the amount of continued pre-training required to recover performance and, crucially, to resolve token leakage is not a good trade-off for us.)
 
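The tokenizer-efficiency claim in the README text above is straightforward to sanity-check. The sketch below is not part of this commit; it is a minimal illustration that compares characters per token on a short Japanese sentence between the shisa-7b-v1 tokenizer and the base Mistral 7B v0.1 tokenizer. The repo IDs come from the links in the README; the sample sentence is an arbitrary placeholder.

```python
# Minimal sketch: compare Japanese tokenization efficiency (characters per token).
# Higher chars/token means fewer tokens are needed for the same text.
from transformers import AutoTokenizer

# Illustrative sample sentence (not from the linked eval).
sample_ja = "東京は日本の首都であり、世界有数の大都市圏を形成しています。"

for repo_id in ("augmxnt/shisa-7b-v1", "mistralai/Mistral-7B-v0.1"):
    tok = AutoTokenizer.from_pretrained(repo_id)
    ids = tok.encode(sample_ja, add_special_tokens=False)
    print(f"{repo_id}: {len(ids)} tokens, {len(sample_ja) / len(ids):.2f} chars/token")
```

For a proper comparison, the linked tokenizer-efficiency evaluation in the shisa-v2 repository measures this over a full Japanese corpus rather than a single sentence.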