Add meta data
README.md
CHANGED
@@ -1,3 +1,12 @@
+---
+license: gpl-2.0
+language:
+- en
+- ja
+tags:
+- tokenizer
+- novelai
+---
 # Tokenizer
 
 Finetune here to talk a bit about [NovelAI](https://novelai.net/)'s new tokenizer that I worked on. First a quick reminder. In most cases, our models don't see words as individual letters. Instead, text is broken down into tokens, which are words or word fragments. For example, the sentence “`The quick brown fox jumps over the goblin.`” would tokenize as “`The| quick| brown| fox| jumps| over| the| go|bl|in.`” in the Pile tokenizer used by GPT-NeoX 20B and Krake, with each | signifying a boundary between tokens.
@@ -64,4 +73,4 @@ print("Readable tokens:", s.encode(text, out_type=str))
 
 ## License
 
-The tokenizer is licensed under the GNU General Public License, version 2.
+The tokenizer is licensed under the GNU General Public License, version 2.
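The subword splitting the README paragraph describes can be illustrated with a toy greedy longest-match tokenizer over a small hand-picked vocabulary. This is purely illustrative: the actual Pile tokenizer is BPE-based and applies learned merge rules, so the vocabulary and the longest-match strategy below are assumptions made only to reproduce the example split.

```python
# Toy illustration of subword tokenization: greedy longest-match against a
# tiny hypothetical vocabulary. Real tokenizers (e.g. the BPE-based Pile
# tokenizer) use learned merge rules instead of a hand-written vocabulary.
VOCAB = {"The", " quick", " brown", " fox", " jumps", " over", " the",
         " go", "bl", "in."}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry matching at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Character not covered by the vocabulary: emit it on its own.
            tokens.append(text[i])
            i += 1
    return tokens

print("|".join(tokenize("The quick brown fox jumps over the goblin.")))
# → The| quick| brown| fox| jumps| over| the| go|bl|in.
```

Note how "goblin." falls outside the vocabulary and is assembled from the fragments " go", "bl", and "in.", mirroring the split shown in the README.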