--- license: apache-2.0 --- Anthropic's client side tokenizer. Accuracy compared to actual Claude 3 Haiku tokenizer (Claude 3 family has the same tokenizer): ```python Tokenization results saved to __temp.txt.tokens Text: Hello, world! This is a simple... Actual tokens: 17 Predicted tokens: 10 Accuracy: 58.82% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: The quick brown fox jumps over... Actual tokens: 19 Predicted tokens: 10 Accuracy: 52.63% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: In computer programming, a hel... Actual tokens: 29 Predicted tokens: 21 Accuracy: 72.41% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: Artificial intelligence (AI) i... Actual tokens: 30 Predicted tokens: 24 Accuracy: 80.00% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: The Eiffel Tower is a wrought-... Actual tokens: 56 Predicted tokens: 48 Accuracy: 85.71% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: To be, or not to be, that is t... Actual tokens: 60 Predicted tokens: 50 Accuracy: 83.33% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: In the beginning God created t... Actual tokens: 38 Predicted tokens: 31 Accuracy: 81.58% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: Four score and seven years ago... Actual tokens: 41 Predicted tokens: 34 Accuracy: 82.93% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: I have a dream that one day th... Actual tokens: 51 Predicted tokens: 43 Accuracy: 84.31% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: That's one small step for man,... Actual tokens: 22 Predicted tokens: 14 Accuracy: 63.64% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: Here are the key points about ... Actual tokens: 203 Predicted tokens: 195 Accuracy: 96.06% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: This appears to be an excerpt ... Actual tokens: 179 Predicted tokens: 180 Accuracy: 99.44% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: This is the beginning of the b... Actual tokens: 194 Predicted tokens: 191 Accuracy: 98.45% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: That is the opening lines of t... Actual tokens: 177 Predicted tokens: 163 Accuracy: 92.09% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: That's a powerful and inspirin... Actual tokens: 193 Predicted tokens: 190 Accuracy: 98.45% -------------------------------------------------- Tokenization results saved to __temp.txt.tokens Text: That famous quote is from Neil... Actual tokens: 131 Predicted tokens: 122 Accuracy: 93.13% -------------------------------------------------- Average accuracy: 82.69% ```