---
license: apache-2.0
---

Anthropic's client-side tokenizer.

Accuracy of its token counts compared with the actual Claude 3 Haiku tokenizer (the Claude 3 family shares the same tokenizer):

```text
Tokenization results saved to __temp.txt.tokens
Text: Hello, world! This is a simple...
Actual tokens: 17
Predicted tokens: 10
Accuracy: 58.82%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: The quick brown fox jumps over...
Actual tokens: 19
Predicted tokens: 10
Accuracy: 52.63%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: In computer programming, a hel...
Actual tokens: 29
Predicted tokens: 21
Accuracy: 72.41%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: Artificial intelligence (AI) i...
Actual tokens: 30
Predicted tokens: 24
Accuracy: 80.00%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: The Eiffel Tower is a wrought-...
Actual tokens: 56
Predicted tokens: 48
Accuracy: 85.71%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: To be, or not to be, that is t...
Actual tokens: 60
Predicted tokens: 50
Accuracy: 83.33%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: In the beginning God created t...
Actual tokens: 38
Predicted tokens: 31
Accuracy: 81.58%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: Four score and seven years ago...
Actual tokens: 41
Predicted tokens: 34
Accuracy: 82.93%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: I have a dream that one day th...
Actual tokens: 51
Predicted tokens: 43
Accuracy: 84.31%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: That's one small step for man,...
Actual tokens: 22
Predicted tokens: 14
Accuracy: 63.64%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: Here are the key points about ...
Actual tokens: 203
Predicted tokens: 195
Accuracy: 96.06%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: This appears to be an excerpt ...
Actual tokens: 179
Predicted tokens: 180
Accuracy: 99.44%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: This is the beginning of the b...
Actual tokens: 194
Predicted tokens: 191
Accuracy: 98.45%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: That is the opening lines of t...
Actual tokens: 177
Predicted tokens: 163
Accuracy: 92.09%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: That's a powerful and inspirin...
Actual tokens: 193
Predicted tokens: 190
Accuracy: 98.45%
--------------------------------------------------
Tokenization results saved to __temp.txt.tokens
Text: That famous quote is from Neil...
Actual tokens: 131
Predicted tokens: 122
Accuracy: 93.13%
--------------------------------------------------
Average accuracy: 82.69%
```
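The logged percentages are consistent with accuracy defined as `1 - |actual - predicted| / actual` (which equals `predicted / actual` whenever the client-side count is lower), and the final line matches the unweighted mean of the 16 per-sample scores. The sketch below recomputes them from the logged counts under that assumption. The metric definition is inferred from the numbers, not taken from the original benchmark script, and the `accuracy` helper and `RESULTS` list are hypothetical names.

```python
# Minimal sketch. ASSUMPTION: the metric is inferred from the logged numbers;
# `accuracy` and `RESULTS` are hypothetical names, not the original script.


def accuracy(actual: int, predicted: int) -> float:
    """Return 1 - |actual - predicted| / actual, expressed as a percentage."""
    return (1 - abs(actual - predicted) / actual) * 100


# (actual, predicted) token counts copied from the results log.
RESULTS = [
    (17, 10), (19, 10), (29, 21), (30, 24), (56, 48), (60, 50),
    (38, 31), (41, 34), (51, 43), (22, 14), (203, 195), (179, 180),
    (194, 191), (177, 163), (193, 190), (131, 122),
]

scores = [accuracy(a, p) for a, p in RESULTS]
for (a, p), s in zip(RESULTS, scores):
    print(f"actual={a:4d} predicted={p:4d} accuracy={s:.2f}%")

# Unweighted mean over samples, as in the "Average accuracy" line.
print(f"Average accuracy: {sum(scores) / len(scores):.2f}%")
```

Because the metric is relative, the same absolute gap weighs very differently by length: 8 missing tokens on the 22-token sample gives 63.64%, while 8 missing tokens on the 203-token sample still gives 96.06%.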