This is the tokenizer used by the Sabiá-2 Medium model.

Sabiá-2 Medium is a proprietary LLM available in two forms: through an API endpoint, which we refer to as the "MariTalk API", or as an encrypted downloadable version that runs locally, known as "MariTalk Local".

This tokenizer is provided so that you can estimate the number of tokens in your prompts and, from that, the cost of using the model.

```python
import transformers

# Load the tokenizer from the Hugging Face Hub.
tokenizer = transformers.AutoTokenizer.from_pretrained("maritaca-ai/sabia-2-tokenizer-medium")

prompt = "Com quantos paus se faz uma canoa?"

# encode() returns the list of token ids for the prompt.
tokens = tokenizer.encode(prompt)
print(f'The prompt "{prompt}" contains {len(tokens)} tokens.')  # It should print 12 tokens.
```

For more information on how to use the model, please refer to [our documentation](https://maritaca-ai.github.io/maritalk-api/maritalk.html).
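To go from a token count to a cost estimate, a minimal sketch might look like the following. The helper `estimate_cost`, its parameters, and the price used here are hypothetical and are not part of the MariTalk API; substitute the current price from the official pricing information. The demo uses a whitespace counter as a stand-in so that it runs without downloading the tokenizer.

```python
from typing import Callable, Iterable, Tuple


def estimate_cost(
    prompts: Iterable[str],
    count_tokens: Callable[[str], int],
    price_per_million_tokens: float,
) -> Tuple[int, float]:
    """Return (total_tokens, estimated_cost) for a batch of prompts."""
    total_tokens = sum(count_tokens(p) for p in prompts)
    cost = total_tokens * price_per_million_tokens / 1_000_000
    return total_tokens, cost


if __name__ == "__main__":
    # Stand-in counter so the sketch runs offline. With the real tokenizer, use:
    #   count_tokens = lambda p: len(tokenizer.encode(p))
    whitespace_counter = lambda p: len(p.split())

    # HYPOTHETICAL price per million tokens; check the actual pricing.
    total, cost = estimate_cost(
        ["Com quantos paus se faz uma canoa?"],
        whitespace_counter,
        price_per_million_tokens=1.0,
    )
    print(f"{total} tokens, estimated cost: {cost:.8f}")
```

This counts prompt tokens only; tokens generated by the model are not included in the estimate.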