# Encode Inputs

These types represent all the different kinds of input that a [Tokenizer](/docs/tokenizers/pr_2003/en/api/tokenizer#tokenizers.Tokenizer) accepts when using `encode_batch()`.
## TextEncodeInput[[tokenizers.TextEncodeInput]]

tokenizers.TextEncodeInput

Represents a textual input for encoding. Can be either:

- A single sequence: [TextInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.TextInputSequence)
- A pair of sequences:
  - A Tuple of [TextInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.TextInputSequence)
  - Or a List of [TextInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.TextInputSequence) of size 2

alias of `Union[str, Tuple[str, str], List[str]]`.
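For illustration, the following sketch passes both forms of `TextEncodeInput` to `encode_batch()`. The word-level vocabulary is made up for the example; a real tokenizer would be trained or loaded from a file.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Tiny illustrative vocabulary (hypothetical, just for this sketch)
vocab = {"[UNK]": 0, "hello": 1, "world": 2, "again": 3}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# A batch of single sequences: each item is a TextInputSequence (str)
single = tokenizer.encode_batch(["hello world", "hello again"])
print(single[0].tokens)  # ['hello', 'world']

# A batch of sequence pairs: each item is a Tuple of two TextInputSequence
pairs = tokenizer.encode_batch([("hello", "world")])
print(pairs[0].tokens)
```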
## PreTokenizedEncodeInput[[tokenizers.PreTokenizedEncodeInput]]

tokenizers.PreTokenizedEncodeInput

Represents a pre-tokenized input for encoding. Can be either:

- A single sequence: [PreTokenizedInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.PreTokenizedInputSequence)
- A pair of sequences:
  - A Tuple of [PreTokenizedInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.PreTokenizedInputSequence)
  - Or a List of [PreTokenizedInputSequence](/docs/tokenizers/api/input-sequences#tokenizers.PreTokenizedInputSequence) of size 2

alias of `Union[List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]]`.
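As a sketch, pre-tokenized input is passed to `encode_batch()` with `is_pretokenized=True`; each batch item is already a list of words, so no pre-tokenizer is required. The vocabulary below is illustrative only.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel

# Tiny illustrative vocabulary (hypothetical, just for this sketch)
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))

# Each batch item is a PreTokenizedInputSequence: a list of words
encodings = tokenizer.encode_batch(
    [["hello", "world"], ["world"]],
    is_pretokenized=True,
)
print(encodings[0].tokens)  # ['hello', 'world']
```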
## EncodeInput[[tokenizers.EncodeInput]]

tokenizers.EncodeInput

Represents all the possible types of input for encoding. Can be:

- When `is_pretokenized=False`: [TextEncodeInput](#tokenizers.TextEncodeInput)
- When `is_pretokenized=True`: [PreTokenizedEncodeInput](#tokenizers.PreTokenizedEncodeInput)

alias of `Union[str, Tuple[str, str], List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]]`.
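The `is_pretokenized` flag decides how an ambiguous shape like a list of strings is interpreted. A minimal sketch of the contrast, using a made-up word-level vocabulary:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Tiny illustrative vocabulary (hypothetical, just for this sketch)
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Default (is_pretokenized=False): each str is one raw sequence,
# so this batch produces two encodings
as_text = tokenizer.encode_batch(["hello world", "world"])

# is_pretokenized=True: each batch item is itself a list of words,
# so this batch produces a single encoding
as_pretok = tokenizer.encode_batch([["hello", "world"]], is_pretokenized=True)
print(len(as_text), len(as_pretok))  # 2 1
```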
The Rust API Reference is available directly on the [Docs.rs](https://docs.rs/tokenizers/latest/tokenizers/) website.

The Node.js API has not been documented yet.