Input Sequences
These types represent all the kinds of sequence that can be used as input to a Tokenizer.
In general, a sequence is either a string or a list (or tuple) of strings, depending on the
tokenizer's operating mode: raw text or pre-tokenized.
TextInputSequence[[tokenizers.TextInputSequence]]
tokenizers.TextInputSequence
A str that represents an input sequence.
PreTokenizedInputSequence[[tokenizers.PreTokenizedInputSequence]]
tokenizers.PreTokenizedInputSequence
A pre-tokenized input sequence. Can be one of:
- A List of str
- A Tuple of str
alias of Union[List[str], Tuple[str]].
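As a rough illustration of the two accepted shapes, the sketch below builds a pre-tokenized sequence from raw text. The alias is re-declared locally with `typing` rather than imported from the library, and `whitespace_pretokenize` is a hypothetical helper, not part of tokenizers; splitting on whitespace is only one possible way to pre-tokenize.

```python
from typing import List, Tuple, Union

# Local mirror of the documented alias (an assumption for illustration,
# not imported from the tokenizers package).
PreTokenizedInputSequence = Union[List[str], Tuple[str]]


def whitespace_pretokenize(text: str) -> List[str]:
    """Hypothetical helper: split raw text into words on whitespace."""
    return text.split()


# Both a list and a tuple of str fit the documented alias.
words_list: PreTokenizedInputSequence = whitespace_pretokenize("Hello there !")
words_tuple: PreTokenizedInputSequence = tuple(words_list)
```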
InputSequence[[tokenizers.InputSequence]]
tokenizers.InputSequence
Represents all the possible types of input sequences for encoding. Can be:
- When is_pretokenized=False: TextInputSequence
- When is_pretokenized=True: PreTokenizedInputSequence
alias of Union[str, List[str], Tuple[str]].
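The relationship between `is_pretokenized` and the expected type can be sketched as a small runtime check. This is a minimal sketch under stated assumptions: the aliases are re-declared locally, and `check_input_sequence` is a hypothetical helper written for this example, not a function provided by the library.

```python
from typing import List, Tuple, Union

# Local mirrors of the documented aliases (assumptions for illustration).
TextInputSequence = str
PreTokenizedInputSequence = Union[List[str], Tuple[str]]
InputSequence = Union[TextInputSequence, PreTokenizedInputSequence]


def check_input_sequence(sequence: InputSequence, is_pretokenized: bool = False) -> bool:
    """Hypothetical check: does `sequence` match the type expected for this mode?"""
    if is_pretokenized:
        # Pre-tokenized mode expects a list or tuple whose items are all str.
        return isinstance(sequence, (list, tuple)) and all(
            isinstance(token, str) for token in sequence
        )
    # Raw text mode expects a single str.
    return isinstance(sequence, str)
```

For example, `check_input_sequence("Hello world")` and `check_input_sequence(["Hello", "world"], is_pretokenized=True)` both pass, while passing a raw string with `is_pretokenized=True` does not.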
The Rust API Reference is available directly on the Docs.rs website.
The node API has not been documented yet.