
Encode Inputs

These types represent all the different kinds of input that a Tokenizer accepts when using encode_batch().

TextEncodeInput[[tokenizers.TextEncodeInput]]

tokenizers.TextEncodeInput

Represents a textual input for encoding. Can be either a single sequence (a str) or a pair of sequences (a Tuple[str, str], or a List[str] of size 2).

Alias of Union[str, Tuple[str, str], List[str]].
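As a rough illustration of the shapes this alias covers, the sketch below re-declares it locally with the standard `typing` module and builds one value of each kind. The helper `describe_text_input` is hypothetical and not part of `tokenizers`; it also assumes that a list is meant to hold exactly two sequences, which the alias itself does not enforce.

```python
from typing import List, Tuple, Union

# Local re-declaration of the alias, for illustration only.
TextEncodeInput = Union[str, Tuple[str, str], List[str]]

single: TextEncodeInput = "Hello, world!"
pair_as_tuple: TextEncodeInput = ("What is a tokenizer?", "It splits text into tokens.")
pair_as_list: TextEncodeInput = ["What is a tokenizer?", "It splits text into tokens."]

# A batch is a list of such inputs; single sequences and pairs can be mixed.
batch: List[TextEncodeInput] = [single, pair_as_tuple, pair_as_list]


def describe_text_input(value: TextEncodeInput) -> str:
    """Hypothetical helper: report whether a value is a single text or a pair."""
    if isinstance(value, str):
        return "single"
    # Assumption: tuples and lists must contain exactly two strings.
    if (
        isinstance(value, (tuple, list))
        and len(value) == 2
        and all(isinstance(item, str) for item in value)
    ):
        return "pair"
    raise TypeError(f"not a TextEncodeInput: {value!r}")


kinds = [describe_text_input(item) for item in batch]
```

A list like `batch` is the kind of value that would be handed to encode_batch() for raw, untokenized text.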

PreTokenizedEncodeInput[[tokenizers.PreTokenizedEncodeInput]]

tokenizers.PreTokenizedEncodeInput

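Represents a pre-tokenized input for encoding. Can be either a single pre-tokenized sequence (a List[str] or Tuple[str] of words) or a pair of such sequences (a Tuple of two sequences, or a List of sequences of size 2).

Alias of Union[List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].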
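The nested Union is easier to read with concrete values. The sketch below is only an illustration: `is_word_sequence` and `describe_pretokenized_input` are hypothetical helpers, not `tokenizers` functions, and the rule that a pair holds exactly two sequences is an assumption drawn from the shape of the alias.

```python
from typing import Any

# One pre-tokenized sequence is a list (or tuple) of words.
single = ["Hello", "world", "!"]

# A pair is two such sequences, wrapped in a tuple or in a list.
pair_as_tuple = (["What", "is", "this", "?"], ["A", "tokenizer", "."])
pair_as_list = [["What", "is", "this", "?"], ("A", "tokenizer", ".")]


def is_word_sequence(value: Any) -> bool:
    """Hypothetical helper: True for a list or tuple that contains only strings."""
    return isinstance(value, (list, tuple)) and all(
        isinstance(word, str) for word in value
    )


def describe_pretokenized_input(value: Any) -> str:
    """Hypothetical helper: classify a value as a single sequence or a pair."""
    if is_word_sequence(value):
        return "single"
    # Assumption: a pair contains exactly two word sequences.
    if (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(is_word_sequence(part) for part in value)
    ):
        return "pair"
    raise TypeError(f"not a PreTokenizedEncodeInput: {value!r}")


kinds = [
    describe_pretokenized_input(item)
    for item in (single, pair_as_tuple, pair_as_list)
]
```

Note that a flat list of two words, such as `["Hello", "world"]`, is classified here as a single sequence, because every element is a string.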

EncodeInput[[tokenizers.EncodeInput]]

tokenizers.EncodeInput

Represents all the possible types of input for encoding. Can be a TextEncodeInput when is_pretokenized=False, or a PreTokenizedEncodeInput when is_pretokenized=True.

Alias of Union[str, Tuple[str, str], List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].
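Because this union merges the textual and pre-tokenized shapes, the same Python value can be read in two ways. The sketch below shows one plausible reading, keyed on an `is_pretokenized` flag like the one encode_batch() accepts. The function `interpret` is hypothetical and written only for this illustration; it is not how the library itself dispatches.

```python
from typing import Any


def _all_str(value: Any) -> bool:
    """True for a list or tuple whose items are all strings."""
    return isinstance(value, (list, tuple)) and all(
        isinstance(item, str) for item in value
    )


def interpret(value: Any, is_pretokenized: bool = False) -> str:
    """Hypothetical helper: say how a value would be read under each mode."""
    if is_pretokenized:
        # Pre-tokenized mode: a flat list of strings is ONE sequence of words.
        if _all_str(value):
            return "single pre-tokenized sequence"
        if (
            isinstance(value, (list, tuple))
            and len(value) == 2
            and all(_all_str(part) for part in value)
        ):
            return "pair of pre-tokenized sequences"
    else:
        # Text mode: a string is one sequence, two strings form a pair.
        if isinstance(value, str):
            return "single text"
        if _all_str(value) and len(value) == 2:
            return "pair of texts"
    raise TypeError(f"unsupported input for is_pretokenized={is_pretokenized}: {value!r}")


ambiguous = ["Hello", "world"]
as_text = interpret(ambiguous)
as_words = interpret(ambiguous, is_pretokenized=True)
```

Under these assumptions, `["Hello", "world"]` is a pair of two raw texts in text mode and a single two-word sequence in pre-tokenized mode, which is why the flag has to match the shape of the data being passed.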

The Rust API Reference is available directly on the Docs.rs website.

The Node API has not been documented yet.
