Encode Inputs
Translator: 片刻小哥哥
Project URL: https://huggingface.apachecn.org/docs/tokenizers/api/encode-inputs
Original URL: https://huggingface.co/docs/tokenizers/api/encode-inputs
These types represent all the different kinds of input that a Tokenizer accepts when using encode_batch().
TextEncodeInput
tokenizers.TextEncodeInput
Represents a textual input for encoding. Can be either:
- A single sequence: TextInputSequence
- A pair of sequences:
  - A Tuple of TextInputSequence
  - Or a List of TextInputSequence of size 2
alias of Union[str, Tuple[str, str], List[str]].
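As a minimal sketch of what this alias admits, the snippet below mirrors it locally with the standard typing module and builds one value of each shape. The helper is_text_encode_input is hypothetical (it is not part of tokenizers) and only restates the three cases listed above as a runtime check.

```python
from typing import List, Tuple, Union

# Local mirror of the alias quoted above, for illustration only.
TextEncodeInput = Union[str, Tuple[str, str], List[str]]


def is_text_encode_input(value: object) -> bool:
    """Hypothetical helper: True for a str, or a 2-item tuple/list of str."""
    if isinstance(value, str):
        return True  # a single sequence
    if isinstance(value, (tuple, list)):
        # a pair of sequences: exactly two strings
        return len(value) == 2 and all(isinstance(s, str) for s in value)
    return False


single: TextEncodeInput = "Hello, world!"
pair_tuple: TextEncodeInput = ("What is a tokenizer?", "It splits text into tokens.")
pair_list: TextEncodeInput = ["What is a tokenizer?", "It splits text into tokens."]
```

Note that the List[str] form only describes a pair when it holds exactly two items; a longer list of strings is not a valid text input on its own.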
PreTokenizedEncodeInput
tokenizers.PreTokenizedEncodeInput
Represents a pre-tokenized input for encoding. Can be either:
- A single sequence: PreTokenizedInputSequence
- A pair of sequences:
  - A Tuple of PreTokenizedInputSequence
  - Or a List of PreTokenizedInputSequence of size 2
alias of Union[List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].
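The nested Union above is easier to read when split in two. The sketch below assumes, based on the alias itself, that a single pre-tokenized sequence is a list or tuple of words, and that a pair is a tuple or list holding two such sequences. Both helper functions are hypothetical and written only for illustration.

```python
from typing import List, Tuple, Union

# One pre-tokenized sequence: the inner Union that repeats in the alias above.
WordSequence = Union[List[str], Tuple[str]]


def is_word_sequence(value: object) -> bool:
    """Hypothetical helper: True for a list or tuple containing only str."""
    return isinstance(value, (list, tuple)) and all(
        isinstance(word, str) for word in value
    )


def is_pretokenized_encode_input(value: object) -> bool:
    """Hypothetical helper: a single word sequence, or a pair of them."""
    if is_word_sequence(value):
        return True
    return (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(is_word_sequence(seq) for seq in value)
    )


single = ["Hello", ",", "world", "!"]
pair = (["What", "is", "it", "?"], ["A", "tokenizer", "."])
```

A plain string such as "Hello world" does not match any branch here, which is consistent with the alias containing no bare str member.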
EncodeInput
tokenizers.EncodeInput
Represents all the possible types of input for encoding. Can be:
- When is_pretokenized=False: TextEncodeInput
- When is_pretokenized=True: PreTokenizedEncodeInput
alias of Union[str, Tuple[str, str], List[str], Tuple[str], Tuple[Union[List[str], Tuple[str]], Union[List[str], Tuple[str]]], List[Union[List[str], Tuple[str]]]].
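One consequence of this union is that the same Python value can fit both branches: a list of two strings is a text pair under is_pretokenized=False and a single pre-tokenized sequence under is_pretokenized=True. The sketch below illustrates that reading with a hypothetical describe_encode_input helper. It follows the aliases on this page and is not the library's own dispatch code.

```python
def describe_encode_input(value, is_pretokenized: bool = False) -> str:
    """Hypothetical helper: report how `value` is read under the given flag."""

    def is_words(v) -> bool:
        return isinstance(v, (list, tuple)) and all(isinstance(w, str) for w in v)

    if not is_pretokenized:
        # TextEncodeInput branch
        if isinstance(value, str):
            return "text: single sequence"
        if is_words(value) and len(value) == 2:
            return "text: pair of sequences"
    else:
        # PreTokenizedEncodeInput branch
        if is_words(value):
            return "pre-tokenized: single sequence"
        if (
            isinstance(value, (list, tuple))
            and len(value) == 2
            and all(is_words(v) for v in value)
        ):
            return "pre-tokenized: pair of sequences"
    raise TypeError("value does not match the selected input type")


ambiguous = ["Hello", "world"]
as_text = describe_encode_input(ambiguous)
as_words = describe_encode_input(ambiguous, is_pretokenized=True)
```

Here as_text is "text: pair of sequences" while as_words is "pre-tokenized: single sequence", which is why the flag has to be chosen to match how the batch was prepared.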