return_offset_mapping is not available when using Python tokenizers.To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
Package:
transformers

Exception Class:
NotImplementedError
Raise code
        return_special_tokens_mask: bool = False,
        return_offsets_mapping: bool = False,
        return_length: bool = False,
        verbose: bool = True,
        **kwargs
    ) -> BatchEncoding:
        if return_offsets_mapping:
            raise NotImplementedError(
                "return_offset_mapping is not available when using Python tokenizers."
                "To use this feature, change your tokenizer to one deriving from "
                "transformers.PreTrainedTokenizerFast."
            )
        if is_split_into_words:
            raise NotImplementedError("is_split_into_words is not supported in this tokenizer.")
Links to the raise (4)
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/models/luke/tokenization_luke.py#L683
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/models/tapas/tokenization_tapas.py#L705
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/models/tapas/tokenization_tapas.py#L960
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/tokenization_utils.py#L530

Ways to fix
Error code:
from transformers import LukeTokenizer

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
# Requesting offsets from a Python ("slow") tokenizer triggers the NotImplementedError
encoding = tokenizer.batch_encode_plus(batch_text_or_text_pairs=["HEs", "she", "sdg"], return_offsets_mapping=True)
print(encoding)
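Why does this raise? LukeTokenizer is a pure-Python ("slow") tokenizer, and only fast (Rust-backed) tokenizers can compute character offsets. A minimal sketch to check this up front, using the tokenizer's is_fast property:

from transformers import LukeTokenizer

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
# is_fast is False for pure-Python tokenizers and True for Rust-backed ones
print(tokenizer.is_fast)  # False, so return_offsets_mapping=True will raise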
batch_encode_plus tokenizes and prepares for the model a list of sequences or a list of pairs of sequences.
batch_text_or_text_pairs - Batch of sequences or pairs of sequences to be encoded. This can be a list of strings or a list of pairs of strings.
return_offsets_mapping - Whether or not to return (char_start, char_end) offsets for each token.
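To see what these offsets look like, here is a minimal sketch using BertTokenizerFast (chosen only because BERT ships a fast tokenizer; any class deriving from PreTrainedTokenizerFast works):

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = tokenizer.batch_encode_plus(["she sells shells"], return_offsets_mapping=True)
# Each token maps to a (char_start, char_end) span in the original string;
# special tokens such as [CLS] and [SEP] map to (0, 0)
print(encoding["offset_mapping"][0])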
As the error message says:
NotImplementedError: return_offset_mapping is not available when using Python tokenizers.To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
To fix the error, switch to a tokenizer deriving from PreTrainedTokenizerFast.
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("bert-base-uncased")
# A fast (Rust-backed) tokenizer supports return_offsets_mapping
encoding = tokenizer.batch_encode_plus(["HEs", "she", "sdg"], return_offsets_mapping=True)
print(encoding)
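An equivalent fix, assuming the checkpoint ships a fast tokenizer (not every model does; LUKE, for example, only has a Python tokenizer, which is why the error above occurs), is to let AutoTokenizer select the fast implementation:

from transformers import AutoTokenizer

# use_fast=True (the default in recent transformers versions) loads the
# Rust-backed tokenizer when one is available for the checkpoint
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
encoding = tokenizer.batch_encode_plus(["HEs", "she", "sdg"], return_offsets_mapping=True)
print(encoding["offset_mapping"])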