return_offset_mapping is not available when using Python tokenizers.To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
Package:
transformers
50617

Exception Class:
NotImplementedError
Raise code
    return_special_tokens_mask: bool = False,
    return_offsets_mapping: bool = False,
    return_length: bool = False,
    verbose: bool = True,
    **kwargs
) -> BatchEncoding:
    if return_offsets_mapping:
        raise NotImplementedError(
            "return_offset_mapping is not available when using Python tokenizers."
            "To use this feature, change your tokenizer to one deriving from "
            "transformers.PreTrainedTokenizerFast."
        )
    if is_split_into_words:
        raise NotImplementedError("is_split_into_words is not supported in this tokenizer.")
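The guard above can be illustrated in isolation: a slow (pure-Python) tokenizer rejects the return_offsets_mapping flag up front rather than silently ignoring it. The following is a minimal, self-contained sketch of that pattern, not the actual transformers implementation; the class name SlowTokenizerSketch is hypothetical.

```python
# Sketch of the guard used by slow tokenizers: reject
# return_offsets_mapping before doing any work.
# SlowTokenizerSketch is a hypothetical stand-in, not a transformers class.
class SlowTokenizerSketch:
    def encode_plus(self, text, return_offsets_mapping=False, **kwargs):
        if return_offsets_mapping:
            raise NotImplementedError(
                "return_offset_mapping is not available when using Python tokenizers."
                "To use this feature, change your tokenizer to one deriving from "
                "transformers.PreTrainedTokenizerFast."
            )
        # A real tokenizer would run its tokenization algorithm here;
        # this sketch just splits on whitespace.
        return {"input_ids": text.split()}


tok = SlowTokenizerSketch()
print(tok.encode_plus("she sdg"))  # fine without offsets
try:
    tok.encode_plus("she sdg", return_offsets_mapping=True)
except NotImplementedError as e:
    print("raised:", type(e).__name__)
```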
Links to the raise (4)
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/models/luke/tokenization_luke.py#L683
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/models/tapas/tokenization_tapas.py#L705
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/models/tapas/tokenization_tapas.py#L960
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/tokenization_utils.py#L530

Ways to fix
Error code:
from transformers import LukeTokenizer
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
encoding = tokenizer.batch_encode_plus(batch_text_or_text_pairs=['HEs', 'she', 'sdg'], return_offsets_mapping=True)
print(encoding)
batch_encode_plus tokenizes and prepares for the model a list of sequences or a list of pairs of sequences.
batch_text_or_text_pairs - Batch of sequences or pairs of sequences to be encoded. This can be a list of strings or a list of pairs of strings.
return_offsets_mapping - Whether or not to return the (char_start, char_end) offsets for each token.
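To make the meaning of those offsets concrete, here is a small illustrative sketch that computes (char_start, char_end) spans for a naive whitespace tokenizer. This is not how transformers computes them (fast tokenizers produce spans for subword tokens too); the function whitespace_offsets is hypothetical.

```python
# Illustration of offset mappings: for each token, the
# (char_start, char_end) span it occupies in the original string.
# This whitespace tokenizer is only a sketch; real fast tokenizers
# compute such spans for subword tokens as well.
def whitespace_offsets(text):
    offsets, start = [], 0
    for token in text.split():
        start = text.index(token, start)  # locate token in the source text
        end = start + len(token)
        offsets.append((token, (start, end)))
        start = end
    return offsets

print(whitespace_offsets("she sdg"))
# [('she', (0, 3)), ('sdg', (4, 7))]
```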
As the error says,
NotImplementedError: return_offset_mapping is not available when using Python tokenizers.To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
To fix the error, use a tokenizer deriving from PreTrainedTokenizerFast. Fast tokenizers are backed by the Rust tokenizers library and support offset mappings; for example, a fast tokenizer can be loaded for bert-base-uncased:
from transformers import PreTrainedTokenizerFast
tokenizer = PreTrainedTokenizerFast.from_pretrained("bert-base-uncased")
encoding = tokenizer.batch_encode_plus(['HEs', 'she', 'sdg'], return_offsets_mapping=True)
print(encoding)