
return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.

Package: transformers
Exception Class:
NotImplementedError

Raise code

        return_special_tokens_mask: bool = False,
        return_offsets_mapping: bool = False,
        return_length: bool = False,
        verbose: bool = True,
        **kwargs
    ) -> BatchEncoding:
        if return_offsets_mapping:
            raise NotImplementedError(
                "return_offset_mapping is not available when using Python tokenizers. "
                "To use this feature, change your tokenizer to one deriving from "
                "transformers.PreTrainedTokenizerFast."
            )

        if is_split_into_words:
            raise NotImplementedError("is_split_into_words is not supported in this tokenizer.")
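This check lives in the Python ("slow") tokenizer base class, so any model that only ships a Python tokenizer hits it. A quick way to tell which kind of tokenizer you have is the is_fast attribute; a minimal sketch, assuming LUKE still ships only a Python tokenizer:

from transformers import AutoTokenizer

# Rust-backed ("fast") tokenizers support offset mappings
fast = AutoTokenizer.from_pretrained("bert-base-uncased")
print(fast.is_fast)   # True

# LUKE only provides a Python tokenizer, so offsets are unavailable
slow = AutoTokenizer.from_pretrained("studio-ousia/luke-base")
print(slow.is_fast)   # False -> return_offsets_mapping will raise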

Ways to fix


Error code:

from transformers import LukeTokenizer

# LukeTokenizer is a pure-Python ("slow") tokenizer, so requesting
# offset mappings raises NotImplementedError.
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
encoding = tokenizer.batch_encode_plus(batch_text_or_text_pairs=['HEs', 'she', 'sdg'], return_offsets_mapping=True)
print(encoding)

batch_encode_plus tokenizes and prepares for the model a list of sequences or a list of pairs of sequences.

batch_text_or_text_pairs - Batch of sequences or pair of sequences to be encoded. This can be a list of strings or a list of pairs of strings.

return_offsets_mapping - Whether or not to return (char_start, char_end) for each token (see the sketch below).
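To make the offsets concrete, here is a minimal sketch with a fast tokenizer (bert-base-uncased is just an example checkpoint); each (char_start, char_end) pair slices the matching span back out of the original text:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # fast by default
text = "Hello world"
encoding = tokenizer(text, return_offsets_mapping=True)

# Special tokens such as [CLS] and [SEP] are mapped to (0, 0).
for token, (start, end) in zip(encoding.tokens(), encoding["offset_mapping"]):
    print(token, (start, end), repr(text[start:end]))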

As the error says,

NotImplementedError: return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.

To fix the error, use a tokenizer deriving from PreTrainedTokenizerFast. LUKE does not provide a fast tokenizer, so the fix below switches to a checkpoint that does (bert-base-uncased):

from transformers import PreTrainedTokenizerFast

# bert-base-uncased ships a tokenizer.json, so the generic fast
# (Rust-backed) tokenizer class can load it directly, and offset
# mappings are supported.
tokenizer = PreTrainedTokenizerFast.from_pretrained("bert-base-uncased")
encoding = tokenizer.batch_encode_plus(['HEs', 'she', 'sdg'], return_offsets_mapping=True)
print(encoding)
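Equivalently, AutoTokenizer can pick the fast class for you; use_fast=True is the default in transformers v4 and is only spelled out here for clarity:

from transformers import AutoTokenizer

# Loads BertTokenizerFast, which supports return_offsets_mapping
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
encoding = tokenizer.batch_encode_plus(['HEs', 'she', 'sdg'], return_offsets_mapping=True)
print(encoding["offset_mapping"])

Note that both fixes work because bert-base-uncased has a fast tokenizer; since LUKE does not, there is no way to get offset mappings from LukeTokenizer itself.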

Answered Jul 03, 2021 by anonim (13.0k)
