
return_offset_mapping is not available when using Python tokenizers.To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.

Package: transformers
Exception class: NotImplementedError

Raise code

        return_special_tokens_mask: bool = False,
        return_offsets_mapping: bool = False,
        return_length: bool = False,
        verbose: bool = True,
        **kwargs
    ) -> BatchEncoding:
        if return_offsets_mapping:
            raise NotImplementedError(
                "return_offset_mapping is not available when using Python tokenizers."
                "To use this feature, change your tokenizer to one deriving from "
                "transformers.PreTrainedTokenizerFast."
            )

        if is_split_into_words:
            raise NotImplementedError("is_split_into_words is not supported in this tokenizer.")

        # i

Ways to fix


Code that reproduces the error:

from transformers import LukeTokenizer

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
encoding = tokenizer.batch_encode_plus(batch_text_or_text_pairs=['HEs', 'she', 'sdg'], return_offsets_mapping=True)
print(encoding)

batch_encode_plus tokenizes a list of sequences, or a list of pairs of sequences, and prepares them for the model.

batch_text_or_text_pairs - The batch of sequences or pairs of sequences to encode. This can be a list of strings or a list of pairs of strings.

return_offsets_mapping - Whether or not to return a (char_start, char_end) tuple for each token, giving the token's position in the original text.
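
To make the (char_start, char_end) idea concrete, here is a toy sketch that does not use transformers at all. The helper name whitespace_offsets is hypothetical, and splitting on whitespace is only an assumption made for illustration; real tokenizers split text into subword tokens in more involved ways.

```python
import re


def whitespace_offsets(text):
    """Toy illustration: one token per whitespace-separated word,
    paired with its (char_start, char_end) span in the original text."""
    return [(m.group(), (m.start(), m.end())) for m in re.finditer(r"\S+", text)]


text = "she said hi"
pairs = whitespace_offsets(text)

for token, (start, end) in pairs:
    # Slicing the original text with the offsets recovers the token.
    assert text[start:end] == token

print(pairs)
# [('she', (0, 3)), ('said', (4, 8)), ('hi', (9, 11))]
```

The end index is exclusive, so text[start:end] gives back the token's characters.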

As the error says:

NotImplementedError: return_offset_mapping is not available when using Python tokenizers.To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.

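To fix the error, use a tokenizer that derives from PreTrainedTokenizerFast. LukeTokenizer is a Python (slow) tokenizer, so the example below loads the bert-base-uncased checkpoint with PreTrainedTokenizerFast instead. Note that this also changes the vocabulary, so the token IDs will differ from those of studio-ousia/luke-base.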

from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("bert-base-uncased")
encoding = tokenizer.batch_encode_plus(['HEs', 'she', 'sdg'], return_offsets_mapping=True)
print(encoding)
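
If you cannot switch checkpoints, one possible workaround is to request offsets only when the tokenizer reports itself as fast. Tokenizers in transformers expose an is_fast attribute. The function encode_batch and the StubTokenizer class below are hypothetical names used for this sketch, and the stub only stands in for a real tokenizer so the snippet runs without downloading a model.

```python
def encode_batch(tokenizer, texts):
    """Ask for offsets only if the tokenizer says it is a fast tokenizer."""
    want_offsets = bool(getattr(tokenizer, "is_fast", False))
    return tokenizer(texts, return_offsets_mapping=want_offsets)


class StubTokenizer:
    """Hypothetical stand-in for a real tokenizer (assumption for this demo)."""

    def __init__(self, is_fast):
        self.is_fast = is_fast

    def __call__(self, texts, return_offsets_mapping=False):
        # Mimic the slow-tokenizer behaviour described in the error above.
        if return_offsets_mapping and not self.is_fast:
            raise NotImplementedError(
                "return_offset_mapping is not available when using Python tokenizers."
            )
        out = {"input_ids": [[0] for _ in texts]}  # dummy IDs, not real ones
        if return_offsets_mapping:
            out["offset_mapping"] = [[(0, len(t))] for t in texts]
        return out


slow_result = encode_batch(StubTokenizer(is_fast=False), ["she"])
fast_result = encode_batch(StubTokenizer(is_fast=True), ["she"])
print("offset_mapping" in slow_result, "offset_mapping" in fast_result)
```

With a real tokenizer you would pass the object returned by from_pretrained in place of the stub. A slow tokenizer then returns an encoding without offsets instead of raising.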

Answered Jul 03, 2021 by anonim
