
word_ids() is not available when using Python-based tokenizers

Package: transformers
Exception Class: ValueError

Raise code

""" 

        Returns:
            :obj:`List[Optional[int]]`: A list indicating the word corresponding to each token. Special tokens added by
            the tokenizer are mapped to :obj:`None` and other tokens are mapped to the index of their corresponding
            word (several tokens will be mapped to the same word index if they are parts of that word).
        """
        if not self._encodings:
            raise ValueError("word_ids() is not available when using Python-based tokenizers")
        return self._encodings[batch_index].word_ids

    def token_to_sequence(self, batch_or_token_index: int, token_index: Optional[int] = None) -> int:
        """
        Get the index of the sequence represented by the given token. In the general use case, this method returns
        :obj:`0` for a single sequence or the first sequence of a pair, and :obj:`1` for the second sequence of a pair
 """

Ways to fix


Summary:

This exception is raised when word_ids() is called on a BatchEncoding instance that holds no Encoding objects, that is, when self._encodings is None or empty. The BatchEncoding constructor takes an optional second parameter, encoding. To avoid the exception, pass an Encoding object (or a list of them) for that parameter, along with a dictionary as the first parameter. The Encoding class comes from the tokenizers package. In normal use, a BatchEncoding returned by a fast (Rust-backed) tokenizer already carries its Encoding objects, while one returned by a Python-based (slow) tokenizer does not, which is what the error message refers to.

Code to Reproduce the Error (WRONG):

import transformers.tokenization_utils_base as tub

# No encoding argument is passed, so the BatchEncoding holds no Encoding objects
be = tub.BatchEncoding({'a': [1, 2, 3]})
be.word_ids()  # raises ValueError

Working Version (Fixed):

import transformers.tokenization_utils_base as tub
from tokenizers import Encoding

# Pass an Encoding object as the second (encoding) argument
en = Encoding()
be = tub.BatchEncoding({'a': [1, 2, 3]}, en)
be.word_ids()  # no longer raises
Jul 09, 2021 codingcomedyig answer
