
`ngrams` must be None, an integer, or a tuple of integers. Got %s

Package:
keras
github stars 52268
Exception Class:
ValueError

Raise code

  allow_none=True)

    # 'ngrams' must be one of (None, int, tuple(int))
    if not (ngrams is None or
            isinstance(ngrams, int) or
            isinstance(ngrams, tuple) and
            all(isinstance(item, int) for item in ngrams)):
      raise ValueError(("`ngrams` must be None, an integer, or a tuple of "
                        "integers. Got %s") % (ngrams,))

    # 'output_sequence_length' must be one of (None, int) and is only
    # set if output_mode is INT.
    if (output_mode == INT and not (isinstance(output_sequence_length, int) or
                                    (output_sequence_length is None))):
      raise ValueError("`output_sequence_length` must be either None or an "
      

Ways to fix


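This error is raised because a float value (3.) was passed to the ngrams parameter of the TextVectorization class. The check shown in the raise code accepts only None, an integer, or a tuple of integers, so passing an integer (3) instead fixes the error.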
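To see which values pass the check, the condition from the raise code can be copied into a small stand-alone function. This is a minimal sketch: the name ngrams_is_valid is made up for illustration and is not part of Keras; only the boolean condition is taken from the raise code.

```python
def ngrams_is_valid(ngrams):
    """Stand-alone replica of the condition in the raise code.

    Hypothetical helper for illustration; not a Keras function.
    """
    return (ngrams is None
            or isinstance(ngrams, int)
            or (isinstance(ngrams, tuple)
                and all(isinstance(item, int) for item in ngrams)))


# Try a few candidate values and report whether each would be accepted.
for candidate in (None, 3, (2, 3), 3.0, [2, 3], "3"):
    print(repr(candidate), ngrams_is_valid(candidate))
```

Note that a list of integers such as [2, 3] is rejected just like the float 3.0, because the condition tests for a tuple specifically.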

Reproducing the error:

pipenv install tensorflow

from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
vocab_data = ["earth", "wind", "and", "fire"]
max_len = 4
max_features = 5000
vectorize_layer = TextVectorization(max_tokens=max_features,
                                    output_mode='int',
                                    output_sequence_length=max_len,
                                    vocabulary=vocab_data,
                                    ngrams=3.)
print(vectorize_layer.get_vocabulary())

The error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-6e4a94518f79> in <module>()
      7                                     output_sequence_length=max_len,
      8                                     vocabulary=vocab_data,
----> 9                                     ngrams=3.)
     10 print(vectorize_layer.get_vocabulary())

/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/preprocessing/text_vectorization.py in __init__(self, max_tokens, standardize, split, ngrams, output_mode, output_sequence_length, pad_to_max_tokens, vocabulary, **kwargs)
    291             all(isinstance(item, int) for item in ngrams)):
    292       raise ValueError(("`ngrams` must be None, an integer, or a tuple of "
--> 293                         "integers. Got %s") % (ngrams,))
    294 
    295     # 'output_sequence_length' must be one of (None, int) and is only

ValueError: `ngrams` must be None, an integer, or a tuple of integers. Got 3.0


Fixed version of the code:

from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
vocab_data = ["earth", "wind", "and", "fire"]
max_len = 4
max_features = 5000
vectorize_layer = TextVectorization(max_tokens=max_features,
                                    output_mode='int',
                                    output_sequence_length=max_len,
                                    vocabulary=vocab_data,
                                    ngrams=3)
print(vectorize_layer.get_vocabulary())

Output:

['', '[UNK]', 'earth', 'wind', 'and', 'fire']
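If the ngrams value comes from somewhere that may produce floats or lists (a JSON config, a hyperparameter search, and so on), it can be normalized before it reaches TextVectorization. The following is only a sketch under that assumption: to_ngrams_arg and _to_int are hypothetical helper names, not part of Keras or TensorFlow.

```python
def _to_int(value):
    """Return value as an int, accepting whole-number floats such as 3.0."""
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValueError("cannot use %r as an n-gram size" % (value,))


def to_ngrams_arg(value):
    """Hypothetical helper: coerce value into None, an int, or a tuple of ints."""
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        # Lists are converted to tuples because the check requires a tuple.
        return tuple(_to_int(item) for item in value)
    return _to_int(value)


print(to_ngrams_arg(3.0))
print(to_ngrams_arg([1.0, 2]))
```

The result could then be passed as ngrams=to_ngrams_arg(3.) in the call above. Values that are not whole numbers, such as 2.5, still raise a ValueError rather than being silently truncated.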

Answered Jul 15, 2021 by kellemnegasi
