Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.

Package: transformers
Exception Class: ValueError

Raise code

                    self[key] = tensor
            except:  # noqa E722
                if key == "overflowing_tokens":
                    raise ValueError(
                        "Unable to create tensor returning overflowing tokens of different lengths. "
                        "Please see if a fast version of this tokenizer is available to have this feature available."
                    )
                raise ValueError(
                    "Unable to create tensor, you should probably activate truncation and/or padding "
                    "with 'padding=True' 'truncation=True' to have batched tensors with the same length."
                )

        return self

    @torch_required
    def to(self,
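
The message points at the most common trigger: the lists stored under one key have different lengths, so they cannot be stacked into a rectangular tensor. The sketch below shows, on plain Python lists, what padding and truncation do to such a batch. The names pad_and_truncate, pad_value and max_length are hypothetical and chosen for illustration; this is not the tokenizer's implementation.

```python
from typing import List, Optional


def pad_and_truncate(
    sequences: List[List[int]],
    pad_value: int = 0,
    max_length: Optional[int] = None,
) -> List[List[int]]:
    """Return copies of `sequences` that all share one length.

    Hypothetical helper: it only mimics the effect of padding and
    truncation on plain lists.
    """
    if not sequences:
        return []
    # Without an explicit limit, pad up to the longest sequence.
    target = max_length if max_length is not None else max(len(s) for s in sequences)
    result = []
    for seq in sequences:
        clipped = seq[:target]  # truncation: drop items past the target length
        result.append(clipped + [pad_value] * (target - len(clipped)))  # padding
    return result


ragged = [[1, 2, 3], [4, 5]]  # different lengths: not rectangular
print(pad_and_truncate(ragged))                # [[1, 2, 3], [4, 5, 0]]
print(pad_and_truncate(ragged, max_length=2))  # [[1, 2], [4, 5]]
```

When the batch comes from a tokenizer, passing padding=True and truncation=True, as the message suggests, should have the same effect without a helper like this.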

Ways to fix

Error code:

from transformers.tokenization_utils_base import BatchEncoding

batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="tf", prepend_batch_axis=True)  # <-- error raised here
print(tensor_batch)

Explanation:

tensor_type matters here: it selects the framework whose tensors the lists of integers are converted to. There are four tensor types: PYTORCH = "pt", TENSORFLOW = "tf", NUMPY = "np", JAX = "jax".

prepend_batch_axis controls whether a batch axis is added when converting to tensors.
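
To see what the extra axis does, the sketch below wraps the values from the example in one more list, which is the same wrapping the library applies with value = [value]. nested_shape is a hypothetical helper written for this illustration; it assumes rectangular input and does not check for ragged lists.

```python
from typing import Any, Tuple


def nested_shape(value: Any) -> Tuple[int, ...]:
    """Shape of a rectangular nested list (hypothetical helper)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]  # descend along the first element only
    return tuple(shape)


inputs = [[1, 2, 3], [4, 5, 6]]
labels = [0, 1]

print(nested_shape(inputs))    # (2, 3)
print(nested_shape([inputs]))  # (1, 2, 3)  after prepending a batch axis
print(nested_shape(labels))    # (2,)
print(nested_shape([labels]))  # (1, 2)     after prepending a batch axis
```

Every key gains one leading dimension of size 1, so a value that was already a batch of two rows becomes a three-dimensional value.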

More information is available in the BatchEncoding documentation.

try:
    if prepend_batch_axis:
        value = [value]
    if not is_tensor(value):
        tensor = as_tensor(value)
        # if tensor.ndim > 2:
        #     tensor = tensor.squeeze(0)
        # elif tensor.ndim < 2:
        #     tensor = tensor[None, :]
        self[key] = tensor

As the excerpt above shows, inside the try block value gains one dimension when prepend_batch_axis is True, because it is wrapped in an extra list. In addition, when the tensor type is TensorFlow, as_tensor becomes tf.constant (see the excerpt below), and that is where the error is raised.

if tensor_type == TensorType.TENSORFLOW:
    if not is_tf_available():
        raise ImportError(
            "Unable to convert output to TensorFlow tensors format, TensorFlow is not installed.")
    import tensorflow as tf
    as_tensor = tf.constant
    is_tensor = tf.is_tensor
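
Because the raise code catches every exception with a bare except and replaces it with the same ValueError, the message alone does not say what actually failed inside as_tensor. Python still records the original exception on the new one's __context__ attribute, so it can be inspected. A minimal sketch follows; convert and failing_as_tensor are hypothetical stand-ins for the library code, and the messages are shortened.

```python
def convert(value, as_tensor):
    """Mimic the try/except shape of the raise code (hypothetical stand-in)."""
    try:
        return as_tensor(value)
    except:  # noqa: E722  (bare except, as in the library excerpt)
        raise ValueError("Unable to create tensor")


def failing_as_tensor(value):
    # Stand-in for a framework converter that rejects its input.
    raise TypeError("cannot convert ragged input")


try:
    convert([[1, 2, 3], [4, 5]], failing_as_tensor)
except ValueError as err:
    # Raising inside an except block sets __context__ to the original error.
    original = err.__context__
    print(type(original).__name__, original)  # TypeError cannot convert ragged input
```

The same information appears in a full traceback as the first exception, above the line "During handling of the above exception, another exception occurred".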

Fix code:

from transformers.tokenization_utils_base import BatchEncoding

batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="pt", prepend_batch_axis=True)  # use PyTorch tensors instead
print(tensor_batch)

or

from transformers.tokenization_utils_base import BatchEncoding

batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="tf", prepend_batch_axis=False)  # keep TensorFlow, no extra batch axis
print(tensor_batch)

Answered Jun 22, 2021 by anonim
