Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
Package:
transformers

Exception Class:
ValueError
Raise code
                    self[key] = tensor
            except:  # noqa E722
                if key == "overflowing_tokens":
                    raise ValueError(
                        "Unable to create tensor returning overflowing tokens of different lengths. "
                        "Please see if a fast version of this tokenizer is available to have this feature available."
                    )
                raise ValueError(
                    "Unable to create tensor, you should probably activate truncation and/or padding "
                    "with 'padding=True' 'truncation=True' to have batched tensors with the same length."
                )

        return self

    @torch_required
    def to(self,
Links to the raise (1)
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/tokenization_utils_base.py#L715

Ways to fix
Error code:
from transformers.tokenization_utils_base import BatchEncoding
batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="tf", prepend_batch_axis=True)  # <-- Error raised here
print(tensor_batch)
Explanation:
tensor_type plays an important role here: it determines what the lists of integers are converted to. There are four possible tensor types: PYTORCH = "pt", TENSORFLOW = "tf", NUMPY = "np", and JAX = "jax".
prepend_batch_axis controls whether or not a batch axis is added when converting to tensors.
More information about BatchEncoding can be found in the transformers documentation.
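Before looking at the source, here is a minimal sketch of what prepend_batch_axis does (using the NumPy tensor type purely to stay framework-neutral; this is an illustration, not part of the original report):

from transformers.tokenization_utils_base import BatchEncoding

data = {"inputs": [[1, 2, 3], [4, 5, 6]]}

# Without a batch axis, the nested list converts to shape (2, 3):
flat = BatchEncoding(data).convert_to_tensors(tensor_type="np")
print(flat["inputs"].shape)  # (2, 3)

# prepend_batch_axis=True first wraps the value in one extra list
# (value = [value] in the try block below), giving shape (1, 2, 3):
batched = BatchEncoding(data).convert_to_tensors(tensor_type="np", prepend_batch_axis=True)
print(batched["inputs"].shape)  # (1, 2, 3)

The relevant fragment of convert_to_tensors is: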
try:
    if prepend_batch_axis:
        value = [value]

    if not is_tensor(value):
        tensor = as_tensor(value)

        # if tensor.ndim > 2:
        #     tensor = tensor.squeeze(0)
        # elif tensor.ndim < 2:
        #     tensor = tensor[None, :]

        self[key] = tensor
As you can see in the source above, the dimension of value increases inside the try block when prepend_batch_axis is true. And when the TensorFlow tensor type is selected, as_tensor is tf.constant, so that is the call that fails and triggers the ValueError:
if tensor_type == TensorType.TENSORFLOW:
    if not is_tf_available():
        raise ImportError(
            "Unable to convert output to TensorFlow tensors format, TensorFlow is not installed."
        )
    import tensorflow as tf

    as_tensor = tf.constant
    is_tensor = tf.is_tensor
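For intuition about the kind of failure this branch wraps, here is a sketch (assuming TensorFlow is installed; the exact message may vary by version): tf.constant only accepts rectangular nested lists, so a batch of sequences with different lengths, exactly what padding/truncation would equalize, makes it raise, and convert_to_tensors re-raises that as the ValueError above.

import tensorflow as tf

# Equal-length rows form a rectangular tensor, so this succeeds:
print(tf.constant([[1, 2, 3], [4, 5, 6]]).shape)  # (2, 3)

# Ragged rows cannot form a rectangular tensor, so tf.constant raises a
# ValueError (e.g. "Can't convert non-rectangular Python sequence to Tensor."),
# which convert_to_tensors catches and re-raises with the message quoted above:
try:
    tf.constant([[1, 2, 3], [4, 5]])
except ValueError as e:
    print(e)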
Fix code:
from transformers.tokenization_utils_base import BatchEncoding
batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="pt", prepend_batch_axis=True)
print(tensor_batch)
or
from transformers.tokenization_utils_base import BatchEncoding
batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="tf", prepend_batch_axis=False)
print(tensor_batch)
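Alternatively, follow the advice in the error message itself and pad/truncate at tokenization time, so every sequence in the batch has the same length before conversion. A minimal sketch, assuming transformers and TensorFlow are installed and using "bert-base-uncased" purely as an example checkpoint:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["a short sentence", "a noticeably longer example sentence"],
    padding=True,         # pad shorter sequences up to the longest in the batch
    truncation=True,      # cut sequences that exceed the model's maximum length
    return_tensors="tf",  # equal-length rows now convert to TF tensors cleanly
)
print(batch["input_ids"].shape)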