Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
Package:
transformers
50617

Exception Class:
ValueError
Raise code
            self[key] = tensor
        except:  # noqa E722
            if key == "overflowing_tokens":
                raise ValueError(
                    "Unable to create tensor returning overflowing tokens of different lengths. "
                    "Please see if a fast version of this tokenizer is available to have this feature available."
                )
            raise ValueError(
                "Unable to create tensor, you should probably activate truncation and/or padding "
                "with 'padding=True' 'truncation=True' to have batched tensors with the same length."
            )
    return self
Links to the raise (1)
https://github.com/huggingface/transformers/blob/bd9871657bb9500a9f4437a873db6df5f1ae6dbb/src/transformers/tokenization_utils_base.py#L715
Ways to fix
Error code:
from transformers.tokenization_utils_base import BatchEncoding
batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="tf", prepend_batch_axis=True)  # <-- Error here
print(tensor_batch)
Explanation:
tensor_type plays an important role here: it selects which backend converts the lists of integers. Four tensor types are supported: PYTORCH = "pt", TENSORFLOW = "tf", NUMPY = "np", and JAX = "jax".
prepend_batch_axis controls whether or not a batch axis is added when converting to tensors.
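To make the effect of prepend_batch_axis concrete, here is a minimal sketch (not the actual transformers implementation, and using plain nested lists instead of real tensors) of how wrapping the value adds one dimension before it is handed to the backend's conversion function:

```python
def nested_shape(value):
    """Return the shape of a regular nested list, e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3)."""
    if not isinstance(value, list):
        return ()
    return (len(value),) + nested_shape(value[0])

def prepare_value(value, prepend_batch_axis=False):
    # Mirrors the `value = [value]` line in the try block shown below:
    # wrapping the value in a list adds one leading (batch) dimension.
    if prepend_batch_axis:
        value = [value]
    return value

inputs = [[1, 2, 3], [4, 5, 6]]
print(nested_shape(prepare_value(inputs)))                           # (2, 3)
print(nested_shape(prepare_value(inputs, prepend_batch_axis=True)))  # (1, 2, 3)
```

So with prepend_batch_axis=True, a 2-dimensional batch of token lists becomes 3-dimensional before conversion.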
More information about BatchEncoding is available in the transformers documentation.
try:
    if prepend_batch_axis:
        value = [value]

    if not is_tensor(value):
        tensor = as_tensor(value)

        # if tensor.ndim > 2:
        #     tensor = tensor.squeeze(0)
        # elif tensor.ndim < 2:
        #     tensor = tensor[None, :]

        self[key] = tensor
As the source shows, within the try block the dimensionality of value is increased when prepend_batch_axis is true. And with tensor_type="tf", as_tensor is tf.constant; when tf.constant cannot build a regular tensor from the resulting nested lists it raises, the bare except catches that exception, and the ValueError above is raised in its place.
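The re-raise pattern can be sketched in plain Python, with no TensorFlow dependency. Here as_tensor_strict is a hypothetical stand-in for a backend converter like tf.constant that rejects ragged (unequal-length) nested lists; any failure inside the try block surfaces as the generic ValueError from the title:

```python
def as_tensor_strict(value):
    """Stand-in for a backend converter (e.g. tf.constant): reject ragged nested lists."""
    if isinstance(value, list) and value and isinstance(value[0], list):
        if len({len(row) for row in value}) > 1:
            raise ValueError("ragged input")
        for row in value:
            as_tensor_strict(row)
    return value

def convert(value):
    try:
        return as_tensor_strict(value)
    except Exception:
        # The original backend error is swallowed and replaced, which is why
        # the user only ever sees the generic padding/truncation message.
        raise ValueError(
            "Unable to create tensor, you should probably activate truncation and/or "
            "padding with 'padding=True' 'truncation=True' to have batched tensors "
            "with the same length."
        )

convert([[1, 2, 3], [4, 5, 6]])   # rows have equal length: converts fine
try:
    convert([[1, 2], [4, 5, 6]])  # ragged rows: triggers the ValueError
except ValueError as exc:
    print(exc)
```

This is why the error message mentions padding and truncation even when, as in the example on this page, the real trigger is something else that made the backend conversion fail.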
if tensor_type == TensorType.TENSORFLOW:
    if not is_tf_available():
        raise ImportError(
            "Unable to convert output to TensorFlow tensors format, TensorFlow is not installed."
        )
    import tensorflow as tf

    as_tensor = tf.constant
    is_tensor = tf.is_tensor
Fix code:
from transformers.tokenization_utils_base import BatchEncoding
batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="pt", prepend_batch_axis=True)
print(tensor_batch)
or
from transformers.tokenization_utils_base import BatchEncoding
batch = BatchEncoding({"inputs": [[1, 2, 3], [4, 5, 6]], "labels": [0, 1]})
tensor_batch = batch.convert_to_tensors(tensor_type="tf", prepend_batch_axis=False)
print(tensor_batch)
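In ordinary tokenizer usage, the fix the error message itself suggests is passing padding=True (and/or truncation=True) so every sequence in the batch ends up with the same length before conversion. A minimal sketch of what padding does, assuming a pad token id of 0 for illustration:

```python
def pad_batch(sequences, pad_id=0):
    """Sketch of padding=True: extend every sequence to the batch's max length with pad_id."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

ragged = [[1, 2], [3, 4, 5]]
print(pad_batch(ragged))  # [[1, 2, 0], [3, 4, 5]]
```

Once all rows share one length, the backend (tf.constant, torch.as_tensor, etc.) can build a regular rectangular tensor and the ValueError no longer occurs.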