
Can only automatically infer lengths for datasets whose items are dictionaries with an '{self.model_input_name}' key.

Package: transformers
Exception Class: ValueError

Raise code

        self.batch_size = batch_size
        self.model_input_name = model_input_name if model_input_name is not None else "input_ids"
        if lengths is None:
            if (
                not (isinstance(dataset[0], dict) or isinstance(dataset[0], BatchEncoding))
                or self.model_input_name not in dataset[0]
            ):
                raise ValueError(
                    "Can only automatically infer lengths for datasets whose items are dictionaries with an "
                    f"'{self.model_input_name}' key."
                )
            lengths = [len(feature[self.model_input_name]) for feature in dataset]
        self.lengths = lengths
        self.generator = generator

    def __len__(self):
        return len(self.lengths)
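
When the dataset items are dictionaries (or BatchEncoding objects) that contain the model_input_name key, the sampler infers the lengths itself with the list comprehension shown above. A minimal sketch of that inference path, using a toy dataset invented for illustration:

# Toy dataset (invented for illustration): each item is a dict with an "input_ids" key.
dataset = [
    {"input_ids": [101, 2023, 2003, 102]},
    {"input_ids": [101, 102]},
    {"input_ids": [101, 7592, 102]},
]

# This mirrors what the sampler does when lengths is None.
model_input_name = "input_ids"
lengths = [len(feature[model_input_name]) for feature in dataset]
print(lengths)  # [4, 2, 3]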

Ways to fix


LengthGroupedSampler is a sampler that samples indices in a way that groups together features of the dataset of roughly the same length, while keeping a bit of randomness.

Error code:

import torch
from transformers.trainer_pt_utils import LengthGroupedSampler

lengths = torch.randint(0, 25, (100,)).tolist()
# Put one bigger than the others to check it ends up in first position
lengths[32] = 50

indices = list(LengthGroupedSampler(lengths, 4))  # <--- lengths= was not passed, so the sampler tries to infer it
print(indices)

Because lengths is None and the items of the dataset (in our code, the plain integers in lengths) are not dictionaries containing the model_input_name key, the ValueError is raised:

if lengths is None:
  if (
      not (isinstance(dataset[0], dict) or isinstance(dataset[0], BatchEncoding))
      or self.model_input_name not in dataset[0]
  ):
      raise ValueError(
          "Can only automatically infer lengths for datasets whose items are dictionaries with an "
          f"'{self.model_input_name}' key."
      )
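
In the error code above, the first positional argument is a plain list of integers, so dataset[0] is an int rather than a dictionary and the check fails. You can verify this directly (same variable names as in the error code):

import torch

lengths = torch.randint(0, 25, (100,)).tolist()
print(type(lengths[0]))              # <class 'int'>
print(isinstance(lengths[0], dict))  # False, so the ValueError is raised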

Fix code:

import torch
from transformers.trainer_pt_utils import LengthGroupedSampler

lengths = torch.randint(0, 25, (100,)).tolist()
# Put one bigger than the others to check it ends up in first position
lengths[32] = 50

indices = list(LengthGroupedSampler(lengths, 4, lengths=lengths))  # <--- pass lengths explicitly
print(indices)
Answered Jul 09, 2021 by anonim
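
Another possible fix, assuming a transformers version with the same positional argument order as in the examples above: build the dataset from dictionaries that contain the 'input_ids' key, so the sampler can infer the lengths on its own. The toy dataset below is invented for illustration:

from transformers.trainer_pt_utils import LengthGroupedSampler

# Toy dataset: each item is a dict with an "input_ids" key of varying length.
dataset = [{"input_ids": list(range(n))} for n in (4, 2, 7, 3, 5, 6, 1, 8)]

indices = list(LengthGroupedSampler(dataset, 4))  # no lengths= needed; inferred from "input_ids"
print(indices)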
