
Can only automatically infer lengths for datasets whose items are dictionaries with an '{self.model_input_name}' key.

Package: transformers
Exception Class: ValueError

Raise code

        self.batch_size = batch_size
        self.model_input_name = model_input_name if model_input_name is not None else "input_ids"
        if lengths is None:
            if (
                not (isinstance(dataset[0], dict) or isinstance(dataset[0], BatchEncoding))
                or self.model_input_name not in dataset[0]
            ):
                raise ValueError(
                    "Can only automatically infer lengths for datasets whose items are dictionaries with an "
                    f"'{self.model_input_name}' key."
                )
            lengths = [len(feature[self.model_input_name]) for feature in dataset]
        self.lengths = lengths
        self.generator = generator

    def __len__(self):
        return len(self.lengths)

Ways to fix


LengthGroupedSampler is a sampler that yields indices in a way that groups together features of the dataset that are of roughly the same length, while keeping a bit of randomness.
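The grouping idea can be sketched as follows (a simplified illustration only, not the library's exact algorithm; the function name group_by_length and the mega_factor parameter are made up for this example): shuffle the indices, cut them into large "megabatches", and sort each megabatch by length so that consecutive batches contain items of similar size.

import random

def group_by_length(lengths, batch_size, mega_factor=50):
    # Illustration only: shuffle the indices, split them into "megabatches"
    # of batch_size * mega_factor items, then sort each megabatch by length.
    indices = list(range(len(lengths)))
    random.shuffle(indices)
    mega = batch_size * mega_factor
    megabatches = [indices[i:i + mega] for i in range(0, len(indices), mega)]
    return [i for mb in megabatches for i in sorted(mb, key=lambda j: lengths[j], reverse=True)]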

Error code:

import torch
from transformers.trainer_pt_utils import LengthGroupedSampler

lengths = torch.randint(0, 25, (100,)).tolist()
# Put one bigger than the others to check it ends up in first position
lengths[32] = 50

indices = list(LengthGroupedSampler(lengths, 4))  # <--- forgot to pass our lengths
print(indices)

Because lengths is None and the model input name ("input_ids") is not found in the dataset items (here we passed our plain list of lengths as the dataset), this error is raised.

if lengths is None:
  if (
      not (isinstance(dataset[0], dict) or isinstance(dataset[0], BatchEncoding))
      or self.model_input_name not in dataset[0]
  ):
      raise ValueError(
          "Can only automatically infer lengths for datasets whose items are dictionaries with an "
          f"'{self.model_input_name}' key."
      )
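The check only passes when the dataset items are dictionaries (or BatchEncoding objects) that contain the model input name, in which case the lengths are read from that key. A small sketch of that inference (the toy dataset below is made up):

dataset = [{"input_ids": list(range(n))} for n in [3, 7, 5, 2]]
lengths = [len(feature["input_ids"]) for feature in dataset]
print(lengths)  # [3, 7, 5, 2]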

Fix code:

import torch
from transformers.trainer_pt_utils import LengthGroupedSampler

lengths = torch.randint(0, 25, (100,)).tolist()
# Put one bigger than the others to check it ends up in first position
lengths[32] = 50

indices = list(LengthGroupedSampler(lengths, 4, lengths=lengths))  # <--- added lengths
print(indices)
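Alternatively, you can let the sampler infer the lengths itself by passing a dataset whose items are dictionaries with an "input_ids" key. A sketch using the same positional argument order as the examples above (dataset first, then batch size); note that the signature of LengthGroupedSampler has changed between transformers releases, so check the one installed in your environment:

import torch
from transformers.trainer_pt_utils import LengthGroupedSampler

# Each item is a dict with an "input_ids" key, so lengths can be inferred.
dataset = [{"input_ids": [0] * int(n)} for n in torch.randint(1, 25, (100,))]

indices = list(LengthGroupedSampler(dataset, 4))  # no lengths argument needed
print(indices)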
Jul 09, 2021, answered by anonim
