
sequence_stride must be higher than 0 and lower than the length of the data. Got: sequence_stride=%s for data of length %s.

Package:
keras
github stars 52268
Exception Class:
ValueError

Raise code

  # Validate strides
  if sampling_rate <= 0 or sampling_rate >= len(data):
    raise ValueError(
        'sampling_rate must be higher than 0 and lower than '
        'the length of the data. Got: '
        'sampling_rate=%s for data of length %s.' % (sampling_rate, len(data)))
  if sequence_stride <= 0 or sequence_stride >= len(data):
    raise ValueError(
        'sequence_stride must be higher than 0 and lower than '
        'the length of the data. Got: sequence_stride=%s '
        'for data of length %s.' % (sequence_stride, len(data)))

  if start_index is None:
    start_index = 0
  if end_index is None:
    

Ways to fix


As the documentation points out, timeseries_dataset_from_array creates a dataset of sliding windows over a time series provided as an array.

Given raw data points, it generates batches of sequences defined by three parameters: the sequence length, the spacing between the starts of consecutive sequences (sequence_stride), and the spacing between consecutive elements within a sequence (sampling_rate).

Basic usage

import numpy as np
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

X = np.arange(20)
sequence_length = 4  # each sequence has 4 elements
stride = 3  # offset between the starts of consecutive sequences
rate = 2  # step between consecutive elements within a sequence

input_dataset = timeseries_dataset_from_array(X,
                                              None,
                                              sequence_length,
                                              sequence_stride=stride,
                                              sampling_rate=rate)

The sample code above yields the following sequences. Notice the offset between the start of each sequence and the start of the next: that is the sequence_stride. It must be greater than 0 and less than the length of the given data, i.e. X in this case.

sequence_1 = [ 0  2  4  6]
sequence_2 =      [ 3  5  7  9]
sequence_3 =           [ 6  8 10 12]
sequence_4 =                [ 9 11 13 15]
sequence_5 =                     [12 14 16 18]
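
To make the roles of sequence_stride and sampling_rate concrete, here is a minimal pure-Python sketch that reproduces the five sequences listed above. The helper name sliding_windows is hypothetical and this is not the Keras implementation. It only illustrates the windowing idea, and it assumes a window is kept whenever all of its elements fit inside the data, so the number of windows at the tail may differ from what Keras returns in edge cases.

```python
def sliding_windows(data, sequence_length, sequence_stride=1, sampling_rate=1):
    """Illustrative helper (hypothetical, not part of Keras)."""
    # Distance from the first to the last element of one window.
    span = (sequence_length - 1) * sampling_rate
    windows = []
    # Window starts are sequence_stride apart; stop once a window no longer fits.
    for start in range(0, len(data) - span, sequence_stride):
        windows.append(
            [data[start + i * sampling_rate] for i in range(sequence_length)]
        )
    return windows


windows = sliding_windows(
    list(range(20)), sequence_length=4, sequence_stride=3, sampling_rate=2
)
for window in windows:
    print(window)
```

Each printed window starts 3 positions after the previous one (the stride), and its elements are 2 apart (the rate).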

Reproducing the error:

The sequence_stride must be greater than zero and less than the length of the raw time series data.

Any value outside this range reproduces the error. In the Keras example that follows the setup commands, stride=100 equals len(X), so the check fails.
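
The boundary cases can be checked without installing TensorFlow. The sketch below mirrors the condition quoted in the raise code above in a standalone function. The name check_sequence_stride is hypothetical and used only for illustration.

```python
def check_sequence_stride(sequence_stride, data_length):
    """Hypothetical helper mirroring the condition from the raise code."""
    if sequence_stride <= 0 or sequence_stride >= data_length:
        raise ValueError(
            'sequence_stride must be higher than 0 and lower than '
            'the length of the data. Got: sequence_stride=%s '
            'for data of length %s.' % (sequence_stride, data_length))
    return sequence_stride


# Try values at and around the boundaries for data of length 100.
for candidate in (0, 1, 99, 100):
    try:
        check_sequence_stride(candidate, 100)
        print(candidate, 'is accepted')
    except ValueError as error:
        print(candidate, 'is rejected:', error)
```

Note that a stride equal to the data length is rejected, because the upper bound is strict.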

$ mkdir test_folder && cd test_folder

$ pipenv shell

$ pipenv install tensorflow numpy

import numpy as np
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

X = np.arange(100)
sequence_length = 10
stride = 100  # invalid: not lower than len(X), so the ValueError is raised
rate = 4
input_dataset = timeseries_dataset_from_array(X,
                                              None,
                                              sequence_length,
                                              sequence_stride=stride,
                                              sampling_rate=rate)

Fixed version of the code:

The stride must satisfy 0 < sequence_stride < len(data).

import numpy as np
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

X = np.arange(100)
sequence_length = 10
stride = 4  # valid value: 0 < 4 < len(X); this fixes the error
rate = 4
input_dataset = timeseries_dataset_from_array(X,
                                              None,
                                              sequence_length,
                                              sequence_stride=stride,
                                              sampling_rate=rate)
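
If the stride comes from user input or a config file, one option is to normalize it before passing it on. The sketch below clamps a requested stride into the valid range. The helper clamp_stride is hypothetical, not something Keras provides, and silently changing the stride may not suit every pipeline, so treat it as one possible design rather than the recommended fix.

```python
def clamp_stride(requested_stride, data_length):
    """Hypothetical helper: force a stride into the range 1..data_length - 1."""
    if data_length < 2:
        # No integer satisfies 0 < stride < data_length here.
        raise ValueError(
            'data of length %s has no valid sequence_stride.' % data_length)
    return min(max(requested_stride, 1), data_length - 1)


print(clamp_stride(100, 100))  # too large: reduced to the largest valid stride
print(clamp_stride(0, 100))    # too small: raised to the smallest valid stride
print(clamp_stride(4, 100))    # already valid: returned unchanged
```

The returned value could then be passed as sequence_stride. Raising a clear error instead of clamping is equally reasonable when an out-of-range stride signals a configuration mistake.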

More details can be found in the documentation.

Jun 17, 2021 kellemnegasi answer
kellemnegasi 22.6k
