sequence_stride must be higher than 0 and lower than the length of the data. Got: sequence_stride=%s for data of length %s.
Package:
keras

Exception Class:
ValueError
Raise code
# Validate strides
if sampling_rate <= 0 or sampling_rate >= len(data):
    raise ValueError(
        'sampling_rate must be higher than 0 and lower than '
        'the length of the data. Got: '
        'sampling_rate=%s for data of length %s.' % (sampling_rate, len(data)))
if sequence_stride <= 0 or sequence_stride >= len(data):
    raise ValueError(
        'sequence_stride must be higher than 0 and lower than '
        'the length of the data. Got: sequence_stride=%s '
        'for data of length %s.' % (sequence_stride, len(data)))
if start_index is None:
    start_index = 0
if end_index is None:
Links to the raise (1)
https://github.com/keras-team/keras/blob/4a978914d2298db2c79baa4012af5ceff4a4e203/keras/preprocessing/timeseries.py#L163
Ways to fix
As its documentation states, timeseries_dataset_from_array creates a dataset of sliding windows over a timeseries provided as an array.
From the raw data points it generates batches of sequences characterized by three parameters: the sequence length, the spacing between the starts of consecutive sequences (sequence_stride), and the spacing between consecutive elements within a sequence (sampling_rate).
Basic usage
import numpy as np
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

X = np.arange(20)
sequence_length = 4  # each sequence has 4 elements
stride = 3  # consecutive sequences start 3 positions apart
rate = 2  # consecutive elements within a sequence are 2 positions apart
input_dataset = timeseries_dataset_from_array(X,
                                              None,
                                              sequence_length,
                                              sequence_stride=stride,
                                              sampling_rate=rate)
The sample code above produces the following sequences. Notice the offset between the first elements of consecutive sequences: that offset is the sequence_stride.
It must be strictly smaller than the length of the input data, i.e. X in this case.
sequence_1 = [ 0 2 4 6]
sequence_2 = [ 3 5 7 9]
sequence_3 = [ 6 8 10 12]
sequence_4 = [ 9 11 13 15]
sequence_5 = [12 14 16 18]
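To see where these sequences come from, the indexing can be reproduced without TensorFlow. This is a minimal pure-Python sketch, not the library's implementation: the helper name sliding_windows is hypothetical, and it assumes a window is kept only when all of its elements fall inside the data, so counts near the end of the data may differ slightly from what a given Keras version returns.

```python
def sliding_windows(data, sequence_length, sequence_stride, sampling_rate):
    """Illustrative re-creation of the windowing logic; not Keras code."""
    # Distance between the first and the last element of one window.
    span = (sequence_length - 1) * sampling_rate
    windows = []
    # Consecutive windows start sequence_stride positions apart.
    for start in range(0, len(data) - span, sequence_stride):
        # Inside a window, take every sampling_rate-th element.
        windows.append(
            [data[start + i * sampling_rate] for i in range(sequence_length)])
    return windows


X = list(range(20))
for window in sliding_windows(X, sequence_length=4, sequence_stride=3,
                              sampling_rate=2):
    print(window)
```

For the parameters of the basic usage example, this prints the same five sequences as listed above.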
Reproducing the error:
The sequence_stride must be greater than zero and less than the length of the raw time series data.
Any value outside this range reproduces the error.
$ mkdir test_folder && cd test_folder
$ pipenv shell
$ pipenv install tensorflow numpy
import numpy as np
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

X = np.arange(100)
sequence_length = 10
stride = 100  # invalid: equal to len(X), so the ValueError is raised
rate = 4
input_dataset = timeseries_dataset_from_array(X,
                                              None,
                                              sequence_length,
                                              sequence_stride=stride,
                                              sampling_rate=rate)
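The range test that fails here can also be run up front, before the data reaches Keras. The sketch below mirrors the two conditions from the raise code in plain Python; the function name check_timeseries_params is hypothetical and is not part of Keras or TensorFlow.

```python
def check_timeseries_params(data_length, sequence_stride, sampling_rate):
    """Hypothetical pre-check mirroring the two range tests in the raise code."""
    for name, value in (("sampling_rate", sampling_rate),
                        ("sequence_stride", sequence_stride)):
        # Both parameters must lie strictly between 0 and the data length.
        if value <= 0 or value >= data_length:
            raise ValueError(
                "%s must be higher than 0 and lower than the length of the "
                "data. Got: %s=%s for data of length %s."
                % (name, name, value, data_length))


try:
    # Same values as the failing example: the stride equals the data length.
    check_timeseries_params(data_length=100, sequence_stride=100,
                            sampling_rate=4)
except ValueError as err:
    print(err)
```

A stride of 99 would pass this check, while 100, 0 or any negative value would not.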
Fixed version of the code:
The stride must satisfy 0 < sequence_stride < len(data).
import numpy as np
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

X = np.arange(100)
sequence_length = 10
stride = 4  # valid value: 0 < 4 < len(X), which fixes the error
rate = 4
input_dataset = timeseries_dataset_from_array(X,
                                              None,
                                              sequence_length,
                                              sequence_stride=stride,
                                              sampling_rate=rate)
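As a rough sanity check on the fixed parameters, the number of windows they yield can be estimated in plain Python. This is a sketch under the assumption that a window counts only if every element lies inside the data; count_windows is a hypothetical helper, and the library's own count may differ at the boundary depending on the Keras version.

```python
def count_windows(data_length, sequence_length, sequence_stride, sampling_rate):
    """Estimate how many full windows fit; illustrative only, not Keras code."""
    # Distance between the first and the last element of one window.
    span = (sequence_length - 1) * sampling_rate
    # Valid start positions are 0, stride, 2 * stride, ... below data_length - span.
    return len(range(0, data_length - span, sequence_stride))


# Parameters from the fixed example: 100 points, length 10, stride 4, rate 4.
print(count_windows(100, 10, 4, 4))
```

With these values the last window starts at index 60 and ends at index 96, so the estimate is 16 windows.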
More details can be found in the documentation.