The train_size = %d should be greater or equal to the number of classes = %d

Package:
Exception Class:
ValueError

Raise code

p.min(class_counts) < 2:
            raise ValueError("The least populated class in y has only 1"
                             " member, which is too few. The minimum"
                             " number of groups for any class cannot"
                             " be less than 2.")

        if n_train < n_classes:
            raise ValueError('The train_size = %d should be greater or '
                             'equal to the number of classes = %d' %
                             (n_train, n_classes))
        if n_test < n_classes:
            raise ValueError('The test_size = %d should be greater or '
                             'equal to the number of classes = %d' %
                             (n_test, n_classes))

        # Fi

Ways to fix

The exception is raised when the number of training samples is smaller than the number of classes.

For example (WRONG):

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
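y = np.array([0, 0, 1, 1])
# test_size=0.7 leaves only 1 of the 4 samples for training
sss = StratifiedShuffleSplit(n_splits=2, test_size=0.7, random_state=0)
sss.get_n_splits(X, y)
# The ValueError is raised here, on the first iteration over split()
for train_index, test_index in sss.split(X, y):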
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

We have 4 points in the input array and keep 0.7 of them for testing. The test share is rounded up, so this gives 3 points for testing and 1 point for training.

The 'y' array contains two classes (0 and 1), so there is 1 training sample for 2 classes. A stratified split must put at least one sample of every class in the training set, which is impossible here, so the exception is raised.
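The arithmetic above can be checked without running scikit-learn. The following is a minimal sketch, not the library's code: the helper names split_sizes and split_is_valid are hypothetical, and it assumes that a fractional test_size is rounded up and that the remaining samples go to training when train_size is not set.

```python
import math


def split_sizes(n_samples, test_size):
    """Estimate (n_train, n_test) for a fractional test_size.

    Assumption: the test share is rounded up and the rest is used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


def split_is_valid(n_samples, n_classes, test_size):
    """Return True if both sets can hold at least one sample per class."""
    n_train, n_test = split_sizes(n_samples, test_size)
    return n_train >= n_classes and n_test >= n_classes


# Failing case from the example: 4 samples, 2 classes, test_size=0.7
print(split_sizes(4, 0.7))        # (1, 3)
print(split_is_valid(4, 2, 0.7))  # False

# 6 samples, 2 classes, test_size=0.5
print(split_is_valid(6, 2, 0.5))  # True
```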

To fix it, increase the number of points or reduce the test size, so that both the training set and the test set contain at least as many samples as there are classes:

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 0], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
# test_size=0.5 gives 3 training and 3 test samples, both >= 2 classes
sss = StratifiedShuffleSplit(n_splits=2, test_size=0.5, random_state=0)
sss.get_n_splits(X, y)
for train_index, test_index in sss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
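When picking a test size, it can help to know which absolute test-set sizes pass both checks shown in the raise code. This is a rough sketch under the assumption that only those two checks matter; the helper name valid_test_counts is hypothetical, and the separate check on the least populated class is ignored.

```python
def valid_test_counts(n_samples, n_classes):
    """Return (min, max) test-set sizes that satisfy both size checks, or None.

    Both n_test and n_train = n_samples - n_test must be >= n_classes.
    """
    smallest = n_classes
    largest = n_samples - n_classes
    if smallest > largest:
        return None  # too few samples for any valid split
    return smallest, largest


print(valid_test_counts(4, 2))  # (2, 2)
print(valid_test_counts(6, 2))  # (2, 4)
print(valid_test_counts(3, 2))  # None
```

For 4 samples and 2 classes the only valid choice is 2 test samples, for example test_size=0.5 or the integer test_size=2.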

Jun 29, 2021 aandrei.pi answer
