# The train_size = %d should be greater or equal to the number of classes = %d

Package:

scikit-learn

Exception Class:

ValueError

## Raise code

```
p.min(class_counts) < 2:
    raise ValueError("The least populated class in y has only 1"
                     " member, which is too few. The minimum"
                     " number of groups for any class cannot"
                     " be less than 2.")
if n_train < n_classes:
    raise ValueError('The train_size = %d should be greater or '
                     'equal to the number of classes = %d' %
                     (n_train, n_classes))
if n_test < n_classes:
    raise ValueError('The test_size = %d should be greater or '
                     'equal to the number of classes = %d' %
                     (n_test, n_classes))
# Fi
```

## Links to the raise (1)

https://github.com/scikit-learn/scikit-learn/blob/c67518350f91072f9d37ed09c5ef7edf555b6cf6/sklearn/model_selection/_split.py#L1897

## Ways to fix

The exception is raised when the number of train samples is smaller than the number of classes.

For example (WRONG):

```
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])
# test_size=0.7 leaves only 1 of the 4 samples for training
sss = StratifiedShuffleSplit(n_splits=2, test_size=0.7, random_state=0)
sss.get_n_splits(X, y)
# split() raises the ValueError: train_size = 1 < number of classes = 2
for train_index, test_index in sss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```

We have 4 points in the input array and hold out 0.7 of them for testing. This leaves 1 point for training and 3 for testing.

The `y` array contains two classes (0 and 1), so there is 1 training sample for 2 classes. A stratified split cannot place every class in the training set in that case, so the model could not train on all the labels.
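The arithmetic behind this can be sketched in plain Python. `split_sizes` is a hypothetical helper, not part of scikit-learn. It assumes the test count is rounded up and the remainder goes to training, which reproduces the 1 training / 3 testing split described here.

```python
import math


def split_sizes(n_samples, test_size):
    """Hypothetical helper: train/test counts for a float test_size.

    Assumption: the test count is rounded up and the remaining
    samples are used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_classes = 2
n_train, n_test = split_sizes(n_samples=4, test_size=0.7)
print(n_train, n_test)      # 1 3
print(n_train < n_classes)  # True, so the split is rejected
```

Running the same check with 4 samples and `test_size=0.5` gives 2 training and 2 testing samples, which satisfies both conditions.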

To fix it, increase the number of points or reduce the test size so that the training set has at least as many samples as there are classes. The following example does both:

```
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# 6 samples, 3 per class
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 0], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
# test_size=0.5 gives 3 training and 3 testing samples for 2 classes
sss = StratifiedShuffleSplit(n_splits=2, test_size=0.5, random_state=0)
sss.get_n_splits(X, y)
for train_index, test_index in sss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```
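Adding data is not strictly required. As a minimal sketch, assuming the original 4-sample arrays from the failing example, lowering `test_size` to 0.5 on its own should leave 2 samples on each side, which is enough for the 2 classes. The variable `splits` is introduced here only for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Same 4 samples as in the failing example
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])

# test_size=0.5 keeps 2 of the 4 samples for training and 2 for testing
sss = StratifiedShuffleSplit(n_splits=2, test_size=0.5, random_state=0)
splits = list(sss.split(X, y))
for train_index, test_index in splits:
    print("TRAIN:", train_index, "TEST:", test_index)
```

With only two members per class this is the smallest split that passes both size checks, so a larger dataset remains the more robust option.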
