
Shape mismatch: if categories is an array, it has to be of shape (n_features,).

Package:
Exception Class:
ValueError

Raise code

lf, X, handle_unknown='error', force_all_finite=True):
        X_list, n_samples, n_features = self._check_X(
            X, force_all_finite=force_all_finite)

        if self.categories != 'auto':
            if len(self.categories) != n_features:
                raise ValueError("Shape mismatch: if categories is an array,"
                                 " it has to be of shape (n_features,).")

        self.categories_ = []

        for i in range(n_features):
            Xi = X_list[i]
            if self.categories == 'auto':
                

Ways to fix

This happens when the categories argument passed to one of the encoders based on _BaseEncoder, such as OrdinalEncoder or OneHotEncoder, has the wrong shape: it must contain exactly one list of categories per feature (column) of the input data.

Reproducing the error:

pipenv install numpy scikit-learn

from sklearn.preprocessing import OrdinalEncoder
import numpy as np
data = np.array([['cold'],['warm'],['hot']])
cat = ["cold","hot","warm"]
enc = OrdinalEncoder(categories=cat)
enc.fit(data)
print(enc.categories_)

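In this sample code, categories is a flat list of shape (3,), while the data has shape (3, 1), so the number of features is 1. The encoder compares len(categories) with n_features, finds that 3 != 1, and raises the error. The categories argument must instead be of shape (n_features,): one entry per feature, where each entry is the list of categories for that column.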
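
The comparison described above can be sketched without scikit-learn at all. This is only an illustration of the length check shown in the raise code; the helper name categories_match is hypothetical and is not part of the library.

```python
import numpy as np

data = np.array([["cold"], ["warm"], ["hot"]])
cat = ["cold", "hot", "warm"]

# n_features is the number of columns in the data.
n_features = data.shape[1]
print(n_features, len(cat))


def categories_match(categories, X):
    """Hypothetical helper: mirrors the len(categories) != n_features check."""
    return len(categories) == np.asarray(X).shape[1]


flat_ok = categories_match(cat, data)      # flat list: 3 entries for 1 feature
nested_ok = categories_match([cat], data)  # nested list: 1 entry for 1 feature
print(flat_ok, nested_ok)
```

Wrapping the flat list in an outer list is what makes the lengths agree, which is the idea behind the fix further down.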

The error output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-3f6409ff813c> in <module>()
      4 cat = ["cold","hot","warm"]
      5 enc = OrdinalEncoder(categories=cat)
----> 6 enc.fit(data)
      7 print(enc.categories_)

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_encoders.py in fit(self, X, y)
    627         self
    628         """
--> 629         self._fit(X)
    630 
    631         return self

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_encoders.py in _fit(self, X, handle_unknown)
     76         if self.categories != 'auto':
     77             if len(self.categories) != n_features:
---> 78                 raise ValueError("Shape mismatch: if categories is an array,"
     79                                  " it has to be of shape (n_features,).")
     80 

ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).

Fix:

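The categories parameter needs one entry per feature. The data here has a single feature, so the three categories should be reshaped to (1, 3), giving a list that contains one array of three categories.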

from sklearn.preprocessing import OrdinalEncoder
import numpy as np
data = np.array([['cold'],['warm'],['hot']])
cat = list(np.array(["cold","hot","warm"]).reshape(1,3))
enc = OrdinalEncoder(categories=cat)
enc.fit(data)
print(enc.categories_)

The output:

[array(['cold', 'hot', 'warm'], dtype='<U4')]
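
The same shape can also be written directly as a nested list, without reshaping a NumPy array. The sketch below assumes a made-up second column ("low"/"high") purely to show how categories grows with the number of features; the column and its values are not from the original question.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Two features (columns), so categories needs two entries.
# The second column is an invented example for illustration.
data = np.array([["cold", "low"],
                 ["warm", "high"],
                 ["hot", "low"]])

# One inner list per feature; the order inside each list sets the codes.
categories = [["cold", "warm", "hot"], ["low", "high"]]

enc = OrdinalEncoder(categories=categories)
encoded = enc.fit_transform(data)
print(encoded)
```

Each value is encoded as its position in the inner list for its column, so "cold", "warm", "hot" map to 0, 1, 2 and "low", "high" map to 0, 1.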

Jul 15, 2021 kellemnegasi answer
