Shape mismatch: if categories is an array, it has to be of shape (n_features,).

Package:
Exception Class:
ValueError

Raise code

lf, X, handle_unknown='error', force_all_finite=True):
        X_list, n_samples, n_features = self._check_X(
            X, force_all_finite=force_all_finite)

        if self.categories != 'auto':
            if len(self.categories) != n_features:
                raise ValueError("Shape mismatch: if categories is an array,"
                                 " it has to be of shape (n_features,).")

        self.categories_ = []

        for i in range(n_features):
            Xi = X_list[i]
            if self.categories == 'auto':
                

Ways to fix

This happens when a wrongly shaped categories array is passed to the constructor of one of the encoders based on _BaseEncoder, such as OrdinalEncoder and OneHotEncoder.

Reproducing the error:

pipenv install numpy scikit-learn

from sklearn.preprocessing import OrdinalEncoder
import numpy as np
data = np.array([['cold'],['warm'],['hot']])
cat = ["cold","hot","warm"]
enc = OrdinalEncoder(categories=cat)
enc.fit(data)
print(enc.categories_)

In this sample code the shape of categories is (3,) and the shape of the data is (3, 1), which means the number of features is 1. The categories array must therefore have shape (n_features,), that is, it must contain exactly one list of categories per feature.
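To see why the check fails, the comparison from the raise code can be replayed by hand. This is only a sketch: categories_match is a hypothetical helper written for illustration, not part of scikit-learn, and the data is a plain nested list rather than a NumPy array.

```python
def categories_match(categories, n_features):
    """Hypothetical helper mirroring the length check in the raise code."""
    return categories == 'auto' or len(categories) == n_features


data = [['cold'], ['warm'], ['hot']]   # 3 samples, 1 feature
n_features = len(data[0])              # number of columns

flat = ["cold", "hot", "warm"]         # length 3: each string counts as one entry
nested = [["cold", "hot", "warm"]]     # length 1: one list of categories per feature

print(categories_match(flat, n_features))    # the flat list does not match
print(categories_match(nested, n_features))  # the nested list matches
```

The flat list has three entries for a single feature, so the encoder rejects it; wrapping it in an outer list gives one entry per feature.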

The error output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-3f6409ff813c> in <module>()
      4 cat = ["cold","hot","warm"]
      5 enc = OrdinalEncoder(categories=cat)
----> 6 enc.fit(data)
      7 print(enc.categories_)

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_encoders.py in fit(self, X, y)
    627         self
    628         """
--> 629         self._fit(X)
    630 
    631         return self

/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_encoders.py in _fit(self, X, handle_unknown)
     76         if self.categories != 'auto':
     77             if len(self.categories) != n_features:
---> 78                 raise ValueError("Shape mismatch: if categories is an array,"
     79                                  " it has to be of shape (n_features,).")
     80 

ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).

Fix:

The categories parameter should be reshaped to (1, 3): one entry for the single feature, holding its three categories.

from sklearn.preprocessing import OrdinalEncoder
import numpy as np
data = np.array([['cold'],['warm'],['hot']])
cat = list(np.array(["cold","hot","warm"]).reshape(1,3))
enc = OrdinalEncoder(categories=cat)
enc.fit(data)
print(enc.categories_)

The output:

[array(['cold', 'hot', 'warm'], dtype='<U4')]
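The reshape is not strictly required. A plain nested list, with one inner list per column, should work as well. The sketch below assumes a reasonably recent scikit-learn; the two-column data and its size categories are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# One feature: wrap the categories in an outer list (length 1 == n_features).
data = np.array([['cold'], ['warm'], ['hot']])
enc = OrdinalEncoder(categories=[["cold", "hot", "warm"]])
codes = enc.fit_transform(data)

# Two features: supply two inner lists, one per column, in column order.
# The second column and its categories are invented for this example.
data2 = np.array([['cold', 'small'],
                  ['warm', 'large'],
                  ['hot', 'medium']])
enc2 = OrdinalEncoder(categories=[["cold", "warm", "hot"],
                                  ["small", "medium", "large"]])
codes2 = enc2.fit_transform(data2)

print(codes.ravel())
print(codes2)
```

Each value is encoded as its position in the inner list for its column, so the order of the inner lists determines the resulting codes.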

Jul 15, 2021 kellemnegasi answer
