Shape mismatch: if categories is an array, it has to be of shape (n_features,).
Package:
scikit-learn

Exception Class:
ValueError
Raise code
lf, X, handle_unknown='error', force_all_finite=True):
    X_list, n_samples, n_features = self._check_X(
        X, force_all_finite=force_all_finite)
    if self.categories != 'auto':
        if len(self.categories) != n_features:
            raise ValueError("Shape mismatch: if categories is an array,"
                             " it has to be of shape (n_features,).")
    self.categories_ = []
    for i in range(n_features):
        Xi = X_list[i]
        if self.categories == 'auto':
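The excerpt only compares len(self.categories) with the number of columns in X. It does not look inside the individual category lists. Below is a minimal stand-alone sketch of the same rule. It assumes a 2-D X, and the helper name check_categories_length is hypothetical, not part of scikit-learn:

```python
import numpy as np


def check_categories_length(categories, X):
    """Hypothetical helper mirroring the length check in the excerpt above.

    `categories` is either the string 'auto' or a list holding one
    collection of categories per column (feature) of a 2-D X.
    """
    n_features = np.asarray(X).shape[1]  # number of columns in X
    if isinstance(categories, str) and categories == "auto":
        return  # nothing to check: categories are inferred from the data
    if len(categories) != n_features:
        raise ValueError("Shape mismatch: if categories is an array,"
                         " it has to be of shape (n_features,).")


data = np.array([["cold"], ["warm"], ["hot"]])  # shape (3, 1): one feature

# Passes: one category list for the single feature.
check_categories_length([["cold", "hot", "warm"]], data)

# Fails: three entries, but only one feature.
try:
    check_categories_length(["cold", "hot", "warm"], data)
except ValueError as exc:
    print(exc)
```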
Links to the raise (1)
https://github.com/scikit-learn/scikit-learn/blob/c67518350f91072f9d37ed09c5ef7edf555b6cf6/sklearn/preprocessing/_encoders.py#L83

Ways to fix
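This error occurs when an encoder based on _BaseEncoder, such as OrdinalEncoder or OneHotEncoder, is constructed with a categories argument of the wrong length and then fitted. When categories is not 'auto', fit() requires it to be a list with exactly one entry per feature (column) of X, each entry holding the categories for that column. If len(categories) differs from the number of features, this ValueError is raised.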
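For several features, the rule means passing one category list per column. The sketch below assumes a small two-column dataset (temperature and size) invented for illustration, and uses OneHotEncoder:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Two columns (features): temperature and size.
X = np.array([["cold", "S"],
              ["warm", "M"],
              ["hot", "L"]])

# One list of categories per column, so len(categories) == X.shape[1] == 2.
categories = [["cold", "warm", "hot"], ["S", "M", "L"]]

enc = OneHotEncoder(categories=categories)
enc.fit(X)
print(enc.categories_)

# One-hot columns follow the order given in `categories`:
# cold, warm, hot, S, M, L
print(enc.transform([["warm", "L"]]).toarray())  # [[0. 1. 0. 0. 0. 1.]]
```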
Reproducing the error:
pipenv install numpy scikit-learn
from sklearn.preprocessing import OrdinalEncoder
import numpy as np
data = np.array([['cold'],['warm'],['hot']])
cat = ["cold","hot","warm"]
enc = OrdinalEncoder(categories=cat)
enc.fit(data)
print(enc.categories_)
In this sample code, cat has shape (3,) while data has shape (3, 1), so n_features is 1. Because len(cat) is 3 rather than 1, the check fails. The categories argument must instead have length n_features, that is, a list containing one list of categories for the single feature.
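The length comparison that fails can be checked directly with NumPy, using the same data and cat as in the reproduction code:

```python
import numpy as np

data = np.array([["cold"], ["warm"], ["hot"]])
cat = ["cold", "hot", "warm"]

n_samples, n_features = data.shape  # (3, 1): 3 rows, 1 feature
print(n_features)  # 1
print(len(cat))    # 3 -> does not match n_features, so fit() raises
print(len([cat]))  # 1 -> one list of categories for the single feature
```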
The error output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-3f6409ff813c> in <module>()
4 cat = ["cold","hot","warm"]
5 enc = OrdinalEncoder(categories=cat)
----> 6 enc.fit(data)
7 print(enc.categories_)
/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_encoders.py in fit(self, X, y)
627 self
628 """
--> 629 self._fit(X)
630
631 return self
/usr/local/lib/python3.7/dist-packages/sklearn/preprocessing/_encoders.py in _fit(self, X, handle_unknown)
76 if self.categories != 'auto':
77 if len(self.categories) != n_features:
---> 78 raise ValueError("Shape mismatch: if categories is an array,"
79 " it has to be of shape (n_features,).")
80
ValueError: Shape mismatch: if categories is an array, it has to be of shape (n_features,).
Fix:
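Wrap the categories in an outer list so that categories holds one list per feature. Reshaping the array to (1, 3) and converting it with list() does this: the result is a list containing a single array of three categories, so its length matches n_features = 1.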
from sklearn.preprocessing import OrdinalEncoder
import numpy as np
data = np.array([['cold'],['warm'],['hot']])
cat = list(np.array(["cold","hot","warm"]).reshape(1,3))
enc = OrdinalEncoder(categories=cat)
enc.fit(data)
print(enc.categories_)
The output:
[array(['cold', 'hot', 'warm'], dtype='<U4')]
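Two simpler alternatives should behave the same for this single-feature case. The sketch reuses data and cat from above. Option 1 wraps the plain list directly, which is equivalent to the reshape. Option 2 drops explicit categories and lets the encoder infer them. With explicit categories, the ordinal codes follow the order in cat. With 'auto', each feature's categories are its sorted unique values, which here happens to give the same order:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

data = np.array([["cold"], ["warm"], ["hot"]])
cat = ["cold", "hot", "warm"]

# Option 1: wrap the flat list so there is one category list per feature.
enc = OrdinalEncoder(categories=[cat])
enc.fit(data)
print(enc.transform(data))  # codes follow `cat`: cold -> 0, hot -> 1, warm -> 2

# Option 2: let the encoder infer the categories from the data.
enc_auto = OrdinalEncoder(categories="auto")
enc_auto.fit(data)
print(enc_auto.categories_)
```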