Bin labels must be one fewer than the number of bin edges
Package:
pandas

Exception Class:
ValueError
Raise code
    elif ordered and len(set(labels)) != len(labels):
        raise ValueError(
            "labels must be unique if ordered=True; pass ordered=False for duplicate labels"  # noqa
        )
    else:
        if len(labels) != len(bins) - 1:
            raise ValueError(
                "Bin labels must be one fewer than the number of bin edges"
            )
        if not is_categorical_dtype(labels):
            labels = Categorical(
                labels,
                categories=labels if len(set(labels)) == len(labels) else None,
                ordered=ordered,
            )
Links to the raise (1)
https://github.com/pandas-dev/pandas/blob/b3e335254f46a526ee3ce9bb757eac4011d9d1fe/pandas/core/reshape/tile.py#L446
Ways to fix
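pandas.cut segments and sorts data values into bins. It is also useful for converting a continuous variable into a categorical one.
The number of labels must match the number of bins. When the second parameter, bins, is an integer, labels must contain exactly that many entries. When bins is a sequence of bin edges, labels must contain len(bins) - 1 entries, because N edges define N - 1 bins. If the counts do not match, this error is raised.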
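The relationship between bins, edges, and labels can be checked directly. The following is a minimal sketch, not taken from the original post: the variable names and the explicit edge values are illustrative assumptions. It uses retbins=True to show how many edges pandas builds from an integer bins value.

```python
import numpy as np
import pandas as pd

values = np.array([1, 7, 5, 4, 6, 3])

# Integer bins: pandas computes the edges itself, so 3 bins give 4 edges
# and exactly 3 labels would be accepted.
_, edges = pd.cut(values, 3, retbins=True)
n_edges_from_int = len(edges)

# Sequence bins: the values are used as edges directly.
# 5 edges (illustrative values) define 4 bins, so 4 labels are required.
explicit_edges = [0, 2, 4, 6, 8]
binned = pd.cut(values, explicit_edges,
                labels=["bad", "medium", "good", "very good"])
print(n_edges_from_int)
print(list(binned))
```

In both cases the rule enforced by the raise code is the same: len(labels) must equal the number of edges minus one.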
Reproducing the error:
pipenv install pandas
import numpy as np
import pandas as pd
df = pd.cut(np.array([1, 7, 5, 4, 6, 3]),
            3,  # should be 4, because labels has 4 values
            labels=["bad", "medium", "good", "very good"],
            ordered=False)
print(df)
The error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-cfc67b218327> in <module>()
      4     3,
      5     labels=["bad", "medium", "good","very good",],
----> 6     ordered=False)
      7 print(df)

/usr/local/lib/python3.7/dist-packages/pandas/core/reshape/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered)
    282         dtype=dtype,
    283         duplicates=duplicates,
--> 284         ordered=ordered,
    285     )
    286

/usr/local/lib/python3.7/dist-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
    433         if len(labels) != len(bins) - 1:
    434             raise ValueError(
--> 435                 "Bin labels must be one fewer than the number of bin edges"
    436             )
    437         if not is_categorical_dtype(labels):

ValueError: Bin labels must be one fewer than the number of bin edges
Fixed version of the code:
import numpy as np
import pandas as pd
df = pd.cut(np.array([1, 7, 5, 4, 6, 3]),
            4,  # 4 bins to match the 4 labels
            labels=["bad", "medium", "good", "very good"],
            ordered=False)
print(df)
Output:
['bad', 'very good', 'good', 'medium', 'very good', 'medium']
Categories (4, object): ['bad', 'medium', 'good', 'very good']
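One way to keep the two counts from drifting apart is to derive the bin count from the labels instead of hard-coding it. This is a hedged variation on the fix above, not part of the original answer; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

labels = ["bad", "medium", "good", "very good"]
values = np.array([1, 7, 5, 4, 6, 3])

# Using len(labels) as the bin count guarantees one label per bin,
# so adding or removing a label cannot trigger the ValueError.
result = pd.cut(values, bins=len(labels), labels=labels, ordered=False)
print(result)
```

This only applies when equal-width bins are acceptable. With hand-picked edges, the edge list needs len(labels) + 1 entries instead.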