Vocabulary not fitted or provided

Package: scikit-learn (sklearn)
Exception Class: NotFittedError

Raise code

            self.fixed_vocabulary_ = False

    def _check_vocabulary(self):
        """Check if vocabulary is empty or missing (not fitted)"""
        if not hasattr(self, 'vocabulary_'):
            self._validate_vocabulary()
            if not self.fixed_vocabulary_:
                raise NotFittedError("Vocabulary not fitted or provided")

        if len(self.vocabulary_) == 0:
            raise ValueError("Vocabulary is empty")

    def _validate_params(self):
        """Check validity of ngram_range parameter"""
        min_n, max_m = self.ngram_range

Ways to fix

Calling get_feature_names() on a CountVectorizer that has not been fitted, and that was not given a vocabulary in its constructor, raises this error.

Code to reproduce the exception:

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
vectorizer.get_feature_names()

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
<ipython-input-2-ae1b1786f73f> in <module>()
      5           'Is this the first document?',]
      6 vectorizer = CountVectorizer()
----> 7 vectorizer.get_feature_names()
      8 # X = vectorizer.fit_transform(corpus)
      9 # print(vectorizer.get_feature_names())

/usr/local/lib/python3.7/dist-packages/sklearn/feature_extraction/text.py in get_feature_names(self)
   1313         """
   1314
-> 1315         self._check_vocabulary()
   1316
   1317         return [t for t, i in sorted(self.vocabulary_.items(),

/usr/local/lib/python3.7/dist-packages/sklearn/feature_extraction/text.py in _check_vocabulary(self)
    488             self._validate_vocabulary()
    489             if not self.fixed_vocabulary_:
--> 490                 raise NotFittedError("Vocabulary not fitted or provided")
    491
    492         if len(self.vocabulary_) == 0:

NotFittedError: Vocabulary not fitted or provided
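
When it is not known in advance whether the vectorizer has been fitted, one option is to catch the exception explicitly. The following is only a sketch: the helper name feature_names_or_none is hypothetical, and the fallback to get_feature_names_out() assumes a newer scikit-learn release in which get_feature_names() is no longer available.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer


def feature_names_or_none(vectorizer):
    """Return the feature names as a list, or None if the vocabulary is missing.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    try:
        # Newer scikit-learn releases expose get_feature_names_out();
        # older ones only have get_feature_names().
        if hasattr(vectorizer, "get_feature_names_out"):
            return list(vectorizer.get_feature_names_out())
        return vectorizer.get_feature_names()
    except NotFittedError:
        # Raised by _check_vocabulary() when there is neither a fitted
        # nor a user-provided vocabulary.
        return None


unfitted = CountVectorizer()
print(feature_names_or_none(unfitted))  # prints None instead of raising

fitted = CountVectorizer().fit(['This is the first document.'])
print(feature_names_or_none(fitted))
```

Returning None hides the error from the caller, so this is mainly useful for diagnostics; in a normal pipeline it is usually better to fit the vectorizer first.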

Fixed version of the code:

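from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = ['This is the first document.',
          'This document is the second document.',
          'And this is the third one.',
          'Is this the first document?',]
# Fit the vectorizer first: this learns the vocabulary (vocabulary_)
X = vectorizer.fit_transform(corpus)
# The vocabulary now exists, so the call succeeds.
# In scikit-learn 1.2 and later, use get_feature_names_out() instead.
vectorizer.get_feature_names()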

['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
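
The error message says "not fitted or provided", and the raise code skips the exception when fixed_vocabulary_ is set. So a second way to avoid the error is to pass a vocabulary to the constructor instead of fitting. Below is a minimal sketch under that assumption; the three-word vocabulary is made up for illustration, and the hasattr check is only there so the example runs on both older and newer scikit-learn releases.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative vocabulary (an assumption, not taken from the original answer).
vocab = ['document', 'first', 'second']

# No fit() call: the vocabulary is provided up front.
vectorizer = CountVectorizer(vocabulary=vocab)

# get_feature_names() was replaced by get_feature_names_out() in newer
# scikit-learn releases, so pick whichever method exists.
if hasattr(vectorizer, "get_feature_names_out"):
    names = list(vectorizer.get_feature_names_out())
else:
    names = vectorizer.get_feature_names()
print(names)

# transform() also works without fitting when the vocabulary is fixed.
X = vectorizer.transform(['This is the first document.'])
print(X.toarray())
```

With a fixed vocabulary, words outside it are ignored during transform, so this approach fits best when the set of features is known ahead of time.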

Answered Nov 10, 2021 by kellemnegasi
