<virtual environment name>\Scripts\activate
import json
import unittest

class EnvironmentSetup(unittest.TestCase):
    base_config = None

    @classmethod
    def setUpClass(cls):
        cls.base_config = cls.setConfig()

    @classmethod
    def setConfig(cls):
        with open('../config.json') as file:
            config = json.load(file)
        return config

    def __new__(cls, *args, **kwargs):
        # Override object creation so only one instance of the class ever exists
        # (*args/**kwargs are needed because unittest passes the test method name)
        if not hasattr(cls, 'instance'):
            cls.instance = super().__new__(cls)
        return cls.instance
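The `__new__` override above caches the first instance on the class and returns it for every later instantiation. A minimal stand-alone sketch of the same singleton pattern, using a hypothetical `AppState` class (not from the original code):

```python
class AppState:
    def __new__(cls, *args, **kwargs):
        # Reuse the cached instance if one was already created
        if not hasattr(cls, 'instance'):
            cls.instance = super().__new__(cls)
        return cls.instance

first = AppState()
second = AppState()
print(first is second)  # prints True: both names refer to the same object
```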
You can use warm_start=True, or call .partial_fit() instead of .fit().
See the documentation for the model you are using, where it describes that argument and that method respectively.
Basically, you would load only a portion of the data at a time, run it through your pipeline and call partial_fit in a loop. This would keep the memory requirements down while also allowing you to train on all the data, regardless of the amount.
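A minimal sketch of that loop, assuming an estimator that supports partial_fit (SGDClassifier is used here as an example; the random chunks stand in for batches read from disk):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all classes must be declared on the first call

# Simulate reading the data in chunks instead of loading it all at once
for _ in range(5):
    X_chunk = rng.standard_normal((100, 20))
    y_chunk = rng.integers(0, 2, size=100)
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```

Only one chunk is held in memory at a time, so the memory footprint stays bounded regardless of the total dataset size.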
EDIT
As noted in the comments, the above mentioned loop will only work for the predictive model, so the data pre-processing will need to occur separately.
Here is a solution for training the CountVectorizer...
This question contains a TFIDF implementation that...
So the final solution would be to preprocess the data in two stages. The first for the CountVectorizer and the second for the TFIDF weighting.
Then, to train the model, you follow the same process as originally proposed, except without a Pipeline, because that is no longer needed.
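One way to sketch this two-stage approach (an assumption about how the linked solutions fit together, not the answerer's exact code): a first streaming pass builds the global vocabulary and document frequencies, the IDF weights are computed manually from those counts, and a second pass transforms each chunk with a fixed-vocabulary CountVectorizer and trains the model via partial_fit:

```python
from collections import Counter
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

def chunks(docs, labels, size):
    for i in range(0, len(docs), size):
        yield docs[i:i + size], labels[i:i + size]

# Toy corpus standing in for data streamed from disk
docs = ["spam spam offer", "meeting at noon", "free offer now", "lunch meeting today"]
labels = [1, 0, 1, 0]

# Pass 1: accumulate vocabulary and document frequencies one chunk at a time
analyzer = CountVectorizer().build_analyzer()
df = Counter()
n_docs = 0
for doc_chunk, _ in chunks(docs, labels, 2):
    for doc in doc_chunk:
        n_docs += 1
        df.update(set(analyzer(doc)))

terms = sorted(df)
vocab = {term: idx for idx, term in enumerate(terms)}
# Smoothed IDF, as TfidfTransformer computes it by default
idf = np.log((1 + n_docs) / (1 + np.array([df[t] for t in terms]))) + 1

vectorizer = CountVectorizer(vocabulary=vocab)  # fixed vocabulary: no fit needed

# Pass 2: transform each chunk, apply the IDF weights, train incrementally
clf = SGDClassifier(random_state=0)
for doc_chunk, y_chunk in chunks(docs, labels, 2):
    X = vectorizer.transform(doc_chunk).multiply(idf)  # counts * idf per column
    clf.partial_fit(X, y_chunk, classes=[0, 1])
```

As in the answer, there is no Pipeline: the preprocessing and the model are driven separately, each in its own chunked loop.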
Has anyone had the chance to compare VA and IPS panels? Is this parameter worth paying attention to, or are both panel types good in their own way?