I need to predict the value of the `Close` column (the stock price) from 3 inputs: `Close`, `Open` and `Volume`. A sample of the dataset:
             Close    Open   Volume
2019-09-20  5489.0  5389.0  1578781
2019-09-23  5420.0  5460.0   622325
2019-09-24  5337.5  5424.0   688395
2019-09-25  5343.5  5326.5   628849
2019-09-26  5387.5  5345.0   619344
...            ...     ...      ...
2020-03-30  4459.0  4355.0  1725236
2020-03-31  4715.0  4550.0  2433310
2020-04-01  4674.5  4596.0  1919728
2020-04-02  5050.0  4865.0  3860103
2020-04-03  5204.5  5050.0  3133078
[134 rows x 3 columns]
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 134 entries, 2019-09-20 to 2020-04-03
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Close   134 non-null    float64
 1   Open    134 non-null    float64
 2   Volume  134 non-null    int64
dtypes: float64(2), int64(1)
My question: what is wrong with the code below, which is meant to predict the values for the last 10 days? The result I get is obviously incorrect:
Epoch 1/1
64/64 [==============================] - 6s 88ms/step - loss: 37135470.9219
[32.58649 ]
[32.58663 ]
[32.58682 ]
rmse: 4625.457010985681
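The reported `rmse` is about the size of the price itself, which is what one would expect when predictions near 32.6 are compared with unscaled prices in the thousands. The sketch below is only a sanity check under stated assumptions: it takes the last five `Close` values from the sample as a stand-in test set and a constant prediction of 32.58, and it also shows that subtracting a `(n,)` array from a `(n, 1)` array broadcasts to `(n, n)`.

```python
import numpy as np

# Last five Close values from the sample above, used as a stand-in test set (assumption).
y_true = np.array([4459.0, 4715.0, 4674.5, 5050.0, 5204.5])

# Predictions shaped (n, 1), all set to roughly the value printed above (assumption).
preds = np.full((5, 1), 32.58)

# (n, 1) minus (n,) broadcasts to an (n, n) matrix, not one error per day.
print((preds - y_true).shape)

# Flattening the predictions first gives one error per day.
errors = preds.ravel() - y_true
print(errors.shape)

# The RMSE is dominated by the gap between ~32.6 and prices in the thousands.
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse)
```

If this reading is right, the network output is on a different scale from the target, so the target would need the same scaling as the inputs (and the predictions an inverse transform) before the error is meaningful.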
The problem remains even if I remove `fit_transform` (I also do not scale `y_train`; I am not sure whether I should). The code:
from math import sqrt
from numpy import concatenate
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding
from keras.layers import LSTM
import numpy as np
from datetime import datetime, timedelta
import yfinance as yf
start = (datetime.now() - timedelta(days=200)).strftime("%Y-%m-%d")
end = (datetime.now() - timedelta(days=1)).strftime("%Y-%m-%d")
df = yf.download(tickers="LKOH.ME", start=start, end=end, interval="1d")
dataset = df.loc[start:end].filter(['Close', 'Open', 'Volume']).values
scaler = MinMaxScaler(feature_range=(0,1))
training_data_len = len(dataset) - 10 # last 10 days to test
train_data = dataset[0:int(training_data_len), :]
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, :])  # take all 3 features
    y_train.append(train_data[i, 0])  # column 0 is Close, the value to predict
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1]*x_train.shape[2])) # convert to 2d for fit_transform()
x_train = scaler.fit_transform(x_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
model = Sequential()
# should this be input_shape=(x_train.shape[1], 3) because there are 3 features?
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, batch_size=1, epochs=1)
test_data = dataset[training_data_len - 60:, :]
x_test = []
y_test = dataset[training_data_len:, 0]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, :])
x_test = np.array(x_test)
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1]*x_test.shape[2]))
x_test = scaler.fit_transform(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
predictions = model.predict(x_test)
print('rmse:', np.sqrt(np.mean(((predictions - y_test) ** 2))))
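For comparison, here is a minimal sketch of one way the data preparation could keep the three features separate instead of flattening them into 180 columns. It is not a confirmed fix. The data is synthetic, the helpers `scale`, `unscale_close` and `make_windows` are hypothetical names, and min-max scaling is done by hand with NumPy using statistics from the training rows only.

```python
import numpy as np

WINDOW = 60      # look-back length, as in the code above
TEST_DAYS = 10   # last 10 days held out, as in the code above

# Synthetic stand-in for the (134, 3) array of Close, Open, Volume (assumption).
rng = np.random.default_rng(0)
dataset = np.column_stack([
    rng.uniform(4000, 6000, 134),   # Close
    rng.uniform(4000, 6000, 134),   # Open
    rng.uniform(5e5, 4e6, 134),     # Volume
])

train_len = len(dataset) - TEST_DAYS
train = dataset[:train_len]

# Per-column min and max, computed on the training rows only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def scale(a):
    """Min-max scale each column with the training statistics."""
    return (a - col_min) / (col_max - col_min)

def unscale_close(c):
    """Map scaled Close values back to prices."""
    return c * (col_max[0] - col_min[0]) + col_min[0]

def make_windows(scaled, start):
    """Build one (WINDOW, 3) window per row from `start` on; target is scaled Close."""
    x = np.array([scaled[i - WINDOW:i] for i in range(start, len(scaled))])
    y = scaled[start:, 0]
    return x, y

scaled = scale(dataset)
x_train, y_train = make_windows(scaled[:train_len], WINDOW)
x_test, y_test = make_windows(scaled, train_len)

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```

With this layout the first LSTM layer would take `input_shape=(WINDOW, 3)`, the target is on the same 0 to 1 scale as the inputs, and predictions would go through `unscale_close` before being compared with real prices.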