Can an appropriate trading strategy be determined through deep learning?
Stock trading is a way to profit from fluctuations in the share prices of publicly traded companies. With many people stuck at home during the COVID-19 pandemic, it has become a convenient form of investment with the possibility of huge returns, as evidenced by the 50% increase in equity option trading this year alone.
It is therefore no surprise that many are constantly looking for ways to gain an edge and anticipate market trends. Yet it remains a difficult task, given the vast complexity of the information involved and the sudden changes that are hard to predict.
Therefore, I wanted to understand how accurately a trained model could predict prices, and what would be the best way to predict a rise/drop before it happens.
Specifically, if I invest $10000 into my models, can my entry/exit into the market based on the predicted percentage change allow me to earn money?
A wide range of articles have explored this area and have been invaluable to the formulation of my article. However, I have noted some recurring criticisms in the comments on those articles:
(1) Normalisation of combined test/train data or the leaking of test data into trained inputs
This happens when the model relies on information that would not have been available at the time of prediction, which gives better-than-expected results because the model already possesses information from the future.
(2) Difficulty in predicting next day’s price
Many articles claim to have found a way to make money via machine/deep learning, but they struggle to make a generalisable profit because the next day’s predicted price is essentially today’s price with a small adjustment. Schmalz explains this very clearly in his article.
Therefore, the purpose of the article is not to claim a supposedly ground-breaking way to predict prices, but rather to go through the machine/deep-learning pipeline to understand how models are improved to slowly eliminate the problems mentioned, while at the same time trying to see if the insights from my models can contribute to a reasonable trading strategy.
Before we delve in, we need to establish why we do analysis on time series data when it seems like price movements are arbitrary. Essentially, the theoretical basis of technical analysis is that:
(i) Price reflects all the relevant information.
(ii) Price movements are not totally random, and that history repeats itself to a certain degree.
Feel free to refer to my github repo at https://github.com/yuhaoleeyh/stock-project for my code :)
EXPLORATORY DATA ANALYSIS
I first chose to zoom in on Google (GOOGL) stock data over a 10-year period (from 31 Aug 2010 to 1 Sep 2020), obtained from the Yahoo Finance API.
Before generating any models, it is critical to get a sense of what the data contains and to identify the features that will be important to us:
We note the very low number of features present (only 6 columns). In other words, these columns by themselves may not give us very good results to train on. Here is the adjusted close of GOOGL for the past 10 years:
Why do we focus on adjusted close? This price takes into account dividends, stock splits and any rights offerings, as well as any factors that happen after markets close. As a result, this is the feature which we would attempt to predict as it would be more reliable than the other metrics that are susceptible to non-market forces.
In the graph, we see an overall increasing trend in the adjusted close price. To see which features may contribute to this, I examined the relationships between the various features. Volume turns out to have a negative correlation with adjusted close. Hence, it seems that volume could be an important feature for our models.
Therefore, I will start by using adjusted close and volume as the input features for my prediction model. However, these features by themselves would not be sufficient.
Arguably one of the less appreciated yet critical parts of machine/deep learning is the engineering of appropriate features. Models by themselves are extremely powerful, but without suitable features, a model may find it difficult to draw accurate inferences from the data.
I attempt to incorporate a few additional features before we begin any model training:
(1) On Balance Volume
This metric is a momentum technical indicator that uses the change in volume as an indication of the change in stock prices. We add the day’s volume when the current day’s closing price is higher than yesterday’s and subtract it when it is lower. Positive volume pressure is expected to result in higher prices as demand surges, and likewise negative volume pressure is expected to eventually result in lower prices.
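The add/subtract rule above can be sketched in a few lines of numpy. This is only an illustration on made-up numbers, not the code from the repo; the function name `on_balance_volume` and the convention of starting the running total at 0 are assumptions.

```python
import numpy as np

def on_balance_volume(close, volume):
    """Running total of volume, signed by the direction of each close-to-close move."""
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    # +1 when the close rose, -1 when it fell, 0 when it was unchanged
    direction = np.sign(np.diff(close))
    # The first day has no previous close, so the total starts at 0 (a convention)
    return np.concatenate(([0.0], np.cumsum(direction * volume[1:])))

# Toy data, made up for illustration
close = [10.0, 10.5, 10.2, 10.2, 10.8]
volume = [100, 150, 120, 90, 200]
obv = on_balance_volume(close, volume)
print(obv)
```

Only the direction of the price move matters here, not its size, so a tiny uptick on heavy volume moves the indicator as much as a large one.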
Next, I used the finta library, which has almost 76 technical indicators available for use. (https://pypi.org/project/finta/) In particular, I focused on a few leading indicators which may give the model the ability to quickly react to stock changes:
(2) Exponential moving average (EMA)
The EMA is a moving average which places greater weight on recent prices than ones that are further away from the current time period. This helps to reduce the amount of lag and allows for faster reaction to change in prices.
(3) Bollinger bands
Bollinger bands are 2 lines set 2 standard deviations above and below the time series’s simple moving average. They put a rough bound on the next day’s price, as most of the price action happens within this area. The closer the bands, the lower the recent volatility; such a squeeze is often read as a sign that a larger move, or the end or reversal of the current trend, may be coming.
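To make these two indicators concrete, here is a minimal numpy sketch of how they can be computed by hand. It is not finta's implementation: the function names, the 20-day window, the use of the population standard deviation and the choice to seed the EMA with the first price are all assumptions, and a library may differ on each of them.

```python
import numpy as np

def ema(prices, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    prices = np.asarray(prices, dtype=float)
    alpha = 2.0 / (span + 1)
    out = np.empty_like(prices)
    out[0] = prices[0]  # seed with the first price (an assumption)
    for t in range(1, len(prices)):
        # Recent prices get weight alpha, the running average gets the rest
        out[t] = alpha * prices[t] + (1 - alpha) * out[t - 1]
    return out

def bollinger_bands(prices, window=20, num_std=2.0):
    """Return (lower, middle, upper) for every full window of prices."""
    prices = np.asarray(prices, dtype=float)
    windows = np.array([prices[i : i + window] for i in range(len(prices) - window + 1)])
    middle = windows.mean(axis=1)  # simple moving average
    std = windows.std(axis=1)      # population standard deviation (ddof=0)
    return middle - num_std * std, middle, middle + num_std * std

# Toy data, made up for illustration
smoothed = ema([1.0, 2.0, 3.0], span=3)
lower, middle, upper = bollinger_bands([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

Note that the band arrays are shorter than the input by `window - 1` entries, since the first few days do not have a full window behind them.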
Once we add in the features, it is time to prepare our data as input for our models!
The dataset is split into a 70% train set and a 30% test set. Critically, this train-test split has to be done under 2 conditions: (1) the train set must always come before the period of the test set, since this is a time series prediction, and (2) the split must be done before any normalisation/scaling to avoid look-ahead bias.
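A minimal sketch of such a split, assuming the rows are already sorted by date and held in a numpy array (for a pandas DataFrame, the same idea applies with `.iloc`). The helper name `chronological_split` is hypothetical.

```python
import numpy as np

def chronological_split(data, train_fraction=0.7):
    """First train_fraction of the rows for training, the remaining rows for testing."""
    split_index = int(len(data) * train_fraction)
    return data[:split_index], data[split_index:]

# Toy data: 10 "days", each row holding its own day number
rows = np.arange(10).reshape(10, 1)
train_rows, test_rows = chronological_split(rows)
```

No shuffling happens at any point, so every training row is earlier than every test row.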
Note that for most data, K-Fold cross-validation would be preferable, as we can evaluate models by using various folds as the validation set. While such an approach gives a more reliable estimate and uses the data more efficiently, K-Fold violates the temporal order of time series data: some validation folds precede the data points the model was trained on. Therefore, I simply use keras’s validation_split to take the last 10% of the train set as my validation set.
We next perform normalisation, which is important when the columns have different magnitudes and a change in one column would otherwise count for more than a change in another. During normalisation, we do a fit_transform on the train_data so that all training data is scaled between 0 and 1, and then apply the same fitted scaler to the test data with transform.
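from sklearn import preprocessing

# Fit the scaler on the training data only, then reuse it for the test data
normaliser = preprocessing.MinMaxScaler()
train_normalised_data = normaliser.fit_transform(train_data)
test_normalised_data = normaliser.transform(test_data)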
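To see what fitting on the train set alone means in practice, here is a small numpy sketch of min-max scaling done by hand on made-up prices. It mirrors the idea behind the scaler rather than reproducing its code.

```python
import numpy as np

# Toy single-column data, made up for illustration
train = np.array([[10.0], [12.0], [14.0], [20.0]])
test = np.array([[18.0], [25.0]])

# "Fit": learn the column minimum and range from the training rows only
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

# "Transform": apply the same training statistics to both sets
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range
print(test_scaled.ravel())
```

The last test value lands above 1 because it is higher than anything seen in training. That is expected for a stock in an upward trend, and it is the price of not letting test data leak into the scaling.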
Next, I formed the input and output features for the train and test sets. Each input consists of the features from the 21 prior days. I chose 21 days as this is the average number of trading days in a month, which should give the model enough history to learn from. The output is the adjusted close price on the 22nd day.
history_points = 21
n_train = len(train_normalised_data) - history_points
n_test = len(test_normalised_data) - history_points

# Inputs: sliding windows of the previous 21 days of normalised features
X_train = np.array([train_normalised_data[i : i + history_points].copy() for i in range(n_train)])
X_test = np.array([test_normalised_data[i : i + history_points].copy() for i in range(n_test)])

# Targets: the adjusted close on the day after each window
# (column 0 of the normalised array is assumed to be 'Adj Close')
y_train = np.array([train_normalised_data[:, 0][i + history_points].copy() for i in range(n_train)])
# The test target is kept in actual prices; .iloc makes the lookup positional
y_test = np.array([test_data['Adj Close'].iloc[i + history_points] for i in range(n_test)])
LONG SHORT-TERM MEMORY (LSTM) MODEL
Now time for the fun part!! The deep learning!
I will attempt to use the Long Short-Term Memory (LSTM) model, a recurrent neural network (RNN) commonly used in predicting time series data.
LSTM has gates (input, output and forget gates) which give it an inherent ability to retain information that is more relevant and discard unnecessary information. This makes LSTM a good model for interpreting patterns over long periods.
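For reference, the gates can be written out using the standard textbook formulation of an LSTM cell. The symbols are the usual ones and are not taken from this article: x_t is the input at time t, h_t the hidden state, c_t the cell state, σ the sigmoid function and ⊙ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate f_t decides how much of the old cell state survives, and the input gate i_t decides how much new information is written in, which is what lets the cell hold on to relevant information over many time steps.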
The important thing to note about LSTM is the input, which needs to be a 3D array of shape (samples, time-steps, features). Hence, the input has to be shaped to fit this.
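To make the (samples, time-steps, features) shape concrete, here is a small sketch that windows a made-up array of 100 days and 6 features. The helper name `make_windows` and the assumption that the target sits in column 0 are illustrative.

```python
import numpy as np

def make_windows(data, history_points, target_column=0):
    """Turn a (days, features) array into LSTM inputs X and next-day targets y."""
    data = np.asarray(data, dtype=float)
    n_samples = len(data) - history_points
    # Each sample is a block of history_points consecutive days
    X = np.array([data[i : i + history_points] for i in range(n_samples)])
    # Each target is the target column on the day right after its block
    y = np.array([data[i + history_points, target_column] for i in range(n_samples)])
    return X, y

# Toy data: 100 days x 6 features, filled with running numbers
toy = np.arange(100 * 6, dtype=float).reshape(100, 6)
X, y = make_windows(toy, history_points=21)
print(X.shape, y.shape)
```

With 100 days and a 21-day window there are 79 samples, since the last 21 days have no complete "next day" window of their own.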
We first import all the necessary keras and tensorflow libraries. Note that we also have to set a random seed to ensure that the results are replicable:
import tensorflow as tf
from keras import optimizers
from keras.callbacks import History
from keras.models import Model
from keras.layers import Dense, Dropout, LSTM, Input, Activation, concatenate
import numpy as np
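One way the seeds could be set is sketched below. The helper name `set_seeds` and the seed value are arbitrary choices, and `tf.random.set_seed` is the TensorFlow 2.x call, so older versions would need a different function. Even with seeds fixed, results may still vary slightly between machines, particularly on GPU.

```python
import random
import numpy as np

def set_seeds(seed=42):
    """Seed Python, numpy and (if available) TensorFlow random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow 2.x API
    except (ImportError, AttributeError):
        # TensorFlow missing, or an older version without this function
        pass

set_seeds(42)
```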
We next start with the simplest architecture: a single LSTM layer followed by a dense layer as output. We run it for 30 epochs with a validation_split of 0.1 to get a general idea of how well the model learns.
Note that shuffling is applied only to the training data and not to the validation data, and that validation_split takes the last samples of the x and y data provided, before any shuffling. The validation set therefore always comes after the training set in time.
lstm_input = Input(shape=(history_points, 6), name='lstm_input')
inputs = LSTM(21, name='first_layer')(lstm_input)
inputs = Dense(1, name='dense_layer')(inputs)
output = Activation('linear', name='output')(inputs)

model = Model(inputs=lstm_input, outputs=output)
adam = optimizers.Adam()
model.compile(optimizer=adam, loss='mse')
model.fit(x=X_train, y=y_train, batch_size=15, epochs=30, shuffle=True, validation_split=0.1)
We plot the result for our validation set, which has a reasonable RMSE of 12.698:
Once done, we predict on X_test and plot the predictions against the actual prices below:
Decent! The general direction is there and it seems that the LSTM model is able to learn the trend of the GOOGL stock. However, the RMSE is quite high (76.976), hence it may not be a good predictive model. Adding suitable features is not sufficient on its own for the model to achieve optimal results.
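For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between actual and predicted values. Below is a minimal sketch, with a helper that maps min-max scaled predictions back to prices so that the error is in dollars. Both function names are hypothetical, and the inverse step assumes the predictions were scaled with the training minimum and maximum of the price column.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def inverse_min_max(scaled, col_min, col_max):
    """Undo min-max scaling for one column."""
    return np.asarray(scaled, dtype=float) * (col_max - col_min) + col_min

# Toy example: scaled predictions mapped back to a 10..20 price range
predicted_prices = inverse_min_max([0.5, 1.0], col_min=10.0, col_max=20.0)
error = rmse([14.0, 21.0], predicted_prices)
```

Because the error is squared before averaging, a few large misses (for example during a sharp rally) weigh much more heavily than many small ones.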
Most machine/deep learning models have hyperparameters which usually need to be tuned in order to achieve more accurate predictions.
Tuning the number of epochs is an important factor. Increasing the number of epochs reduces the training error, but too many epochs result in the model learning the noise of the train data. One needs to pay attention to the validation loss and stop when it is at its lowest point, as beyond that we are overfitting to the training data. I ran the model over 1000 epochs and plotted the train and validation loss against the number of epochs:
Therefore, I ran the model with early stopping, where training halts once the validation loss has not improved for 20 consecutive epochs, and chose epoch = 170, where the validation loss is at its lowest.
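The stopping rule itself is simple enough to sketch without any deep learning library. The function below is an illustration of the patience logic applied to a list of per-epoch validation losses, with a hypothetical name and made-up losses; in Keras the same behaviour is normally obtained with the EarlyStopping callback rather than hand-written code.

```python
def best_epoch_with_patience(val_losses, patience=20):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops at the first epoch where the validation loss has gone
    `patience` epochs without beating its best value so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses), best_epoch

# Toy validation losses, made up for illustration
losses = [0.9, 0.5, 0.4, 0.45, 0.41, 0.42, 0.6]
stop_epoch, best_epoch = best_epoch_with_patience(losses, patience=3)
print(stop_epoch, best_epoch)
```

The model that should be kept is the one from the best epoch, not the one from the epoch at which training stopped.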
Next, I conducted a systematic grid search to obtain the best results. Running over 100 models, I experimented with adding dense layers, increasing the number of neurons in the dense layer, changing the learning rate of the neural network, adding dropout layers and stacking LSTM layers. The effect of the learning rate on a model is illustrated by the following diagram:
An optimal learning rate has to be chosen in order to avoid unreliable training results/inability to train the model. Similarly, the vertical stacking of LSTM layers would increase the model complexity and hence hopefully improve the accuracy of the result.
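In outline, a grid search tries every combination of candidate values and keeps the one with the lowest validation loss. The sketch below shows that loop with `itertools.product`. The grid values and the `toy_evaluate` function are made up; in a real run, the evaluation function would build and train a model with the given settings and return its validation loss.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # lower is better, e.g. validation loss
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values, for illustration only
param_grid = {
    "learning_rate": [0.0005, 0.0008, 0.001, 0.002],
    "dense_units": [8, 16, 32],
    "dropout": [0.0, 0.05, 0.1],
}

def toy_evaluate(params):
    # Stand-in for "build the model, train it, return the validation loss"
    return (abs(params["learning_rate"] - 0.0008)
            + abs(params["dense_units"] - 16) / 100
            + params["dropout"])

best_params, best_score = grid_search(param_grid, toy_evaluate)
```

This grid already has 4 x 3 x 3 = 36 combinations, which shows how quickly the number of models to train grows with each extra hyperparameter.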
After much testing, I tuned the model based on the best hyperparameters obtained from the grid search with the layers below:
lstm_input = Input(shape=(history_points, 6), name='lstm_input')
inputs = LSTM(21, name='first_layer')(lstm_input)
inputs = Dense(16, name='first_dense_layer')(inputs)
inputs = Dense(1, name='second_dense_layer')(inputs)
output = Activation('linear', name='output')(inputs)

model = Model(inputs=lstm_input, outputs=output)
# Newer Keras versions name this argument learning_rate instead of lr
adam = optimizers.Adam(lr=0.0008)
model.compile(optimizer=adam, loss='mse')
model.fit(x=X_train, y=y_train, batch_size=15, epochs=170, shuffle=True, validation_split=0.1)
This managed to obtain a reasonable RMSE of 48.997 on the test set. It was able to model the price increases/decreases relatively well.
LIMITATIONS OF FIRST MODEL
The model looks decent! Is it time to start earning money? It may be tempting to put money into my model (please don’t!), so I decided to test what would happen.
Starting off with $10000 as base money, I wrote an algorithm that buys as much stock as possible when the predicted price movement for the next day is an increase of over 0.15%, and sells as much stock as possible when the predicted price movement for the next day is a decrease of over 0.15%.
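A stripped-down sketch of such a strategy is shown below. It is not the code from the repo, and it makes several simplifying assumptions: only whole shares are traded, there are no fees, and every trade executes at the current day's actual price. The function name and the toy prices are made up.

```python
def simulate_threshold_strategy(prices, predicted_next, cash=10000.0, threshold=0.0015):
    """Final portfolio value after trading on predicted next-day percentage changes.

    prices[t] is the actual price on day t; predicted_next[t] is the model's
    prediction, made on day t, of the price on day t + 1.
    """
    shares = 0
    for price, prediction in zip(prices, predicted_next):
        predicted_change = (prediction - price) / price
        if predicted_change > threshold and cash >= price:
            # Predicted rise: buy as many whole shares as the cash allows
            bought = int(cash // price)
            shares += bought
            cash -= bought * price
        elif predicted_change < -threshold and shares > 0:
            # Predicted drop: sell everything
            cash += shares * price
            shares = 0
    # Value any remaining shares at the last known price
    return cash + shares * prices[-1]

# Toy data, made up for illustration
prices = [100.0, 110.0, 105.0, 120.0]
predicted_next = [110.0, 104.0, 121.0, 119.0]
final_value = simulate_threshold_strategy(prices, predicted_next)
buy_and_hold = int(10000.0 // prices[0]) * prices[-1] + 10000.0 % prices[0]
print(final_value, buy_and_hold)
```

Comparing the result against buy-and-hold over the same period is useful, because a strategy can make money in a rising market and still do worse than simply holding the stock.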
However, with this model we actually end up with $8424! The top graph shows the buys (green arrows) and sells (red arrows) made by the model on the actual test set, while the bottom graph shows the predicted percentage change for the next day over the same time period:
In other words, we were better off not even investing any money at all! Why is that so? This is due to the concept of leading/lagging models.
Leading models: The ability to provide guiding insights to how a price would change before it occurs.
Lagging models: Models that react to a price change only after it has occurred.
In stock analysis, we need to focus our efforts on identifying leading models that are able to foretell the price position accurately. This is usually done by training on seasonality:
This is where the concept of stationarity, which is critical in time series data, comes in. Essentially, time series data consists of trend, seasonality and noise:
An LSTM model tends to work best when it can train on the constant seasonality rather than trying to learn with a trend present.
To try to get the model to learn the seasonality of the data, which arises mainly from stock prices eventually reverting to a stable point after fluctuating away from the mean, I use the difference (P(t+1) - P(t)) as the only input feature instead. This would hopefully allow the generation of a leading model. At the same time, it removes the model’s reference to yesterday’s price, which would hopefully circumvent the problem of a lagging model. Talk about killing 2 birds with 1 stone :)
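As a minimal sketch of this differencing step, the snippet below turns a made-up price series into day-to-day differences and then into single-feature LSTM windows. The window length of 3 is chosen only to keep the toy example readable.

```python
import numpy as np

history_points = 3  # short window so the toy example stays readable

# Toy prices, made up for illustration
prices = np.array([100.0, 102.0, 101.0, 105.0, 104.0, 108.0])

# First difference: differences[t] = P(t+1) - P(t)
differences = np.diff(prices)

# Windows of past differences as the single input feature,
# shaped (samples, time-steps, 1) as the LSTM expects
n_samples = len(differences) - history_points
X = np.array([differences[i : i + history_points] for i in range(n_samples)])[..., np.newaxis]

# Target: the difference that follows each window
y = differences[history_points:]
print(differences, X.shape, y)
```

The differenced series is one element shorter than the price series, and it no longer carries the overall price level, only the day-to-day moves.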
I then trained LSTM models on this data. Specifically, I decided to employ more LSTM layers. The vertical stacking of LSTM layers increases the model complexity and hence hopefully improves the accuracy of the result. Note that I only used 25 epochs, as beyond that point the validation loss increases drastically.
After doing a further grid search, I managed to obtain a decent performing model with RMSE of 45.273 on the validation set using the layers below:
lstm_input = Input(shape=(data_set_points, 1), name='lstm_input')
inputs = LSTM(21, name='first_layer', return_sequences = True)(lstm_input)
inputs = Dropout(0.1, name='first_dropout_layer')(inputs)
inputs = LSTM(64, name='lstm_1')(inputs)
inputs = Dropout(0.05, name='lstm_dropout_1')(inputs)
inputs = Dense(32, name='first_dense_layer')(inputs)
inputs = Dense(1, name='dense_layer')(inputs)
output = Activation('linear', name='output')(inputs)
model = Model(inputs=lstm_input, outputs=output)
adam = optimizers.Adam(lr = 0.002)
model.compile(optimizer=adam, loss='mse')
model.fit(x=X_train, y=y_train, batch_size=15, epochs=25, shuffle=True, validation_split = 0.1)
There are 2 important things to note. Firstly, when stacking LSTM layers, every LSTM layer except the last needs return_sequences = True so that it passes its full output sequence to the next LSTM layer. Secondly, to improve the accuracy, I added a dropout layer after each LSTM layer to help reduce overfitting, something which neural networks are highly prone to.
The test set prediction does not look good at first glance. However, this 2nd model managed to generally predict the price direction correctly simply from the differences in the training set (without reference to price), even though the trend in the test set increases sharply.
Secondly, it seemed like the model was able to learn some leading features. At t = 20 there is a predicted drop in prices which is mirrored by the actual drop at t = 25, after which the forecasted increase from t = 30 to t = 40 is also mirrored in the actual blue graph. A predicted drop from t = 550 was also mirrored at t = 600. This result showed that the removal of trend may have reduced the accuracy but might have provided insights into the best times to enter/exit the stock market.
Below, I attempt to use the same $10000 algorithm to find out whether I can earn money from my model. You can see that in this one, my algorithm is doing a good job of buying low and selling high based on my predictions.
From this, I earned a grand total of $7683 from the predictions, compared with the $6558 that a buy and hold strategy would have yielded, translating to a $1125 gain! Awesome! In other words, while the model did not capture the magnitude very well, it managed to provide leading indicators of price changes which allowed my algorithm to make a gain.
This may seem like a coincidence, hence I decided to incorporate the predicted differences back into the actual prices. For each actual price at time t, I add the predicted difference to get the predicted price at t + 1, and we obtain this result:
Remarkable! I obtained an improved RMSE of 24.014! This shows that the model is able to predict the correct direction rather accurately thanks to the constant seasonality. It demonstrates that training on the difference rather than on the magnitude actually results in higher accuracy and a better ability to generate a potentially leading model.
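The reconstruction step can be sketched as follows, on made-up numbers and with a hypothetical function name. Since every prediction starts from the previous day's actual price, the sketch also computes the error of a naive "tomorrow equals today" forecast, which is a useful yardstick for judging how much the predicted differences actually add.

```python
import numpy as np

def reconstruct_next_prices(actual_prices, predicted_differences):
    """Predicted price for day t + 1 = actual price on day t + predicted difference."""
    actual_prices = np.asarray(actual_prices, dtype=float)
    predicted_differences = np.asarray(predicted_differences, dtype=float)
    return actual_prices[:-1] + predicted_differences

# Toy data, made up for illustration
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted_diff = np.array([1.0, 0.0, 3.0])  # stand-in model outputs for t = 0, 1, 2

predicted_prices = reconstruct_next_prices(actual, predicted_diff)

# Compare against the prices that actually followed
model_rmse = np.sqrt(np.mean((actual[1:] - predicted_prices) ** 2))
# Naive baseline: predict that tomorrow's price equals today's
naive_rmse = np.sqrt(np.mean((actual[1:] - actual[:-1]) ** 2))
print(model_rmse, naive_rmse)
```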
However, I would reiterate that training a model only on 1 stock may not mean this is generalisable to all stocks or other time periods. Yet, this result represents huge opportunities to predict methods of entry/exit into the market, and presents potential for better tuning/feature engineering based on the constant stationarity to generate a sustainable model.
CONCLUSION AND TAKEAWAYS
Overall, it seemed like my LSTM model was able to predict stock price movement reasonably well with feature engineering and hyperparameter tuning. Removing the trend and getting the model to train only on the seasonal component, however, seemed to give the model leading indicators and a much better prediction accuracy. There is great potential in predicting entry/exit into the market based on a proper threshold.
I would like to sincerely thank Min Yan and HeiCoders Academy (https://heicodersacademy.com) for investing a great amount of time in mentoring, advising and giving me input in the formulation of this article.
This project is meant purely for education and research, and it should not be naively used as a guide or benchmark for financial investment. Any stock trading involves significant risk as the valuation of such portfolios fluctuates, and should be done with care. This article by no means serves as a guide to stock investment, and any information used is at one’s own risk.
This is my first article on Medium, and so I welcome any feedback to the improving of my models/future directions :))