Customizing CSV Data in Stock Prediction Neural Networks
Introduction
As machine learning models become increasingly sophisticated, they are being applied to a wide range of domains, including finance. One area of particular interest is stock prediction using neural networks. In this article, we will explore how to modify code to fetch data from a custom CSV file instead of relying on Yahoo Finance.
Understanding the Problem
Many tutorials and examples demonstrate how to use the pandas_datareader library to retrieve stock data from Yahoo Finance. However, these examples are limited by their reliance on internet connectivity. What if you want to run your code offline or without access to the internet? This is where custom CSV data comes in.
Step 1: Understanding Custom CSV Data
Custom CSV data refers to any dataset stored in a comma-separated values file (.csv). In this case, we will be using a CSV file containing stock prices for a specific company. The key characteristics of our custom CSV data are:
- No headings: Unlike most CSV files, our custom CSV data has no header row, so the column names must be supplied when the file is loaded.
- Custom formatting: The columns in our CSV data are arranged to match the input our machine learning code expects.
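As a rough illustration of the two characteristics above, the sketch below reads a headerless CSV with pandas and assigns column names at load time. The sample rows are made-up values, and the column order (date, open, adjusted close) and the helper name `load_headerless_prices` are assumptions for this example, not something the original data source prescribes.

```python
import io

import pandas as pd

# Made-up sample rows for illustration only: date, open, adjusted close.
# Note that there is no header row.
SAMPLE_CSV = """2019-07-01,100.0,101.5
2019-07-02,102.0,103.25
2019-07-03,104.5,105.0
"""


def load_headerless_prices(source):
    """Read a headerless price CSV and label its columns (hypothetical helper)."""
    return pd.read_csv(
        source,
        header=None,                          # the file has no header row
        names=["Date", "Open", "Adj Close"],  # assumed column order
        parse_dates=["Date"],                 # convert the date strings to timestamps
        index_col="Date",                     # use the date as the row index
    )


prices = load_headerless_prices(io.StringIO(SAMPLE_CSV))
print(prices)
```

Passing a file path instead of the `io.StringIO` object should work the same way, since `pd.read_csv` accepts both.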
Step 2: Preparing Custom CSV Data
To prepare our custom CSV data, we must first import necessary libraries and load the data into a pandas DataFrame. We will use the pandas library to handle data manipulation and analysis.
# pandas_datareader is only needed when downloading data from Yahoo Finance
from pandas_datareader import data as wb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Define start and end dates for our dataset
start = '2019-06-30'
end = '2020-06-30'
# Define the ticker symbol(s) of interest (in this case, Google)
tickers = ['GOOG']
Step 3: Handling Custom CSV Data
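To handle our custom CSV data, we must first read the data from the .csv file into a pandas DataFrame. Since our custom CSV data does not have column headers, we use pd.read_csv with header=None and supply the column names ourselves, instead of calling wb.DataReader, which downloads from Yahoo Finance. The file name and the column order used below are assumptions; adjust them to match your own file.
# Collect one DataFrame per ticker in a list
price_data = []
# The CSV has no header row, so the column names are assigned here.
# Assumption: each file is named <ticker>.csv with columns date, open, adjusted close.
csv_columns = ['Date', 'Open', 'Adj Close']
for ticker in tickers:
    prices = pd.read_csv(f'{ticker}.csv', header=None, names=csv_columns,
                         index_col='Date', parse_dates=['Date'])
    # Keep only the rows between the start and end dates
    prices = prices.sort_index().loc[start:end]
    # Add the ticker symbol as a column and store the result
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Open', 'Adj Close']])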
Step 4: Normalizing Our Data
To prepare our data for training, we must normalize it. This involves scaling the values to a range between 0 and 1.
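# Import the scaler here because it is used before the later import block
from sklearn.preprocessing import MinMaxScaler
# Set the default figure size for the plots
plt.rcParams['figure.figsize'] = (20, 10)
# Initialize MinMaxScaler object for normalization
scaler = MinMaxScaler(feature_range=(0, 1))
# Combine the per-ticker frames and turn the Date index into a column
df = pd.concat(price_data)
df.reset_index(inplace=True)
# List the available columns
for col in df.columns:
    print(col)
# Plot the adjusted close against the date before scaling
df.plot(x='Date', y='Adj Close')
plt.legend(loc=2)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()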
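The scaling to a range between 0 and 1 described in this step can be sketched in plain Python. This is only an illustration of the min-max formula that `MinMaxScaler(feature_range=(0, 1))` applies to each column; the function names and the price list are hypothetical and the values are made up.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) needed to scale values to [0, 1]."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map each value to (value - lo) / (hi - lo)."""
    span = hi - lo
    if span == 0:
        # All values are identical; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling to recover values on the original scale."""
    return [s * (hi - lo) + lo for s in scaled]


prices = [10.0, 12.5, 15.0, 20.0]  # made-up prices for illustration
lo, hi = fit_min_max(prices)
scaled = scale(prices, lo, hi)
restored = unscale(scaled, lo, hi)

print(scaled)    # [0.0, 0.25, 0.5, 1.0]
print(restored)  # [10.0, 12.5, 15.0, 20.0]
```

Keeping the minimum and maximum from the fitting step is what makes it possible to convert model outputs back into prices later, which is the role `inverse_transform` plays in the article's code.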
Step 5: Training Our Model
To train our model, we must split our data into training and testing sets.
# Define train/test split parameters (percentage of rows used for training)
ntrain = 80
n_train_rows = int(len(df) * (ntrain / 100))
df_train = df.head(n_train_rows)
# The remaining rows form the test set
df_test = df.iloc[n_train_rows:]
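The percentage-based split above can be sketched in isolation. The helper name `chronological_split` is hypothetical. The point it illustrates is that time-series rows are cut at a single position rather than shuffled, so the test set always comes after the training set.

```python
def chronological_split(rows, train_percent):
    """Split an ordered sequence into (train, test) without shuffling."""
    if not 0 <= train_percent <= 100:
        raise ValueError("train_percent must be between 0 and 100")
    cut = int(len(rows) * train_percent / 100)
    return rows[:cut], rows[cut:]


days = list(range(10))  # stand-in for ten consecutive trading days
train, test = chronological_split(days, 80)

print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```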
# Import the packages
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
# Create a new DataFrame holding only the date and adjusted close
seriesdata = df.sort_index(ascending=True, axis=0)
new_seriesdata = seriesdata[['Date', 'Adj Close']].copy()
# Use the date as the index and drop the separate column
new_seriesdata.index = new_seriesdata.Date
new_seriesdata.drop('Date', axis=1, inplace=True)
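# Convert to a NumPy array and split at a fixed row (255 is hard-coded; adjust it to your dataset length)
myseriesdataset = new_seriesdata.values
totrain = myseriesdataset[0:255,:]
tovalid = myseriesdataset[255:, :]
# Scale to [0, 1]; fitting on the full dataset lets validation values influence the scaling
scalerdata = MinMaxScaler(feature_range=(0, 1))
scale_data = scalerdata.fit_transform(myseriesdataset)
# Build training samples: the 60 previous values are the input, the next value is the target
x_totrain, y_totrain = [], []
length_of_totrain=len(totrain)
for i in range(60,length_of_totrain):
    x_totrain.append(scale_data[i-60:i,0])
    y_totrain.append(scale_data[i,0])
# Reshape to (samples, timesteps, features), the input shape the LSTM layer expects
x_totrain, y_totrain = np.array(x_totrain), np.array(y_totrain)
x_totrain = np.reshape(x_totrain, (x_totrain.shape[0],x_totrain.shape[1],1))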
lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(x_totrain.shape[1],1)))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(units=1))
lstm_model.compile(loss='mean_squared_error', optimizer='adadelta')
# Train our model
lstm_model.fit(x_totrain, y_totrain, epochs=10, batch_size=1, verbose=2)
# Take the last len(tovalid) + 61 rows and scale them with the fitted scaler
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values
myinputs = myinputs.reshape(-1,1)
myinputs = scalerdata.transform(myinputs)
# Build one 60-value input window per prediction
tostore_test_result = []
for i in range(60,myinputs.shape[0]):
    tostore_test_result.append(myinputs[i-60:i,0])
tostore_test_result = np.array(tostore_test_result)
tostore_test_result = np.reshape(tostore_test_result,(tostore_test_result.shape[0],tostore_test_result.shape[1],1))
# Predict, then convert the outputs back to the original price scale
myclosing_priceresult = lstm_model.predict(tostore_test_result)
myclosing_priceresult = scalerdata.inverse_transform(myclosing_priceresult)
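The 60-step lookback windows built in this step can be easier to follow on a toy series. The sketch below uses a lookback of 3 instead of 60 and a hypothetical helper named `make_windows`. It mirrors the loop-and-reshape pattern in the code above but is not the article's exact code.

```python
import numpy as np


def make_windows(series, lookback):
    """Build (samples, lookback, 1) inputs and next-step targets from a 1-D series."""
    series = np.asarray(series, dtype=float)
    if series.ndim != 1 or len(series) <= lookback:
        raise ValueError("series must be 1-D and longer than lookback")
    # Each input holds the `lookback` values that precede its target.
    x = np.array([series[i - lookback:i] for i in range(lookback, len(series))])
    y = series[lookback:]
    # Add a trailing feature axis: (samples, timesteps, features).
    return x.reshape(x.shape[0], lookback, 1), y


x, y = make_windows([0.1, 0.2, 0.3, 0.4, 0.5], lookback=3)

print(x.shape)     # (2, 3, 1)
print(x[:, :, 0])  # rows: [0.1 0.2 0.3] and [0.2 0.3 0.4]
print(y)           # [0.4 0.5]
```

A series of length n yields n - lookback samples, which is why the prediction inputs in the article are padded with the 60 rows that precede the validation data.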
Step 6: Making Predictions with Our Model
To make predictions, we simply need to call our model on new input data.
# Reuse the percentage-based split from the start of Step 5 and rebuild the input rows
totrain = df_train
tovalid = df_test
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values
# Print the number of prediction windows and the predicted prices computed in Step 5
print(len(tostore_test_result))
print(myclosing_priceresult)
Conclusion
In this article, we demonstrated how to modify code to fetch data from a custom CSV file instead of relying on Yahoo Finance. We covered the steps necessary for preparing and handling custom CSV data, including normalization, splitting our dataset into training and testing sets, and training our model using LSTM neural networks.
Last modified on 2024-04-22