Customizing CSV Data in Stock Prediction Neural Networks for Offline Analysis Without Internet Connectivity Requirements

Introduction

As machine learning models become more sophisticated, they are being applied in a growing number of fields, including finance. One area of particular interest is stock prediction using neural networks. In this article, we will explore how to modify existing code so that it loads data from a custom CSV file instead of fetching it from Yahoo Finance.

Understanding the Problem

Many tutorials and examples use the pandas_datareader library to retrieve stock data from Yahoo Finance. These examples only work with an internet connection. What if you want to run your code offline? This is where custom CSV data comes in.

Step 1: Understanding Custom CSV Data

Custom CSV data refers to any dataset stored in a comma-separated values file (.csv). In this case, we will be using a CSV file containing stock prices for a specific company. The key characteristics of our custom CSV data are:

  • No header row: Unlike most CSV files, our custom CSV data has no line of column names, so we must supply the names when loading it.
  • Custom formatting: The columns hold the date and the price fields that the model code expects (here, Open and Adj Close).

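As a minimal sketch of how pandas treats a file without a header row, the snippet below reads a small in-memory CSV. The sample rows and the column names (Date, Open, Adj Close) are assumptions made for illustration; adjust them to match the layout of your own file.

```python
import io
import pandas as pd

# Hypothetical sample: three rows, no header line (values are made up)
raw = io.StringIO(
    "2019-07-01,100.0,101.5\n"
    "2019-07-02,101.0,102.0\n"
    "2019-07-03,102.5,101.0\n"
)

# header=None tells pandas the first line is data, not column names;
# names= supplies the labels the rest of the code expects
sample = pd.read_csv(
    raw,
    header=None,
    names=["Date", "Open", "Adj Close"],
    parse_dates=["Date"],
    index_col="Date",
)

print(sample.shape)
print(list(sample.columns))
```

Without header=None, pandas would treat the first data row as column names and silently drop one day of prices.
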
Step 2: Preparing Custom CSV Data

To prepare our custom CSV data, we first import the libraries we need and define the parameters of the dataset. pandas handles both loading and manipulating the data, so pandas_datareader is no longer required.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Define start and end dates for our dataset
start = '2019-06-30'
end = '2020-06-30'

# Define the ticker symbol(s) of interest (in this case, Google)
tickers = ['GOOG']

Step 3: Handling Custom CSV Data

Next, we read each .csv file into a pandas DataFrame. Since our custom CSV data has no column headers, we pass header=None to pd.read_csv and supply the column names ourselves. This replaces the wb.DataReader call that fetched the data from Yahoo Finance.

# Assumption: one file per ticker (e.g. GOOG.csv) with columns in this order
csv_columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']

# Collect one DataFrame per ticker
price_data = []

# Load data from CSV and append it to price_data
for ticker in tickers:
    prices = pd.read_csv(f'{ticker}.csv', header=None, names=csv_columns,
                         parse_dates=['Date'], index_col='Date')
    # Keep only the requested date range and the columns we need
    prices = prices.sort_index().loc[start:end, ['Open', 'Adj Close']]
    # Tag each row with its ticker symbol
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Open', 'Adj Close']])

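Before moving on, it can help to sanity-check a frame loaded from a local file, since it gets none of the cleaning a data provider might apply. The helper below is a hypothetical sketch (check_prices is not part of pandas) that reports missing values, duplicate dates, and whether the dates are in ascending order.

```python
import pandas as pd

def check_prices(frame: pd.DataFrame) -> dict:
    """Return simple data-quality indicators for a date-indexed price frame."""
    return {
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_dates": int(frame.index.duplicated().sum()),
        "sorted_by_date": bool(frame.index.is_monotonic_increasing),
    }

# Hypothetical example frame with one missing price and one repeated date
idx = pd.to_datetime(["2019-07-01", "2019-07-02", "2019-07-02"])
demo = pd.DataFrame(
    {"Open": [100.0, None, 101.0], "Adj Close": [101.5, 102.0, 102.0]},
    index=idx,
)

report = check_prices(demo)
print(report)
```

Any non-zero count is worth resolving before training, because the windowing in Step 5 assumes one clean row per trading day.
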
Step 4: Normalizing Our Data

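To prepare our data for training, we must normalize it, which means scaling the values to the range 0 to 1. In this step we create the scaler, combine the loaded data into a single DataFrame, and plot the adjusted closing price; the scaling itself is applied in Step 5.

from matplotlib import rcParams
from sklearn.preprocessing import MinMaxScaler

# Set the default figure size for plots
rcParams['figure.figsize'] = 20,10

# Initialize MinMaxScaler object for normalization
scaler = MinMaxScaler(feature_range=(0, 1))

# Combine the per-ticker frames and turn the Date index into a column
df = pd.concat(price_data)
df.reset_index(inplace=True)

# List the available columns
for col in df.columns:
    print(col)

# Plot the adjusted closing price
df['Adj Close'].plot()
plt.legend(loc=2)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()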

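To make the scaling concrete, here is a minimal NumPy sketch of the min-max formula, x_scaled = (x - min) / (max - min), and its inverse. MinMaxScaler(feature_range=(0, 1)) applies the same formula to each column; the function names and the prices below are made up for illustration.

```python
import numpy as np

def min_max_scale(values: np.ndarray):
    """Scale a 1-D array to [0, 1]; return the scaled array and the (min, max) used."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), (lo, hi)

def min_max_unscale(scaled: np.ndarray, bounds):
    """Invert min_max_scale, mapping [0, 1] back to the original price range."""
    lo, hi = bounds
    return scaled * (hi - lo) + lo

prices = np.array([100.0, 105.0, 110.0, 120.0])  # made-up prices
scaled, bounds = min_max_scale(prices)
restored = min_max_unscale(scaled, bounds)

print(scaled)    # smallest price maps to 0, largest to 1
print(restored)  # matches the original prices
```

The inverse step matters later: the model predicts scaled values, and inverse_transform converts them back into prices.
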
Step 5: Training Our Model

To train our model, we must split our data into training and testing sets.

# Define train/test split parameters: the first 80% of rows train the model
ntrain = 80
split_row = int(len(df) * (ntrain/100))
df_train = df.iloc[:split_row]
# The remaining rows form the test set
df_test = df.iloc[split_row:]

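The split is chronological rather than random, so the model is never trained on prices that come after the ones it is tested on. A small sketch of that idea, using a hypothetical helper name and a made-up ten-row frame:

```python
import pandas as pd

def chronological_split(frame: pd.DataFrame, train_pct: int):
    """Split a frame into the first train_pct percent and the remainder, keeping row order."""
    split = int(len(frame) * train_pct / 100)
    return frame.iloc[:split], frame.iloc[split:]

demo = pd.DataFrame({"Adj Close": range(10)})  # ten made-up rows
train, test = chronological_split(demo, 80)

print(len(train), len(test))
```
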
# Import the packages
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM

# Create a new DataFrame holding only the date and the adjusted close
seriesdata = df.sort_index(ascending=True, axis=0)
new_seriesdata = seriesdata[['Date', 'Adj Close']].copy()

# Use the date as the index
new_seriesdata.index = new_seriesdata.Date
new_seriesdata.drop('Date', axis=1, inplace=True)

myseriesdataset = new_seriesdata.values

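# Use the first ntrain percent of rows for training and the rest for validation
# (a fixed cut-off of 255 rows can leave the validation set empty for one year of data)
n_train_rows = int(len(myseriesdataset) * (ntrain / 100))
totrain = myseriesdataset[0:n_train_rows, :]
tovalid = myseriesdataset[n_train_rows:, :]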

scalerdata = MinMaxScaler(feature_range=(0, 1))

scale_data = scalerdata.fit_transform(myseriesdataset)
x_totrain, y_totrain = [], []

length_of_totrain=len(totrain)

for i in range(60,length_of_totrain):
    x_totrain.append(scale_data[i-60:i,0])
    y_totrain.append(scale_data[i,0])

x_totrain, y_totrain = np.array(x_totrain), np.array(y_totrain)
x_totrain = np.reshape(x_totrain, (x_totrain.shape[0],x_totrain.shape[1],1))

lstm_model = Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(x_totrain.shape[1],1)))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(units=1))

lstm_model.compile(loss='mean_squared_error', optimizer='adadelta')

# Train our model
lstm_model.fit(x_totrain, y_totrain, epochs=10, batch_size=1, verbose=2)

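# Take the validation rows plus the preceding rows needed to form 60-step windows
myinputs = new_seriesdata[len(new_seriesdata) - (len(tovalid)+1) - 60:].values

# Reshape to a single column and apply the scaler fitted above
myinputs = myinputs.reshape(-1,1)
myinputs = scalerdata.transform(myinputs)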

tostore_test_result = []

for i in range(60,myinputs.shape[0]):
    tostore_test_result.append(myinputs[i-60:i,0])

tostore_test_result = np.array(tostore_test_result)
tostore_test_result = np.reshape(tostore_test_result,(tostore_test_result.shape[0],tostore_test_result.shape[1],1))

myclosing_priceresult = lstm_model.predict(tostore_test_result)

myclosing_priceresult = scalerdata.inverse_transform(myclosing_priceresult)

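The training loop above turns the scaled series into overlapping 60-step windows, each paired with the value that follows it. The sketch below shows the same construction on a short made-up series with a lookback of 3, so the shapes are easy to follow; make_windows is a hypothetical helper name.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int):
    """Build (samples, lookback, 1) inputs and next-step targets from a 1-D series."""
    x, y = [], []
    for i in range(lookback, len(series)):
        x.append(series[i - lookback:i])  # the previous `lookback` values
        y.append(series[i])               # the value to predict
    x = np.array(x).reshape(-1, lookback, 1)  # LSTM layers expect 3-D input
    return x, np.array(y)

series = np.arange(10, dtype=float)  # stand-in for scaled prices
x, y = make_windows(series, lookback=3)

print(x.shape, y.shape)
print(x[0, :, 0], y[0])
```

A series of length n yields n - lookback samples, which is why at least 61 training rows are needed before the model sees a single example.
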
Step 6: Making Predictions with Our Model

The predictions were computed at the end of Step 5 by calling lstm_model.predict on the scaled test windows and converting the output back to prices. Here we keep the train and test frames for comparison and inspect the results.

# Keep the train and test frames for comparison with the predictions
totrain = df_train
tovalid = df_test

# Print the number of test windows and the predicted prices (in original units)
print(len(tostore_test_result))
print(myclosing_priceresult)

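The number of predictions depends on how the test inputs were sliced in Step 5. As a rough check, the sketch below uses a made-up series and a hypothetical validation size to show that taking the validation rows plus the 60 rows before them yields exactly one window per validation row.

```python
import numpy as np

lookback = 60
series = np.arange(300, dtype=float)  # stand-in for the full price series
n_valid = 45                          # hypothetical number of validation rows

# Take the validation rows plus the `lookback` rows that precede them
inputs = series[len(series) - n_valid - lookback:]

# One window of `lookback` values for every row that follows the first 60
windows = np.array([inputs[i - lookback:i] for i in range(lookback, len(inputs))])
print(windows.shape)
```

The slice in this article subtracts one extra row, so it produces one additional window whose target is the last training row; keep that offset in mind when lining predictions up against dates.
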
Conclusion

In this article, we demonstrated how to modify code to fetch data from a custom CSV file instead of relying on Yahoo Finance. We covered the steps necessary for preparing and handling custom CSV data, including normalization, splitting our dataset into training and testing sets, and training our model using LSTM neural networks.


Last modified on 2024-04-22