Recurrent Neural Network in PyTorch

Recurrent Neural Networks (RNNs) are a type of neural network designed for sequence prediction problems. RNNs can be used for text data, speech data, classification problems and generative models. Unlike ANNs, an RNN's predictions are based on past predictions as well as the current input. RNNs are networks with loops in them, allowing information to persist.

Each node of an RNN takes two inputs:

  1. Memory unit
  2. Event unit

M(t-1) is the memory unit, i.e. the output of the previous prediction. E(t) is the current event, i.e. the information provided at the present time. M(t) is the output of the current node, i.e. the output at the present time step in the sequence.
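In equation form, M(t) = f(M(t-1), E(t)) for some learned function f. Below is a minimal sketch of a single recurrent step; the tanh activation and the weight names W_m, W_e and b are illustrative assumptions for exposition, not the exact parametrisation PyTorch's recurrent layers use.

import torch

def rnn_step(M_prev, E_t, W_m, W_e, b):
    # illustrative recurrence: M(t) = tanh(W_m @ M(t-1) + W_e @ E(t) + b)
    return torch.tanh(W_m @ M_prev + W_e @ E_t + b)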

1. Packages

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
%matplotlib inline

2. Data definition

In this notebook, I'm going to train a very simple LSTM model, a type of RNN architecture, to do time-series prediction. Given some input data, it should be able to generate a prediction for the next step. I'll be using a sine wave as an example, as it's very easy to visualise the behaviour of a sine wave.

2.1 Declaring a tensor x

x = torch.linspace(0,799,800)   # 800 evenly spaced points from 0 to 799

2.2 Creating a tensor y as a sin function of x

y = torch.sin(x*2*3.1416/40)   # sine wave with a period of 40 steps (3.1416 approximates pi)

2.3 Plotting y

plt.figure(figsize=(12,4))
plt.xlim(-10,801)
plt.grid(True)
plt.xlabel("x")
plt.ylabel("sin")
plt.title("Sin plot")
plt.plot(y.numpy(),color='#8000ff')
plt.show()

3. Batching the data

3.1 Splitting the data in train/test set

test_size = 40
train_set = y[:-test_size]
test_set = y[-test_size:]
3.1.1 Plotting the training/testing set
plt.figure(figsize=(12,4))
plt.xlim(-10,801)
plt.grid(True)
plt.xlabel("x")
plt.ylabel("sin")
plt.title("Sin plot")
plt.plot(train_set.numpy(),color='#8000ff')
plt.plot(range(760,800),test_set.numpy(),color="#ff8000")
plt.show()

3.2 Creating the batches of data

While working with LSTM models, we divide the training sequence into a series of overlapping windows. The label used for comparison is the next value in the sequence.

For example, if we have a series of 12 records and a window size of 3, we feed [x1, x2, x3] into the model and compare the prediction to x4. Then we backprop, update the parameters, and feed [x2, x3, x4] into the model and compare the prediction to x5. To ease this process, I'm defining a function input_data(seq,ws) that creates a list of (window, label) tuples; a quick sanity check follows the definition below. If ws is the window size, then the total number of (window, label) tuples will be len(seq)-ws.

def input_data(seq,ws):
    out = []
    L = len(seq)
    
    for i in range(L-ws):
        window = seq[i:i+ws]            # ws consecutive values
        label = seq[i+ws:i+ws+1]        # the next value is the label
        out.append((window,label))
    
    return out
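As a quick sanity check (an illustrative call, not part of the run above), a 5-point sequence with a window size of 2 gives 5 - 2 = 3 tuples:

input_data(torch.arange(5.), 2)
# [(tensor([0., 1.]), tensor([2.])),
#  (tensor([1., 2.]), tensor([3.])),
#  (tensor([2., 3.]), tensor([4.]))]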
3.2.1 Calling the input_data function

The length of x = 800

The length of train_set = 800 - 40 = 760

The length of train_data = 760 - 40 = 720

window_size = 40
train_data = input_data(train_set, window_size)
len(train_data)
720
3.2.2 Checking the 1st value from train_data
train_data[0]
(tensor([ 0.0000e+00,  1.5643e-01,  3.0902e-01,  4.5399e-01,  5.8779e-01,
          7.0711e-01,  8.0902e-01,  8.9101e-01,  9.5106e-01,  9.8769e-01,
          1.0000e+00,  9.8769e-01,  9.5106e-01,  8.9100e-01,  8.0901e-01,
          7.0710e-01,  5.8778e-01,  4.5398e-01,  3.0901e-01,  1.5643e-01,
         -7.2400e-06, -1.5644e-01, -3.0902e-01, -4.5400e-01, -5.8779e-01,
         -7.0711e-01, -8.0902e-01, -8.9101e-01, -9.5106e-01, -9.8769e-01,
         -1.0000e+00, -9.8769e-01, -9.5105e-01, -8.9100e-01, -8.0901e-01,
         -7.0710e-01, -5.8777e-01, -4.5398e-01, -3.0900e-01, -1.5642e-01]),
 tensor([1.4480e-05]))

4. Defining the model

4.1 Model Class

class LSTM(nn.Module):
    
    def __init__(self,input_size = 1, hidden_size = 50, out_size = 1):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size,out_size)
        # initialize the hidden state and cell state, each of shape (num_layers, batch, hidden_size)
        self.hidden = (torch.zeros(1,1,hidden_size),torch.zeros(1,1,hidden_size))
    
    def forward(self,seq):
        # nn.LSTM expects input of shape (seq_len, batch, input_size)
        lstm_out, self.hidden = self.lstm(seq.view(len(seq),1,-1), self.hidden)
        pred = self.linear(lstm_out.view(len(seq),-1))
        return pred[-1]   # return only the prediction for the last time step
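The view calls reshape the data into the (seq_len, batch, input_size) layout that nn.LSTM expects: a 40-point window flows through as (40, 1, 1) -> (40, 1, 50) -> (40, 1), and pred[-1] keeps only the final step. A quick shape check (illustrative; the dummy all-zero window is an assumption):

demo = LSTM()
demo(torch.zeros(40)).shape   # torch.Size([1]) -- a single next-step prediction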

4.2 Model Instantiation

torch.manual_seed(42)
model = LSTM()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
4.2.1 Printing the model
model
LSTM(
  (lstm): LSTM(1, 50)
  (linear): Linear(in_features=50, out_features=1, bias=True)
)
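As a rough capacity check (illustrative, not part of the original run), the parameters can be counted directly. With the defaults above, the LSTM layer holds 4*50*(1+50) weights plus 2*4*50 biases = 10,600 parameters, and the linear layer adds another 51:

sum(p.numel() for p in model.parameters())   # 10651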

4.3 Training

During training, I'm visualising the prediction process for the test data on the go. This gives a better understanding of how the training is being carried out in each epoch. The training sequence is represented in purple, while the predicted sequence is represented in orange.

epochs = 10
future = 40

for i in range(epochs):
    
    for seq, y_train in train_data:
        optimizer.zero_grad()
        # reset the hidden state and cell state before each training window
        model.hidden = (torch.zeros(1,1,model.hidden_size),
                        torch.zeros(1,1,model.hidden_size))
        
        y_pred = model(seq)
        loss = criterion(y_pred, y_train)
        loss.backward()
        optimizer.step()
        
    print(f"Epoch {i} Loss: {loss.item()}")
    
    # seed the generator with the last training window, then feed each
    # prediction back in to generate the next value
    preds = train_set[-window_size:].tolist()
    for f in range(future):
        seq = torch.FloatTensor(preds[-window_size:])
        with torch.no_grad():
            model.hidden = (torch.zeros(1,1,model.hidden_size),
                            torch.zeros(1,1,model.hidden_size))
            preds.append(model(seq).item())
        
    loss = criterion(torch.tensor(preds[-window_size:]), y[760:])
    print(f"Performance on test range: {loss}")
    
    plt.figure(figsize=(12,4))
    plt.xlim(700,801)
    plt.grid(True)
    plt.plot(y.numpy(),color='#8000ff')
    plt.plot(range(760,800),preds[window_size:],color='#ff8000')
    plt.show()
Epoch 0 Loss: 0.09212876856327057
Performance on test range: 0.6071590185165405
Epoch 1 Loss: 0.06506767123937607
Performance on test range: 0.5650987029075623
Epoch 2 Loss: 0.041980478912591934
Performance on test range: 0.5199716687202454
Epoch 3 Loss: 0.017842764034867287
Performance on test range: 0.42209967970848083
Epoch 4 Loss: 0.0028870997484773397
Performance on test range: 0.16624125838279724
Epoch 5 Loss: 0.00032007633126340806
Performance on test range: 0.030554424971342087
Epoch 6 Loss: 0.00012969240196980536
Performance on test range: 0.014990185387432575
Epoch 7 Loss: 0.00012006766337435693
Performance on test range: 0.01185668632388115
Epoch 8 Loss: 0.0001265572354895994
Performance on test range: 0.010163827799260616
Epoch 9 Loss: 0.0001319547591265291
Performance on test range: 0.008897590450942516

5. Alcohol Sales dataset

5.1 Loading and plotting

5.1.1 Importing the data
df = pd.read_csv("/kaggle/input/for-simple-exercises-time-series-forecasting/Alcohol_Sales.csv", index_col = 0, parse_dates = True)
df.head()
            S4248SM144NCEN
DATE
1992-01-01            3459
1992-02-01            3458
1992-03-01            4002
1992-04-01            4564
1992-05-01            4221
5.1.2 Dropping the empty rows
df.dropna(inplace=True)
len(df)
325
5.1.3 Plotting the Time Series Data
plt.figure(figsize = (12,4))
plt.title('Alcohol Sales')
plt.ylabel('Sales in million dollars')
plt.grid(True)
plt.autoscale(axis='x',tight=True)
plt.plot(df['S4248SM144NCEN'],color='#8000ff')
plt.show()

5.2 Prepare and normalize

5.2.1 Preparing the data
y = df['S4248SM144NCEN'].values.astype(float) 

#defining a test size
test_size = 12

#create train and test splits
train_set = y[:-test_size]
test_set = y[-test_size:]
test_set
array([10415., 12683., 11919., 14138., 14583., 12640., 14257., 12396.,
       13914., 14174., 15504., 10718.])
5.2.2 Normalize the data
from sklearn.preprocessing import MinMaxScaler

# instantiate a scaler
scaler = MinMaxScaler(feature_range=(-1, 1))

# normalize the training set
train_norm = scaler.fit_transform(train_set.reshape(-1, 1))
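MinMaxScaler maps the training range linearly onto [-1, 1], and the mapping is invertible, which is what lets us convert predictions back into dollar values in section 5.4.3. An illustrative round trip (not part of the original run):

# inverse_transform recovers the original values up to floating-point error
recovered = scaler.inverse_transform(train_norm)
np.allclose(recovered, train_set.reshape(-1, 1))   # True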
5.2.3 Prepare data for LSTM model
train_norm = torch.FloatTensor(train_norm).view(-1)

# define a window size
window_size = 12
# define a function to create sequence/label tuples
def input_data(seq,ws):
    out = []
    L = len(seq)
    for i in range(L-ws):
        window = seq[i:i+ws]
        label = seq[i+ws:i+ws+1]
        out.append((window,label))
    return out

# apply input_data to train_norm
train_data = input_data(train_norm, window_size)
len(train_data)
301
5.2.4 Printing the first tuple
train_data[0]
(tensor([-0.9268, -0.9270, -0.8340, -0.7379, -0.7966, -0.7439, -0.7547, -0.8109,
         -0.8128, -0.7901, -0.7933, -0.6743]),
 tensor([-1.]))

5.3 Modelling

5.3.1 Model definition
class LSTMnetwork(nn.Module):
    def __init__(self,input_size=1,hidden_size=100,output_size=1):
        super().__init__()
        self.hidden_size = hidden_size
        
        # add an LSTM layer:
        self.lstm = nn.LSTM(input_size,hidden_size)
        
        # add a fully-connected layer:
        self.linear = nn.Linear(hidden_size,output_size)
        
        # initializing h0 and c0:
        self.hidden = (torch.zeros(1,1,self.hidden_size),
                       torch.zeros(1,1,self.hidden_size))

    def forward(self,seq):
        lstm_out, self.hidden = self.lstm(
            seq.view(len(seq),1,-1), self.hidden)
        pred = self.linear(lstm_out.view(len(seq),-1))
        return pred[-1]
5.3.2 Instantiation, loss and optimizer
torch.manual_seed(42)

# instantiate
model = LSTMnetwork()

# loss
criterion = nn.MSELoss()

#optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

model
LSTMnetwork(
  (lstm): LSTM(1, 100)
  (linear): Linear(in_features=100, out_features=1, bias=True)
)
5.3.3 Training
epochs = 100

import time
start_time = time.time()

for epoch in range(epochs):
    for seq, y_train in train_data:
        optimizer.zero_grad()
        model.hidden = (torch.zeros(1,1,model.hidden_size),
                        torch.zeros(1,1,model.hidden_size))
        
        y_pred = model(seq)
        
        loss = criterion(y_pred, y_train)
        loss.backward()
        optimizer.step()
        
    print(f'Epoch: {epoch+1:2} Loss: {loss.item():10.8f}')
    
print(f'\nDuration: {time.time() - start_time:.0f} seconds')
Epoch:  1 Loss: 0.26208422
Epoch:  2 Loss: 0.34071690
Epoch:  3 Loss: 0.37325260
Epoch:  4 Loss: 0.37144578
Epoch:  5 Loss: 0.32069460
Epoch:  6 Loss: 0.33051443
Epoch:  7 Loss: 0.32643345
Epoch:  8 Loss: 0.32497045
Epoch:  9 Loss: 0.30683771
Epoch: 10 Loss: 0.30106238
Epoch: 11 Loss: 0.29293439
Epoch: 12 Loss: 0.28600523
Epoch: 13 Loss: 0.29466769
Epoch: 14 Loss: 0.27529088
Epoch: 15 Loss: 0.29106244
Epoch: 16 Loss: 0.27637732
Epoch: 17 Loss: 0.28570727
Epoch: 18 Loss: 0.28755152
Epoch: 19 Loss: 0.26827139
Epoch: 20 Loss: 0.26684177
Epoch: 21 Loss: 0.26333189
Epoch: 22 Loss: 0.26537696
Epoch: 23 Loss: 0.25447667
Epoch: 24 Loss: 0.27008417
Epoch: 25 Loss: 0.13406612
Epoch: 26 Loss: 0.00002756
Epoch: 27 Loss: 0.00112184
Epoch: 28 Loss: 0.00000143
Epoch: 29 Loss: 0.00436061
Epoch: 30 Loss: 0.00136413
Epoch: 31 Loss: 0.00003336
Epoch: 32 Loss: 0.00000254
Epoch: 33 Loss: 0.00069296
Epoch: 34 Loss: 0.00001927
Epoch: 35 Loss: 0.00013140
Epoch: 36 Loss: 0.00446211
Epoch: 37 Loss: 0.01253245
Epoch: 38 Loss: 0.00130676
Epoch: 39 Loss: 0.00263820
Epoch: 40 Loss: 0.00000005
Epoch: 41 Loss: 0.00010856
Epoch: 42 Loss: 0.00023660
Epoch: 43 Loss: 0.00086661
Epoch: 44 Loss: 0.00222667
Epoch: 45 Loss: 0.00144608
Epoch: 46 Loss: 0.00181283
Epoch: 47 Loss: 0.00194901
Epoch: 48 Loss: 0.00222321
Epoch: 49 Loss: 0.00171158
Epoch: 50 Loss: 0.00116998
Epoch: 51 Loss: 0.00053735
Epoch: 52 Loss: 0.00003865
Epoch: 53 Loss: 0.00000508
Epoch: 54 Loss: 0.00014273
Epoch: 55 Loss: 0.00049832
Epoch: 56 Loss: 0.00043864
Epoch: 57 Loss: 0.00007255
Epoch: 58 Loss: 0.00000097
Epoch: 59 Loss: 0.00002359
Epoch: 60 Loss: 0.00154067
Epoch: 61 Loss: 0.01172664
Epoch: 62 Loss: 0.00040826
Epoch: 63 Loss: 0.04736882
Epoch: 64 Loss: 0.00010700
Epoch: 65 Loss: 0.00469892
Epoch: 66 Loss: 0.00403236
Epoch: 67 Loss: 0.00016044
Epoch: 68 Loss: 0.00030570
Epoch: 69 Loss: 0.00002520
Epoch: 70 Loss: 0.00102512
Epoch: 71 Loss: 0.00016307
Epoch: 72 Loss: 0.00211082
Epoch: 73 Loss: 0.00022014
Epoch: 74 Loss: 0.00929940
Epoch: 75 Loss: 0.00371494
Epoch: 76 Loss: 0.00909352
Epoch: 77 Loss: 0.00813726
Epoch: 78 Loss: 0.00096239
Epoch: 79 Loss: 0.00042595
Epoch: 80 Loss: 0.00421279
Epoch: 81 Loss: 0.00527355
Epoch: 82 Loss: 0.00404466
Epoch: 83 Loss: 0.00226170
Epoch: 84 Loss: 0.01248677
Epoch: 85 Loss: 0.00621898
Epoch: 86 Loss: 0.00300212
Epoch: 87 Loss: 0.00150585
Epoch: 88 Loss: 0.00027605
Epoch: 89 Loss: 0.00000130
Epoch: 90 Loss: 0.00062003
Epoch: 91 Loss: 0.00306905
Epoch: 92 Loss: 0.00021189
Epoch: 93 Loss: 0.00022184
Epoch: 94 Loss: 0.00034907
Epoch: 95 Loss: 0.00036093
Epoch: 96 Loss: 0.00428128
Epoch: 97 Loss: 0.00093603
Epoch: 98 Loss: 0.00117783
Epoch: 99 Loss: 0.00007189
Epoch: 100 Loss: 0.00586253

Duration: 156 seconds

5.4 Predictions

5.4.1 Test set predictions
future = 12

# seed the generator with the last (normalized) training window
preds = train_norm[-window_size:].tolist()

# set the model to evaluation mode
model.eval()

for i in range(future):
    seq = torch.FloatTensor(preds[-window_size:])
    with torch.no_grad():
        model.hidden = (torch.zeros(1,1,model.hidden_size),
                        torch.zeros(1,1,model.hidden_size))
        preds.append(model(seq).item())
preds[window_size:]
[0.254626989364624,
 0.5324987769126892,
 0.5401761531829834,
 0.7740813493728638,
 1.0032850503921509,
 0.4382571876049042,
 0.5605596303939819,
 0.5859595537185669,
 0.6089707612991333,
 0.7689874172210693,
 0.9936566352844238,
 -0.0027425289154052734]
5.4.2 Original test set
df['S4248SM144NCEN'][-12:]
DATE
2018-02-01    10415
2018-03-01    12683
2018-04-01    11919
2018-05-01    14138
2018-06-01    14583
2018-07-01    12640
2018-08-01    14257
2018-09-01    12396
2018-10-01    13914
2018-11-01    14174
2018-12-01    15504
2019-01-01    10718
Name: S4248SM144NCEN, dtype: int64
5.4.3 Inverting the normalised values
true_predictions = scaler.inverse_transform(np.array(preds[window_size:]).reshape(-1, 1))
true_predictions
array([[10369.94057429],
       [11995.35159555],
       [12040.26040804],
       [13408.48885316],
       [14749.21590227],
       [11444.08541889],
       [12159.49355799],
       [12308.07040948],
       [12442.67446822],
       [13378.69189703],
       [14692.8944881 ],
       [ 8864.45757711]])
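To attach a single number to this fit, the inverted predictions can be compared with the 12 held-out months (an illustrative metric I'm adding here; it wasn't computed in the original run, so no value is shown):

from sklearn.metrics import mean_squared_error

# illustrative: RMSE in dollars between the predictions and the held-out months
rmse = np.sqrt(mean_squared_error(test_set, true_predictions.ravel()))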
5.4.4 Plotting
x = np.arange('2018-02-01', '2019-02-01', dtype='datetime64[M]').astype('datetime64[D]')
plt.figure(figsize=(12,4))
plt.title('Alcohol Sales')
plt.ylabel('Sales in million dollars')
plt.grid(True)
plt.autoscale(axis='x',tight=True)
plt.plot(df['S4248SM144NCEN'], color='#8000ff')
plt.plot(x,true_predictions, color='#ff8000')
plt.show()
5.4.5 Zooming in on the test predictions
fig = plt.figure(figsize=(12,4))
plt.title('Alcohol Sales')
plt.ylabel('Sales in million dollars')
plt.grid(True)
plt.autoscale(axis='x',tight=True)
fig.autofmt_xdate()

plt.plot(df['S4248SM144NCEN']['2017-01-01':], color='#8000ff')
plt.plot(x,true_predictions, color='#ff8000')
plt.show()

If you liked the notebook, consider giving an upvote.
