Limitations of Using a Simple Deep Neural Network for Sinusoid Function Regression
Linear function regression is an easy problem for a deep neural network to solve. Non-linear functions can be more problematic. This post shows an implementation of non-linear function regression using a deep learning approach.
A few days ago, I stumbled upon a sinusoid function: I needed to build a machine learning model from the data I was given. Normally I would use a non-parametric regression method, such as k-nearest neighbors, an ensemble method, or perhaps a meta-learning approach. That day I wondered, "Can a simple deep learning algorithm perform regression on a sinusoid function?" The answer: "It's a bit tricky." In general, the model may work well within the data range, but beyond that range the predictions are very bad. Perhaps we need a much more advanced model, like an LSTM neural network.
To understand the problem, let's first build a dummy data set with a sinusoid shape.
import numpy as np
import matplotlib.pyplot as plt

n = 10000  # number of samples
i = 100    # i controls how many waves appear within the data range:
           # i = 200 gives roughly 1 wavelength, i = 400 roughly 2, etc.
X = np.random.randint(-10000, 10000, size=n).reshape(-1, 1)
y = (np.sin(np.radians(X / n) * i).flatten() + np.random.normal(0, 0.1, size=n)).reshape(-1, 1)
plt.scatter(X, y, s=2, alpha=0.1)
plt.xticks([], [])
plt.yticks([], [])
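As a quick sanity check on the wave-count convention in the comment above (my addition; the i = 200 rule is a round-number approximation), we can count how many full periods the sine argument spans over the data range:

# Added sanity check: as X sweeps [-10000, 10000], X/n sweeps [-1, 1], so the
# sine argument spans 2 * radians(1) * i radians in total.
span = 2 * np.radians(1) * i
print(span / (2 * np.pi))  # ~0.56 waves for i = 100; exactly 1 wave at i = 180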
With this code we get pairs of X and y that trace roughly half a wave of a sinusoid function. Let's build a simple neural network model on this data and see how it works.
import tensorflow as tf
from tensorflow import keras
from sklearn.preprocessing import MinMaxScaler

# Scale X to [0, 1] before feeding it to the network
scaler = MinMaxScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# A symmetric stack of 8 hidden Dense layers (8-16-32-64-64-32-16-8)
model = keras.Sequential()
model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
model.add(keras.layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mean_squared_error')
model.fit(X_scaled, y, batch_size=100, epochs=20, verbose=0)
# Predict over a wider range (-15000 to 15000) than the training data
# (-10000 to 10000) to see how the model extrapolates
X_pred = np.linspace(-15000, 15000, 1000).reshape(-1, 1)
X_pred_sc = scaler.transform(X_pred)
y_pred = model.predict(X_pred_sc)

plt.figure()
plt.title(f'i = {i}')
plt.scatter(X, y, s=2, label='data')
plt.scatter(X_pred, y_pred, s=4, label='prediction')
plt.legend()
plt.xticks([], [])
plt.yticks([], [])
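Since X_pred extends 50% beyond the training range on each side, we can also quantify the extrapolation failure directly. A quick hedged check (my addition, comparing the predictions against the noise-free sine):

# Added check: MSE against the noise-free sine, inside vs outside the
# training range [-10000, 10000].
y_true = np.sin(np.radians(X_pred / n) * i).flatten()
inside = (X_pred.flatten() >= -10000) & (X_pred.flatten() <= 10000)
mse_in = np.mean((y_pred.flatten()[inside] - y_true[inside]) ** 2)
mse_out = np.mean((y_pred.flatten()[~inside] - y_true[~inside]) ** 2)
print(f'MSE inside range: {mse_in:.4f}  outside range: {mse_out:.4f}')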
Note that some simplifications were made when building this model:
- The model uses 8 hidden Dense layers with a different number of nodes in each layer
- Every layer uses the same activation function, 'relu'
- Batch size = 100
- Number of epochs = 20
- Optimizer: 'rmsprop'
- Loss function: 'mean_squared_error'
With the parameters above, the result looks like the following image.
The result is quite promising! However, the data only covers half a wavelength. What if we increase the number of waves?
i = 2000   # ~10 wavelengths within the data range
n = 10000
X = np.random.randint(-10000, 10000, size=n).reshape(-1, 1)
y = (np.sin(np.radians(X / n) * i).flatten() + np.random.normal(0, 0.1, size=n)).reshape(-1, 1)

scaler = MinMaxScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

model = keras.Sequential()
model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
model.add(keras.layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mean_squared_error')
model.fit(X_scaled, y, batch_size=100, epochs=100, verbose=0)  # more epochs for the harder target

X_pred = np.linspace(-15000, 15000, 1000).reshape(-1, 1)
X_pred_sc = scaler.transform(X_pred)
y_pred = model.predict(X_pred_sc)

plt.figure()
plt.title(f'i = {i}')
plt.scatter(X, y, s=2, label='data')
plt.scatter(X_pred, y_pred, s=4, label='prediction')
plt.legend()
plt.xticks([], [])
plt.yticks([], [])
As we can see, the prediction is not very good when the data contains many sinusoid waves. Let's see where the breakdown happens by brute-force training across different data scenarios.
plt.figure(figsize=(20, 20))
for num, i in enumerate([20, 50, 100, 200, 500, 600, 700, 800, 900, 1000, 2000, 4000]):
    # Fix the seeds so each scenario starts from the same initialization
    tf.random.set_seed(42)
    np.random.seed(42)
    n = 10000
    X = np.random.randint(-10000, 10000, size=n).reshape(-1, 1)
    y = (np.sin(np.radians(X / n) * i).flatten() + np.random.normal(0, 0.1, size=n)).reshape(-1, 1)
    scaler = MinMaxScaler()
    scaler.fit(X)
    X_scaled = scaler.transform(X)
    model = keras.Sequential()
    model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mean_squared_error')
    model.fit(X_scaled, y, batch_size=100, epochs=100, verbose=0)
    X_pred = np.linspace(-15000, 15000, 1000).reshape(-1, 1)
    X_pred_sc = scaler.transform(X_pred)
    y_pred = model.predict(X_pred_sc)
    plt.subplot(3, 4, num + 1)
    plt.title(f'i = {i}')
    plt.scatter(X, y, s=2, label='data')
    plt.scatter(X_pred, y_pred, s=4, label='prediction')
    plt.legend()
    plt.xticks([], [])
    plt.yticks([], [])
The prediction gets worse as the number of waves increases, with the first clearly anomalous result occurring at i = 800. Can we improve the model? Let's try increasing the depth and the training time by changing the following parameters:
- Increase the number of hidden layers from 8 to 10
- Increase the number of epochs from 100 to 300
plt.figure(figsize=(20, 20))
for num, i in enumerate([20, 50, 100, 200, 500, 600, 700, 800, 900, 1000, 2000, 4000]):
    tf.random.set_seed(42)
    np.random.seed(42)
    n = 10000
    X = np.random.randint(-10000, 10000, size=n).reshape(-1, 1)
    y = (np.sin(np.radians(X / n) * i).flatten() + np.random.normal(0, 0.1, size=n)).reshape(-1, 1)
    scaler = MinMaxScaler()
    scaler.fit(X)
    X_scaled = scaler.transform(X)
    # 10 hidden layers this time (8-16-32-64-128-128-64-32-16-8)
    model = keras.Sequential()
    model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(128, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(128, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mean_squared_error')
    model.fit(X_scaled, y, batch_size=100, epochs=300, verbose=0)
    X_pred = np.linspace(-15000, 15000, 1000).reshape(-1, 1)
    X_pred_sc = scaler.transform(X_pred)
    y_pred = model.predict(X_pred_sc)
    plt.subplot(3, 4, num + 1)
    plt.title(f'i = {i}')
    plt.scatter(X, y, s=2, label='data')
    plt.scatter(X_pred, y_pred, s=4, label='prediction')
    plt.legend()
    plt.xticks([], [])
    plt.yticks([], [])
According to this experiment, the results do not improve after altering the architecture and the training parameters, so adding more layers and epochs may not improve prediction accuracy here. This is consistent with two known properties of plain ReLU networks: they tend to fit low-frequency components before high-frequency ones, and, because a stack of Dense + ReLU layers computes a piecewise-linear function of its input, they can only extrapolate along a straight line outside the training range rather than continue the oscillation.
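We can see the straight-line extrapolation directly. A quick hedged check (my addition, reusing the last fitted model and X_pred from the loop above; in practice most ReLU kinks land inside the data range, so the tail is essentially one linear piece):

# Added check: beyond the training range a ReLU network's output should be
# (nearly) linear, so its second differences should be ~0 there.
outside = X_pred.flatten() > 10000
segment = y_pred.flatten()[outside]
print(np.abs(np.diff(segment, n=2)).max())  # expect a value near zero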
Brute-Force Regression for Sensitivity Analysis
To understand just how far a simple neural network can go in predicting sinusoid data, I conducted a brute-force experiment: a regression analysis on data sets ranging from roughly a quarter of a wavelength to 50 wavelengths.
score = []
for num, i in enumerate(np.arange(50, 10000, 100)):
    print(i)
    tf.random.set_seed(42)
    np.random.seed(42)
    n = 10000
    X = np.random.randint(-10000, 10000, size=n).reshape(-1, 1)
    y = (np.sin(np.radians(X / n) * i).flatten() + np.random.normal(0, 0.1, size=n)).reshape(-1, 1)
    scaler = MinMaxScaler()
    scaler.fit(X)
    X_scaled = scaler.transform(X)
    model = keras.Sequential()
    model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(64, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(32, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(16, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(8, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mean_squared_error')
    model.fit(X_scaled, y, batch_size=100, epochs=100, verbose=0)
    # Record the final-epoch training loss for this scenario
    hist = model.history.history['loss'][-1]
    score.append(hist)

plt.plot(np.arange(50, 10000, 100) / 200, score)  # i / 200 ~ number of waves
plt.ylabel('loss')
plt.xlabel('number of waves')
This experiment shows that the model works best on data containing fewer than about 5 wavelengths.
Conclusion
In conclusion, a simple deep neural network may not be the best solution for sinusoid function prediction. I think algorithms that can capture temporal structure, like RNNs and LSTMs, are more suitable for this kind of function.
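For illustration, here is a minimal sketch of the LSTM idea (my addition, not part of the experiments above): it reframes the task as next-step sequence forecasting, which is where recurrent models are a natural fit. The window size and layer widths below are illustrative assumptions.

# Minimal LSTM sketch: predict the next sample of a sine wave from the
# previous `window` samples.
import numpy as np
from tensorflow import keras

t = np.linspace(0, 20 * np.pi, 4000)
wave = np.sin(t)

window = 50  # number of past samples used to predict the next one
X_seq = np.array([wave[j:j + window] for j in range(len(wave) - window)])
y_seq = wave[window:]
X_seq = X_seq.reshape(-1, window, 1)  # (samples, timesteps, features)

lstm = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(window, 1)),
    keras.layers.Dense(1),
])
lstm.compile(optimizer='adam', loss='mean_squared_error')
lstm.fit(X_seq, y_seq, batch_size=64, epochs=5, verbose=0)

Because the LSTM conditions on the recent past rather than on the raw x coordinate, it can in principle keep reproducing the oscillation beyond the training range, which is exactly where the plain Dense network failed.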