Introduction
This article walks through building a 3-layer Neural Network (NN) from scratch (i.e., using only NumPy) to solve the binary MNIST classification task. The project offers a practical guide to the foundational aspects of deep learning and the architecture of neural networks, concentrating on building the network from the ground up, i.e., on the mathematics running under the hood of NNs. Note that this project is an extension of a project titled "Code a 2-layer Neural Network from Scratch," where I explained in detail the mathematics working behind the scenes of a neural network (see that article for more details). In other words, solving binary MNIST can be considered a use case of that from-scratch neural network.
Load MNIST dataset
Once the helper files are available in AWS SageMaker, we use pre-defined functions to load the MNIST dataset.
from utils_data import *
download_and_save_MNIST(path="data/")
This experiment handles binary MNIST only, so we need a function that loads just the digits 0 and 1 (i.e., the other MNIST digits, 2 through 9, are out of scope in this example).
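The actual function lives in the helper files; a minimal sketch of the filtering logic, assuming the labels are 1-D integer arrays, might look like this:

import numpy as np

def load_binary_mnist(X_train, Y_train, X_test, Y_test):
    # Keep only the samples labeled 0 or 1 (illustrative sketch, not the repository code).
    train_mask = (Y_train == 0) | (Y_train == 1)
    test_mask = (Y_test == 0) | (Y_test == 1)
    return X_train[train_mask], Y_train[train_mask], X_test[test_mask], Y_test[test_mask]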
We apply these functions to load the binary MNIST dataset.
X_train_org, Y_train_org, X_test_org, Y_test_org = load_mnist()
X_train_org, Y_train_org, X_test_org, Y_test_org = load_binary_mnist(X_train_org, Y_train_org, X_test_org, Y_test_org)
Data visualization
visualize_multi_images(X_train_org, Y_train_org, layout=(3, 3), figsize=(10, 10), fontsize=12)
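The plotting helper itself lives in the helper files; a minimal sketch of how it might work, assuming X_train_org still holds the original 28×28 image arrays:

import matplotlib.pyplot as plt

def visualize_multi_images(X, Y, layout=(3, 3), figsize=(10, 10), fontsize=12):
    # Plot the first rows x cols images with their labels as titles (illustrative sketch).
    rows, cols = layout
    plt.figure(figsize=figsize)
    for i in range(rows * cols):
        plt.subplot(rows, cols, i + 1)
        plt.imshow(X[i], cmap="gray")
        plt.title(f"Label: {Y[i]}", fontsize=fontsize)
        plt.axis("off")
    plt.show()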
I store the data in an AWS S3 bucket in case it is needed later.
import boto3

key = "data/mnist.npz"
bucket_url = "s3://{}/{}".format(BUCKET_NAME, key)  # BUCKET_NAME holds the name of your S3 bucket
boto3.Session().resource("s3").Bucket(BUCKET_NAME).Object(key).upload_file("data/mnist.npz")
Data preparation for model training
We apply a helper function to prepare the binary MNIST data for training the Neural Network.
X_train, X_test, Y_train, Y_test = make_inputs(X_train_org, X_test_org, Y_train_org, Y_test_org)
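make_inputs() is defined in the helper files; a plausible sketch, assuming it flattens each 28×28 image into a 784-dimensional column (matching the 784-unit input layer used below), scales pixel values to [0, 1], and reshapes the labels into (1, m) row vectors:

def make_inputs(X_train_org, X_test_org, Y_train_org, Y_test_org):
    # Flatten each 28x28 image into a 784-vector and stack examples as columns (sketch).
    X_train = X_train_org.reshape(X_train_org.shape[0], -1).T / 255.0
    X_test = X_test_org.reshape(X_test_org.shape[0], -1).T / 255.0
    # Reshape the labels into (1, m) row vectors to align with the network output.
    Y_train = Y_train_org.reshape(1, -1)
    Y_test = Y_test_org.reshape(1, -1)
    return X_train, X_test, Y_train, Y_test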
Build a Neural Network for solving binary MNIST
To build the Neural Network, we first define helper functions that serve as building blocks for the architecture. I will not list those functions here because they would make this write-up unnecessarily long; I only present the construction of the Neural Network itself. For more details on the helper functions and the components of nn_Llayers_binary(), please see utils_binary.py in this repository or refer to this article. A condensed sketch of what the training function does is shown below.
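For intuition, here is a condensed, self-contained re-implementation sketch of the training loop (not the article's exact code): initialize the parameters, then repeatedly run forward propagation, compute the cost, backpropagate the gradients, and update the parameters by gradient descent. The ReLU/sigmoid activation choice and the initialization scheme are assumptions here:

import time
import numpy as np

def sigmoid(z):
    # Sigmoid activation for the output layer.
    return 1.0 / (1.0 + np.exp(-z))

def nn_Llayers_binary_sketch(X, Y, layer_dims, learning_rate, number_iterations, print_cost=False):
    # Condensed re-implementation sketch; the article's version in utils_binary.py
    # factors each step into separate helper functions.
    rng = np.random.default_rng(1)
    L = len(layer_dims) - 1                           # number of weight layers
    W = {l: rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01 for l in range(1, L + 1)}
    b = {l: np.zeros((layer_dims[l], 1)) for l in range(1, L + 1)}
    m = X.shape[1]                                    # number of training examples
    costs, start = [], time.time()
    for i in range(number_iterations):
        # Forward propagation: ReLU on hidden layers, sigmoid on the output layer.
        A, Z = {0: X}, {}
        for l in range(1, L + 1):
            Z[l] = W[l] @ A[l - 1] + b[l]
            A[l] = sigmoid(Z[l]) if l == L else np.maximum(0, Z[l])
        AL = A[L]
        # Binary cross-entropy cost (epsilon added for numerical stability).
        cost = -np.mean(Y * np.log(AL + 1e-8) + (1 - Y) * np.log(1 - AL + 1e-8))
        costs.append(cost)
        # Backward propagation via the chain rule, then gradient-descent update.
        dZ = AL - Y                                   # gradient at the sigmoid output
        for l in range(L, 0, -1):
            dW = dZ @ A[l - 1].T / m
            db = dZ.sum(axis=1, keepdims=True) / m
            if l > 1:
                dZ = (W[l].T @ dZ) * (Z[l - 1] > 0)   # ReLU derivative
            W[l] -= learning_rate * dW
            b[l] -= learning_rate * db
        if print_cost and i % 50 == 0:
            print(f"Cost after iteration {i}: {cost:.4f}")
    parameters = {"W": W, "b": b}
    return parameters, costs, time.time() - start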
Set up the hyperparameters
layer_dims = [784, 128, 64, 1]
learning_rate = 0.01
number_iterations = 250
Train the Neural Network
from utils_binary import *
parameters, costs, training_time = nn_Llayers_binary(X_train, Y_train, layer_dims, learning_rate, number_iterations, print_cost=False)  # training_time avoids shadowing the time module
Compute accuracy on train and test datasets
Yhat_train = predict_binary(X_train, Y_train, parameters)
train_accuracy = compute_accuracy(Yhat_train, Y_train)
Yhat_test = predict_binary(X_test, Y_test, parameters)
test_accuracy = compute_accuracy(Yhat_test, Y_test)
print(f"Train accuracy: {train_accuracy} %")
print(f"Test accuracy: {test_accuracy} %")
The accuracy output
Train accuracy: 99.65 %
Test accuracy: 99.81 %
Given that the MNIST dataset is not difficult, restricting it to distinguishing 0 from 1 makes the task even simpler. It is therefore no surprise to see such high accuracy on both the train and test datasets, even though the solution presented in this experiment is not an advanced neural network.
Below are some visualizations of the misclassified cases.
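A hedged sketch of how such cases could be collected and plotted, assuming X_test has shape (784, m) and the labels and predictions have shape (1, m) as prepared above:

import numpy as np
import matplotlib.pyplot as plt

# Collect the indices where the prediction disagrees with the label (illustrative sketch).
wrong = np.flatnonzero(Yhat_test[0] != Y_test[0])
for k, idx in enumerate(wrong[:9]):               # show at most nine cases
    plt.subplot(3, 3, k + 1)
    # Columns of X_test are flattened 28x28 images, so reshape before plotting.
    plt.imshow(X_test[:, idx].reshape(28, 28), cmap="gray")
    plt.title(f"Pred: {Yhat_test[0, idx]}  True: {Y_test[0, idx]}")
    plt.axis("off")
plt.show()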
Summary
This repository could make a great introductory project for those new to artificial intelligence, machine learning, and deep learning. Experimenting with this simple neural network taught me many basic principles that operate behind the scenes.