The rise of machine learning has been accompanied by a corresponding rise in tools and libraries for machine learning applications. From basic model training with scikit-learn
to deep learning frameworks such as PyTorch and TensorFlow, these tools have come a long way in advancing machine learning development.
But as someone starting out with machine learning, how does one make the jump from scikit-learn
to PyTorch or TensorFlow? And how do we use these frameworks to create neural networks, one of the most powerful models in machine learning today?
What is a Neural Network?
A neural network is a different way of building a model. Rather than using an algorithm such as k-nearest neighbors to make predictions, a neural network takes in one or more inputs, assigns a weight to each input parameter, and performs mathematical calculations using these weights (along with several bias constants) to arrive at a prediction. The network then adjusts the weights based on the correctness of its predictions.
Simple neural networks generally consist of three layers, each of which contain nodes, or neurons. Input flows in one direction through the layers to the output. Specifically, these kinds of networks are called multi-layer perceptrons.
Input Layer
As the name suggests, each node in this layer represents an input parameter such as an individual pixel in an image.
Hidden Layers
These layers are where the mathematical calculations are performed. Given the input layer, the nodes in a hidden layer multiply the inputs by their weights and add a bias, which provides additional adjustment to a node's output. In short, a node essentially performs the following linear calculation:

z = (w₁ × x₁) + (w₂ × x₂) + … + (wₙ × xₙ) + b

where each xᵢ is an input, each wᵢ is that input's weight, and b is the node's bias. The result z can then get fed to a node in another hidden layer, where it will be processed again, or directly to the output layer.

It's also possible to transform z using what's called an activation function. These functions make the node's output non-linear, which is extremely helpful for allowing the model to fit data points that don't follow a linear pattern. One example is the Rectified Linear Unit (ReLU) function, which simply takes the maximum of z and 0 and returns that as the node's output:

ReLU(z) = max(z, 0)
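To make this concrete, here's a minimal sketch of a single node's calculation in plain Python with NumPy. The weights, bias, and inputs here are made-up values purely for illustration:

import numpy as np

def relu(z):
    # ReLU activation: negative values become 0
    return np.maximum(z, 0)

# Made-up weights, bias, and inputs for illustration
weights = np.array([0.5, -0.3, 0.8])
bias = 0.1
inputs = np.array([1.0, 2.0, 3.0])

# The linear calculation: z = (w1 * x1) + (w2 * x2) + (w3 * x3) + b
z = np.dot(weights, inputs) + bias

# The node's final output after the activation function
output = relu(z)
print(output)  # ≈ 2.4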
Output Layer
The output layer represents the final output, or the prediction the neural network makes given some input data. This can range from probabilities that an input data point is of a certain class to regression predictions.
How a Neural Network Trains Itself
When creating a neural network, you specify a variety of parameters, ranging from optimizers to loss functions. I'll be going over these in more detail in my next article, but for now, the most important concept to know is gradient descent.
A model trains itself by evaluating its performance on a training dataset, calculating a loss function and adjusting the model's weights in response. A developer can choose the exact loss function to use, such as mean absolute error, as well as the way the weights are adjusted. One of the most common ways that weights are adjusted is through backpropagation, which uses gradient descent to fine-tune the model's weights so that it eventually reaches a local minimum of the loss function.
Note: A local minimum is not necessarily equivalent to a global minimum. This means that the best weights for your model might not be found even when using backpropagation!
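To get a feel for how this works, here's a tiny sketch of gradient descent on a single weight, using a squared-error loss and a made-up data point and learning rate. Each step moves the weight a little further toward the value that minimizes the loss:

# Made-up training point, starting weight, and learning rate
x, y_true = 2.0, 10.0
w = 1.0
learning_rate = 0.1

for step in range(5):
    y_pred = w * x                    # the model's prediction
    loss = (y_pred - y_true) ** 2     # squared-error loss
    grad = 2 * (y_pred - y_true) * x  # dLoss/dw via the chain rule
    w -= learning_rate * grad         # the gradient descent update
    print(f"step {step}: w = {w:.3f}, loss = {loss:.3f}")

Here, w converges toward 5.0, the weight that makes the loss zero.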
Understanding What's Available To You
Now that we understand how simple neural networks work, let's view what options are available to build our own!
In training a model, you typically split your dataset into training and testing datasets. However, you can further split your dataset into training, validation, and testing datasets, where the validation dataset acts as a sanity check for your model before it's finally evaluated on the testing dataset.
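As a quick sketch, one common way to produce all three splits is to call scikit-learn's train_test_split twice, assuming X and y hold your features and labels (the 60/20/20 proportions here are just an example):

from sklearn.model_selection import train_test_split

# First, carve off 20% of the data as the test set...
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2)

# ...then split the rest into training and validation sets
# (0.25 of the remaining 80% = 20% of the original data)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.25
)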
scikit-learn
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(5, 5, 5),
    activation='relu',
    solver='sgd',
    max_iter=100,
    validation_fraction=0.2,
    early_stopping=True
)
mlp.fit(X_train, y_train)
mlp.predict(X_test)
This is pretty simple! In this case, mlp
is a multi-layer perceptron used as a classifier. As you can see, you can customize a variety of parameters, including the number of nodes in each layer, the activation function, the optimizer, and the maximum number of training iterations for the neural network.
Side Note: sgd
stands for Stochastic Gradient Descent, which trains more efficiently than standard gradient descent because it uses smaller batches of the data, rather than the whole dataset, to determine how to adjust the weights.
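As a rough sketch of the idea (not scikit-learn's actual implementation), one epoch of mini-batch SGD for a simple linear model might look like this, with the learning rate and batch size as made-up defaults:

import numpy as np

def sgd_epoch(X, y, w, lr=0.01, batch_size=32):
    # Shuffle the data, then update the weights once per mini-batch
    indices = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        preds = X[batch] @ w
        # Gradient of mean squared error for a linear model
        grad = 2 * X[batch].T @ (preds - y[batch]) / len(batch)
        w -= lr * grad
    return w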
PyTorch
import torch
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to("cuda")

# A random 28x28 tensor standing in for a single input image
X = torch.rand(1, 28, 28, device="cuda")
# We still need to transform the model's output into something that makes sense!
# Right now, it's just the output of a bunch of linear calculations and activation functions.
# We can use the softmax activation function to convert our output into a prediction.
# Specifically, this returns the index of the class that has the maximum probability.
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(dim=1)
This is slightly more complicated, but much more customizable, than scikit-learn
. For each layer, you can define the number of inputs and outputs, and you can also place activation functions between the layers.
The snippet above also introduces something called a tensor. A tensor is essentially an optimized version of a matrix, generalized to any number of dimensions, for machine learning purposes.
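For instance, here's a quick look at tensors of a few different shapes in PyTorch:

import torch

scalar = torch.tensor(3.14)             # 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])  # 1 dimension
matrix = torch.rand(2, 3)               # 2 dimensions (a 2x3 matrix)
batch = torch.rand(64, 28, 28)          # 3 dimensions, e.g. 64 images

print(batch.shape)  # torch.Size([64, 28, 28])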
TensorFlow
from tensorflow import keras
from tensorflow.keras import layers

input_shape = [X_train.shape[1]]

model = keras.Sequential([
    layers.Dense(units=512, activation='relu', input_shape=input_shape),
    layers.Dense(units=512, activation='relu'),
    layers.Dense(units=512, activation='relu'),
    layers.Dense(units=1)
])

model.compile(loss="mae")

model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=64,
    epochs=100
)
Finally, this is an example of a neural network in TensorFlow. It's quite similar to the PyTorch version, with a few minor differences. Ultimately, the library you choose is up to you!
Optimizing Your Neural Networks
As you can see, there are a lot of options to configure when building your neural networks. I'll be going over these in my next article, where we'll stick with PyTorch. See you in the next one!
Important: Taking Advantage of Your GPUs
Machine learning training obviously requires a lot of intensive computation. With GPUs being much more efficient at mathematical calculations than CPUs, the more advanced machine learning libraries cater to GPUs by allowing you to write code that takes advantage of them during training.
Unfortunately, scikit-learn
does not support using GPUs to train models.
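In TensorFlow, you can pin operations to a specific GPU with a device context: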
import tensorflow as tf

with tf.device('/device:GPU:0'):
    # Any operations created here will run on the first GPU
    ...
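PyTorch instead lets you pick the best available device at runtime and move your model and data to it: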
import torch

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")