Building a Handwritten Digit Recognizer with PyTorch
To read the original article, visit: https://lumos.blog/how-to-train-and-evaluate-a-neural-network-with-pytorch/
Neural networks play a pivotal role in artificial intelligence and machine learning, with digit recognition being one of the most fundamental applications.
This tutorial will guide you through the steps required to build a digit recognizer using a neural network, offering a thorough line-by-line breakdown of the code. We will explore the MNIST dataset, a common benchmark in this domain, which comprises 70,000 images of handwritten digits. By the end of this tutorial, you will have a solid understanding of how to construct, train, and evaluate a neural network that can accurately recognize these digits.
Every part of the code will be explained in detail, ensuring you comprehend both the 'how' and 'why' of each step. Whether you're coding alongside or seeking to enhance your theoretical understanding, this tutorial is designed to clarify the principles of neural networks in a structured way.
Imports
The first step is to import the essential libraries needed for creating our neural network.
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import matplotlib.pyplot as plt
Ensuring Reproducibility
Setting random seeds is a crucial practice in machine learning and neural network training. Random processes like weight initialization, data shuffling, and dropout can introduce variability in results each time a model is trained. By establishing random seeds, you can ensure consistent outcomes, which means that every execution of the code will yield the same results, facilitating exact replication of experiments.
This reproducibility is vital for debugging, comparing model performance, and sharing results. With a fixed random seed, you enhance the reliability and verifiability of your experiments, allowing others to replicate your findings. In Python, setting a random seed is straightforward using libraries like NumPy, TensorFlow, and PyTorch.
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # for multi-GPU setups
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
Loading and Preparing the Data
Data can be sourced from Kaggle: https://www.kaggle.com/competitions/digit-recognizer/data.
The dataset contains 70,000 grayscale images of handwritten digits, each measuring 28x28 pixels. It is divided into a training set of 42,000 labelled images and a test set of 28,000 unlabelled images, which we will come back to at the end.
The data is structured as a CSV file with a label column indicating the digit each image represents, along with 784 columns (28*28 = 784) containing pixel values ranging from 0 (black) to 255 (white).
Although it may look like a spreadsheet when previewed, the data is a plain .csv file in which the values are separated by commas.
To load this dataset, we will use NumPy to read the CSV file:
dataset = np.loadtxt('data/train.csv', delimiter=',', skiprows=1)
The dataset is loaded from train.csv located in our data directory, skipping the first row that contains the column names ('label', 'pixel0', ..., 'pixel783').
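As an optional sanity check (not part of the original walkthrough), you can confirm the shape of the loaded array before going further:
print(dataset.shape)  # expected (42000, 785): one label column plus 784 pixel columns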
Next, we will split the training set into X and y. X represents the independent variable (the pixel values), while y is the dependent variable (the labels that correspond to the pixel values).
We obtain X (the pixel values) from all columns except the first:
X = dataset[:, 1:]
And y (the labels) from the first column:
y = dataset[:, 0]
It’s also important to normalize our data. The pixel values range from 0 to 255, but it’s beneficial to scale them down to a range between 0 and 1.
Normalization helps prevent large values during internal computations in the neural network, which can complicate the training process.
Normalization is achieved simply by:
X = X / 255.0
Next, we need to convert the data into PyTorch-compatible tensors of appropriate types:
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.int64)
Now that the data is loaded and prepared, we can move on to the exciting part of deep learning: creating our model.
model = nn.Sequential(
    nn.Linear(784, 300),
    nn.ReLU(),
    nn.Linear(300, 300),
    nn.ReLU(),
    nn.Linear(300, 10)
)
The input layer must contain 784 neurons to correspond to the 784 values in X (one for each pixel). The output layer requires 10 neurons to predict the digits from 0 to 9.
The number of neurons in the hidden layers is somewhat flexible, with some guidelines to consider:
- The number of hidden neurons should be between the size of the input layer and that of the output layer.
- It should be around two-thirds the size of the input layer, plus the size of the output layer.
- The number of hidden neurons should be less than twice the size of the input layer.
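Applying the second guideline to our network gives roughly 2/3 × 784 + 10 ≈ 533 hidden neurons; the 300 we use is smaller, but it still lies between the output size (10) and the input size (784) and is well under twice the input size, so the first and third guidelines are satisfied. If you are curious about the resulting model size, a quick optional check (not in the original code) counts the trainable parameters:
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)  # (784*300 + 300) + (300*300 + 300) + (300*10 + 10) = 328,810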
Understanding ReLU
Each neuron in the network applies an activation function to decide how strongly it should "activate" based on its internal computation of inputs × weights + bias.
There are many activation functions; ReLU (Rectified Linear Unit) is one of the most common.
If the weighted sum plus bias is less than or equal to 0, the neuron outputs 0 (it stays inactive). If it is greater than 0, the neuron outputs that value unchanged (y = x). In other words, ReLU(x) = max(0, x).
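As a quick, self-contained illustration (separate from the model above), you can apply PyTorch's ReLU to a handful of sample values and see this cutoff in action:
import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # negative values become 0; positive values pass through unchanged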
Loss Function and Optimizers
A loss function, or error function, quantifies how accurately the algorithm's predictions match the actual target values.
An optimizer adjusts the neural network's attributes, such as weights and learning rates, to minimize loss and enhance accuracy.
Here’s how we define them in code:
loss_fn = nn.CrossEntropyLoss()  # multi-class cross-entropy loss
optimizer = optim.Adam(model.parameters(), lr=0.001)
We utilize the cross-entropy loss function along with the Adam optimizer. The learning rate controls the amount of adjustment to the model in response to estimated errors during weight updates. While we set it at 0.001, it's common to experiment with values between 0.1 and 10^-6. A learning rate that’s too high may overshoot, while one that’s too low can slow learning.
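One detail worth noting: the final Linear layer outputs 10 raw scores (logits) with no softmax, because nn.CrossEntropyLoss applies log-softmax internally before computing the loss. Here is a minimal sketch of how the loss is called, using made-up logits rather than our model's outputs:
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(2, 10)      # two made-up samples with 10 raw class scores each
targets = torch.tensor([3, 7])   # the "true" digits for those samples
print(loss_fn(logits, targets))  # a single scalar loss, averaged over the batch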
Training Loop
Next, we implement a training loop for the neural network over 10 epochs with a batch size of 10. In each epoch the dataset is split into small batches, and for each batch a forward pass through the model produces the predicted outputs.
We compute the loss between the predicted and actual labels using our predefined Cross Entropy Loss function.
Gradients indicate how much the loss changes concerning each model parameter (weights and biases). They guide the adjustments necessary to minimize loss and enhance predictions during training.
After calculating the loss, we reset the gradients to zero, perform backpropagation to compute gradients of the loss with respect to model parameters, and the Adam optimizer updates these parameters based on the gradients.
# Run the training loop
n_epochs = 10
batch_size = 10
for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i + batch_size]
        ybatch = y[i:i + batch_size]
        # Forward pass
        y_pred = model(Xbatch)
        # Compute loss
        loss = loss_fn(y_pred, ybatch)
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Finished epoch {epoch}, latest loss {loss.item()}')
Evaluating the Model
To evaluate our neural network model, we first disable gradient calculations to conserve computational resources.
The code will make predictions on input data X and convert these predictions into class labels using the torch.argmax function. This function is necessary because the predictions consist of output values from each neuron, and we only need the one that activated the most to determine the predicted class.
By comparing predicted labels with true labels y, we can calculate the model's accuracy—the proportion of correct predictions.
with torch.no_grad():
    y_pred = model(X)
    predictions = torch.argmax(y_pred, dim=1)
    accuracy = (predictions == y).float().mean()
print(f"Accuracy: {accuracy.item()}")
Making Predictions
Finally, we will use our model to make predictions on the test set.
# Load the test dataset
test_data_path = 'data/test.csv'
test_df = pd.read_csv(test_data_path)
# Prepare the test data
test_X = test_df.values
test_X = test_X / 255.0 # Normalize pixel values to [0, 1]
test_X = torch.tensor(test_X, dtype=torch.float32)
model.eval()
# Make predictions
with torch.no_grad():
    outputs = model(test_X)
    test_predictions = torch.argmax(outputs, dim=1).numpy()
# Create a DataFrame with ImageId and Label
image_ids = np.arange(1, len(test_predictions) + 1) # Assuming ImageId starts from 1
submission_df = pd.DataFrame({
    'ImageId': image_ids,
    'Label': test_predictions
})
# Save to CSV file
submission_df.to_csv('submission.csv', index=False)
We load the dataset, normalize the pixel values as we did for X in the training set, switch the model to evaluation mode, and make predictions on the test set. We then create a pandas DataFrame containing the image IDs and predicted labels, which we save to a CSV file.
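For reference, the first few lines of the resulting submission.csv follow the layout Kaggle expects (the label values below are placeholders; yours will be the model's predictions):
ImageId,Label
1,2
2,0
3,9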
And that's all for this tutorial! Feel free to leave a comment if you have any questions, and give a thumbs up if you found this content helpful!
By the way, this straightforward neural network achieves an impressive accuracy of 0.975 on the test set.
Full Code: https://github.com/Pursuit-Labs/neural-networks/blob/main/mnist/mnist.ipynb