Exploring Variational Autoencoders: Generate New Images with Ease

Introduction

This article delves into Variational Autoencoders (VAE), which belong to the broader category of Deep Generative Models, alongside the well-known GANs (Generative Adversarial Networks).

In contrast to GANs, VAEs employ an Autoencoder framework rather than utilizing a dual Generator-Discriminator setup. Consequently, the concepts underlying VAEs should be relatively easy to grasp, especially for those familiar with Autoencoders.

Feel free to subscribe for email alerts to stay updated on my forthcoming articles regarding Neural Networks, including topics like GANs.

Table of Contents

  • The role of VAEs in the realm of Machine Learning algorithms
  • An exploration of VAE architecture and functionality
  • A detailed Python example illustrating how to construct a VAE using Keras/TensorFlow

The Role of VAEs in Machine Learning Algorithms

The chart below aims to categorize the most prevalent Machine Learning algorithms. This task is challenging due to the multiple dimensions along which we can classify them based on their foundational structures or the specific problems they address.

I have attempted to incorporate both perspectives, leading to the categorization of Neural Networks into a distinct group. Although Neural Networks are predominantly applied in a Supervised manner, it's important to recognize that certain instances, such as Autoencoders, lean towards Unsupervised/Self-Supervised approaches.

Despite Variational Autoencoders (VAE) sharing similar goals with GANs, their structural design aligns more closely with other Autoencoder types like Undercomplete Autoencoders. Thus, you can find VAEs in the Autoencoders section of the interactive chart below.

VAE Architecture and Functionality

Let's begin with an analysis of a standard Undercomplete Autoencoder (AE) before we examine the unique features that differentiate VAEs.

Undercomplete AE

Below is a depiction of a typical AE.

The primary objective of an Undercomplete AE is to effectively encode information from the input data into a lower-dimensional latent space (bottleneck). This is achieved by ensuring the inputs can be reconstructed with minimal loss through a decoder.

It's important to note that during training, the same data set is fed into both the input and output layers as we strive to determine the optimal parameter values for the latent space.

Variational AE

Now, let's investigate how VAEs differ from Undercomplete AEs by examining their architecture:

In VAEs, the latent space comprises distributions rather than single point vectors. The inputs are mapped to a Normal distribution, where Zμ and Zσ represent the mean and variance, respectively, which are learned during model training.

The latent vector Z is sampled from this distribution, utilizing the mean Zμ and variance Zσ, and is then passed to the decoder to generate the predicted outputs.

Notably, the latent space of a VAE is continuous by design, allowing us to sample from any location within it to produce new outputs (e.g., new images), thus establishing VAE as a generative model.

Regularization Necessity

Encoding inputs into a distribution only partially prepares us for crafting a latent space capable of generating “meaningful” outputs.

To attain the desired regularity, we can introduce a regularization term in the form of the Kullback-Leibler divergence (KL divergence), which will be discussed in further detail in the Python section.
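For a single latent dimension with mean μ and variance σ², this penalty has a simple closed form, namely the KL divergence to the standard Normal N(0, 1):

$$D_{KL}\big(\mathcal{N}(\mu, \sigma^2)\,\|\,\mathcal{N}(0, 1)\big) = \tfrac{1}{2}\left(\mu^2 + \sigma^2 - \log \sigma^2 - 1\right)$$

The penalty is zero only when μ = 0 and σ = 1, so minimizing it pulls every encoded distribution towards the same standard Normal, which keeps the latent space compact and overlapping.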

Understanding Latent Space

We can visualize how information is distributed within the latent space with the following illustration.

Mapping data as individual points does not train the model to comprehend the similarities or differences among those points. Therefore, such a space is ineffective for generating new “meaningful” data.

In contrast, Variational Autoencoders map data as distributions and regularize the latent space, which creates a “gradient” or “smooth transition” between distributions. Consequently, sampling a point from this latent space results in new data that closely resembles the training data.

Complete Python Example: Building a VAE with Keras/TensorFlow

Now, we are ready to construct our own VAE!

Setup

We will require the following data and libraries:

  • MNIST handwritten digit dataset (copyright held by Yann LeCun and Corinna Cortes under the Creative Commons Attribution-Share Alike 3.0 license; data source: The MNIST Database)
  • NumPy for data manipulation
  • Matplotlib, Graphviz, and Plotly for visualizations
  • TensorFlow/Keras for Neural Networks

Let's import the necessary libraries:
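Below is a minimal sketch of an import cell that brings in the packages listed above and prints their versions (the imports in the original notebook may differ slightly):

```python
# Neural Networks with TensorFlow/Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras import backend as K
from tensorflow.keras.utils import plot_model
print('TensorFlow/Keras: %s' % keras.__version__)

# Data manipulation
import numpy as np
print('numpy: %s' % np.__version__)

# Visualization
import matplotlib
import matplotlib.pyplot as plt
print('matplotlib: %s' % matplotlib.__version__)
import graphviz
print('graphviz: %s' % graphviz.__version__)
import plotly
import plotly.express as px
print('plotly: %s' % plotly.__version__)
```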

The above code displays the package versions utilized in this example:

TensorFlow/Keras: 2.7.0
numpy: 1.21.4
matplotlib: 3.5.1
graphviz: 0.19.1
plotly: 5.4.0

Next, we will load the MNIST handwritten digit dataset and showcase the first ten digits. Note that we will only utilize digit labels (y_train, y_test) for visualization, not for model training.
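A sketch of the loading step, using the copy of MNIST that ships with Keras and Matplotlib to preview the first ten digits:

```python
# Load the MNIST handwritten digit dataset bundled with Keras
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
print('Shape of X_train:', X_train.shape)
print('Shape of X_test:', X_test.shape)

# Display the first ten training digits with their labels
fig, axs = plt.subplots(1, 10, figsize=(15, 2))
for i in range(10):
    axs[i].imshow(X_train[i], cmap='gray')
    axs[i].set_title(y_train[i])
    axs[i].axis('off')
plt.show()
```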

As demonstrated, we have 60,000 images in the training set and 10,000 in the test set, each with dimensions of 28 x 28 pixels.

The final setup step involves flattening the images by reshaping them from 28x28 to 784.

Typically, Convolutional layers would be preferred over flattening, particularly for larger images. However, for simplicity, this example will use Dense layers with flat data instead of Convolutional layers.
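A sketch of that step; scaling the pixel values to the [0, 1] range is an extra assumption made here so the sigmoid output of the decoder can match the inputs:

```python
# Scale pixel values to [0, 1] and flatten each 28x28 image into a 784-long vector
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
X_train = X_train.reshape(X_train.shape[0], 784)
X_test = X_test.reshape(X_test.shape[0], 784)
print('New shape of X_train:', X_train.shape)
print('New shape of X_test:', X_test.shape)
```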

New shape of X_train: (60000, 784)
New shape of X_test: (10000, 784)

Constructing the Variational Autoencoder Model

We will initiate by defining a function that facilitates sampling from the latent space distribution Z.

Here, we employ a reparameterization trick that allows the loss to backpropagate through the mean (z-mean) and variance (z-log-sigma) nodes since they are deterministic.

Simultaneously, we isolate the sampling node by introducing a non-deterministic parameter, epsilon, sampled from a standard Normal distribution.
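A sketch of such a sampling function, assuming the Z-Log-Sigma node holds the log-variance of the latent distribution:

```python
def sampling(args):
    """Reparameterization trick: sample Z = mean + std * epsilon."""
    z_mean, z_log_sigma = args
    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]
    # epsilon carries all the randomness, so gradients can still flow
    # through the deterministic z_mean and z_log_sigma nodes
    epsilon = K.random_normal(shape=(batch, dim))
    # z_log_sigma is treated as the log-variance, so the standard deviation is exp(0.5 * z_log_sigma)
    return z_mean + K.exp(0.5 * z_log_sigma) * epsilon
```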

Now, let's define the structure of the Encoder model.
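A sketch of the Encoder, using the layer names referenced below; the hidden-layer sizes are illustrative, while the 2-dimensional latent space matches the visualizations later in the article:

```python
latent_dim = 2  # 2D latent space, so we can plot it later

# Encoder: 784 inputs -> three Dense hidden layers -> latent distribution parameters
visible = Input(shape=(784,), name='Encoder-Input-Layer')
e = Dense(256, activation='relu', name='Encoder-Hidden-Layer-1')(visible)
e = Dense(128, activation='relu', name='Encoder-Hidden-Layer-2')(e)
e = Dense(64, activation='relu', name='Encoder-Hidden-Layer-3')(e)

# The same hidden output feeds both distribution parameters
z_mean = Dense(latent_dim, name='Z-Mean')(e)
z_log_sigma = Dense(latent_dim, name='Z-Log-Sigma')(e)

# Custom Lambda layer draws Z from the learned distribution
z = Lambda(sampling, name='Z-Sampling-Layer')([z_mean, z_log_sigma])

encoder = Model(visible, [z_mean, z_log_sigma, z], name='Encoder-Model')
encoder.summary()
plot_model(encoder, show_shapes=True, dpi=300)
```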

The code above creates an encoder model and outputs its structural diagram.

Notice that we direct the same outputs from the Encoder-Hidden-Layer-3 into both Z-Mean and Z-Log-Sigma before recombining them within a custom Lambda layer (Z-Sampling-Layer), which is responsible for sampling from the latent space.

Next, we will develop the Decoder model:
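A sketch of the Decoder, mirroring the Encoder's hidden layers (again, the exact layer sizes are illustrative):

```python
# Decoder: latent vector -> three Dense hidden layers -> 784 reconstructed pixels
latent_inputs = Input(shape=(latent_dim,), name='Decoder-Input-Layer')
d = Dense(64, activation='relu', name='Decoder-Hidden-Layer-1')(latent_inputs)
d = Dense(128, activation='relu', name='Decoder-Hidden-Layer-2')(d)
d = Dense(256, activation='relu', name='Decoder-Hidden-Layer-3')(d)

# Sigmoid keeps the reconstructed pixel values in the [0, 1] range
outputs = Dense(784, activation='sigmoid', name='Decoder-Output-Layer')(d)

decoder = Model(latent_inputs, outputs, name='Decoder-Model')
decoder.summary()
plot_model(decoder, show_shapes=True, dpi=300)
```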

The code above creates a decoder model and outputs its structural diagram.

As illustrated, the decoder is a straightforward model that processes inputs from the latent space through a few hidden layers before generating outputs for the 784 nodes.

Next, we will combine the Encoder and Decoder models to form a Variational Autoencoder model (VAE).

If you look closely at the latent space layers of the Encoder model, you will notice that the encoder generates three outputs: Z-Mean [0], Z-Log-Sigma [1], and Z [2].
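A sketch of how the two models can be wired together:

```python
# The Encoder returns three outputs: Z-Mean [0], Z-Log-Sigma [1] and Z [2]
z_mean_out, z_log_sigma_out, z_out = encoder(visible)

# Only the sampled latent vector Z (output [2]) is passed into the Decoder
outpt = decoder(z_out)

# End-to-end VAE: original image in, reconstructed image out
vae = Model(inputs=visible, outputs=outpt, name='VAE-Model')
vae.summary()
```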

The code above connects the models by specifying that the Encoder receives inputs labeled “visible”. Out of the three outputs from the Encoder [0], [1], [2], we pass the third one (Z [2]) into a Decoder, which produces the outputs labeled “outpt”.

Custom Loss Function

Before training the VAE model, the final step is to devise a custom loss function and compile the model.

As previously mentioned, we will utilize KL divergence to gauge the loss between the latent space distribution and a reference standard Normal distribution. The “KL loss” complements the standard reconstruction loss (in this case, MSE) to ensure input and output images remain closely aligned.
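One way to sketch this is to attach the KL term with add_loss and supply the reconstruction term through compile; scaling the pixel-wise MSE by 784 so the two terms stay on comparable scales is an assumption of this sketch:

```python
# KL divergence between the learned distribution N(z_mean, exp(z_log_sigma))
# and the standard Normal N(0, 1), attached to the model as a regularization term
kl_loss = -0.5 * tf.reduce_mean(
    tf.reduce_sum(1 + z_log_sigma_out - tf.square(z_mean_out) - tf.exp(z_log_sigma_out), axis=-1)
)
vae.add_loss(kl_loss)

def reconstruction_loss(y_true, y_pred):
    # Pixel-wise MSE scaled by the number of pixels (784),
    # so the reconstruction term is not dwarfed by the KL term
    return 784 * tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

vae.compile(optimizer='adam', loss=reconstruction_loss)
```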

Training the VAE Model

With the Variational Autoencoder model assembled, let’s train it for 25 epochs and visualize the loss chart.
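A sketch of the training call; the batch size of 128 is an assumption:

```python
# Train the VAE to reconstruct its own inputs
history = vae.fit(X_train, X_train,
                  epochs=25,
                  batch_size=128,
                  validation_data=(X_test, X_test))

# Plot the training and validation loss by epoch
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
```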

Visualizing Latent Space and Generating New Digits

Given that our latent space is two-dimensional, we can visualize the neighborhoods of various digits on the latent 2D plane.

Plotting the digit distribution within the latent space allows us to visually associate different regions with distinct digits.
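A sketch of that plot, encoding the test images and scattering their Z-Mean coordinates with Plotly, coloured by the true digit labels:

```python
# Encode the test images; take the Z-Mean output (index 0) as each image's coordinates
z_mean_test, _, _ = encoder.predict(X_test)

# 2D scatter of the latent space, coloured by digit label
fig = px.scatter(x=z_mean_test[:, 0], y=z_mean_test[:, 1],
                 color=y_test.astype(str),
                 labels={'x': 'Latent dimension 1', 'y': 'Latent dimension 2',
                         'color': 'Digit'})
fig.show()
```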

If we aim to generate a new image of the digit 3, we note that 3s are positioned in the upper middle of the latent space. Thus, we can select the coordinates [0, 2.5] and generate an image from that point.
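A sketch of that step (the exact coordinates that correspond to 3s will vary from one training run to another):

```python
# Pick a point from the region of the latent space occupied by 3s
sample_point = np.array([[0.0, 2.5]])

# Decode the latent vector into 784 pixels and reshape for display
generated = decoder.predict(sample_point)
plt.imshow(generated.reshape(28, 28), cmap='gray')
plt.axis('off')
plt.show()
```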

As anticipated, we obtained an image closely resembling the digit 3 because we sampled a vector from a region of the latent space associated with 3s.

Now, let's generate 900 new digits across the entire latent space.
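One way to sketch this is to decode a 30 x 30 grid of latent coordinates (the grid range of -3 to 3 is an assumption and can be adjusted to your trained latent space):

```python
n = 30  # 30 x 30 grid = 900 generated digits
grid_x = np.linspace(-3, 3, n)
grid_y = np.linspace(-3, 3, n)

# Decode all 900 latent vectors in a single batch
grid = np.array([[x, y] for y in grid_y for x in grid_x])
decoded = decoder.predict(grid).reshape(n, n, 28, 28)

# Stitch the 900 digits into one large canvas
canvas = decoded.transpose(0, 2, 1, 3).reshape(n * 28, n * 28)

plt.figure(figsize=(12, 12))
plt.imshow(canvas, cmap='gray')
plt.axis('off')
plt.show()
```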

The exciting aspect of generating numerous images from the entire latent space is that it enables us to observe the gradual transitions between different shapes. This confirms the successful regularization of our latent space.

Final Thoughts

It's crucial to recognize that Variational Autoencoders can encode and generate significantly more complex data than MNIST digits.

I encourage you to elevate this straightforward tutorial by applying it to real-world datasets relevant to your field.

For your convenience, I have saved a Jupyter Notebook in my GitHub repository that includes all the code presented above.

If you wish to be notified the moment I release a new article on Machine Learning / Neural Networks (e.g., Generative Adversarial Networks (GAN)), please subscribe to receive email updates.
