In this tutorial, we will explore how to build and train deep autoencoders using Keras and TensorFlow.
The primary reason I decided to write this tutorial is that most tutorials out there, including the official Keras and TensorFlow ones, use the MNIST dataset for training. I have been asked numerous times to show how to train autoencoders on your own images, which may be large in number.
I will keep this tutorial brief and will not go into the details of how an autoencoder works. Therefore, a basic knowledge of autoencoders is a prerequisite for understanding the code presented in this tutorial (needless to say, you must also know how to program in Python and how to use Keras and TensorFlow).
Autoencoders are unsupervised neural networks that learn to reconstruct their input. Denoising an image is one use of autoencoders, and it is particularly useful for OCR. Autoencoders are also used for image compression.
As shown in Figure 1, an autoencoder consists of:
- Encoder: The encoder takes an image as input and generates an output of much smaller dimension than the original image. The output of the encoder is also called the latent representation of the input image.
- Decoder: The decoder takes the output from the encoder (aka the latent representation of the input image) and reconstructs the input image.
Both the encoder and the decoder are convolutional neural networks, with the difference that the encoder's dimensions shrink with each layer while the decoder's dimensions grow with each layer until the output layer, where the dimensions match those of the original image.
We will use our own images for training and testing the autoencoders. For the purpose of this tutorial, we will use a dataset that contains scanned images of restaurant receipts. The dataset is freely available at https://expressexpense.com/large-receipt-image-dataset-SRD.zip under the MIT License.
Although this dataset does not have a large number of images, we will write code that will work for both small and large datasets.
The code below is divided into 4 parts.
- Data preparation: Images will be read from a directory and fed as inputs to the encoder block.
- Neural network configuration: We will write a function that takes certain parameters and returns the encoder, decoder, and autoencoder convolutional neural networks.
- Training the neural networks: The code that triggers the training, monitors the progress and saves the trained models.
- Prediction: The code block that uses the trained models and predicts the output.
I will use Google Colaboratory (https://colab.research.google.com/) to execute the code, but you can use your favorite IDE to write and run it. The code below works for both CPUs and GPUs; I will use a GPU-based machine to speed up the training. Google Colab offers a free GPU-based virtual machine for education and learning.
If you use a Jupyter notebook, the steps below will look very similar.
First, we create a notebook project, for example "AE Demo".
Before we start the actual code, let’s import all dependencies that we need for our project. Here is a list of imports that we will need.
# Import the necessary packages
import tensorflow as tf
from google.colab.patches import cv2_imshow
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Reshape
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import Adam
import numpy as np
Listing 1.1: Import the necessary packages.
Our receipt images are in a directory. We will use the ImageDataGenerator class, provided by the Keras API, to create training and validation iterators as shown in Listing 1.2 below.
training_img_dir = "inputs"
height = 1000
width = 500
channel = 1
batch_size = 8
datagen = tf.keras.preprocessing.image.ImageDataGenerator(validation_split=0.2, rescale=1. / 255.)
train_it = datagen.flow_from_directory(
    training_img_dir,
    target_size=(height, width),
    color_mode='grayscale',
    class_mode='input',
    batch_size=batch_size,
    subset='training')  # set as training data
val_it = datagen.flow_from_directory(
    training_img_dir,
    target_size=(height, width),
    color_mode='grayscale',
    class_mode='input',
    batch_size=batch_size,
    subset='validation')  # set as validation data
Listing 1.2: Image input preparation. Load images in batches from a directory.
Important notes about Listing 1.2:
- training_img_dir = "inputs" is the parent directory that contains the receipt images. In other words, the receipts are in a subdirectory under the "inputs" directory (see the layout sketch after these notes).
- color_mode='grayscale' is important if you want to convert your input images to grayscale.
All other parameters are self-explanatory.
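To make the expected input layout concrete, here is a minimal sketch (the folder and file names below are hypothetical; only the parent/subdirectory structure matters to flow_from_directory), followed by a quick sanity check that the iterator yields the images as both inputs and targets.

# Expected layout (names are hypothetical):
# inputs/
#     receipts/            <- any subdirectory name works; it is treated as the class folder
#         receipt_001.jpg
#         receipt_002.jpg
#         ...

# Quick sanity check: with class_mode='input', each batch is (images, images),
# i.e. the "labels" are the images themselves.
batch_x, batch_y = next(train_it)
print(batch_x.shape)                 # e.g. (8, 1000, 500, 1) -> (batch, height, width, channel)
print((batch_x == batch_y).all())    # True: inputs are used as targets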
As shown in Listing 1.3 below, we have created an AutoencoderBuilder class that provides a function build_ae(). This function takes the following arguments:
- height of the input images,
- width of the input images,
- depth (or the number of channels) of the input images,
- filters, a tuple with a default of (32, 64),
- latentDim, which represents the dimension of the latent vector.
class AutoencoderBuilder:
    @staticmethod
    def build_ae(height, width, depth, filters=(32, 64), latentDim=16):
        # Initialize the input shape and the channel dimension.
        inputShape = (height, width, depth)
        chanDim = -1
        # define the input to the encoder
        inputs = Input(shape=inputShape)
        x = inputs
        # loop over the filters
        for f in filters:
            # Build the network with Conv2D, LeakyReLU and BatchNormalization layers
            x = Conv2D(f, (3, 3), strides=2, padding="same")(x)
            x = LeakyReLU(alpha=0.2)(x)
            x = BatchNormalization(axis=chanDim)(x)
        # flatten the network and then construct the latent vector
        volumeSize = K.int_shape(x)
        x = Flatten()(x)
        latent = Dense(latentDim)(x)
        # build the encoder model
        encoder = Model(inputs, latent, name="encoder")
        # We will now build the decoder model, which takes the output from the encoder as its input
        latentInputs = Input(shape=(latentDim,))
        x = Dense(np.prod(volumeSize[1:]))(latentInputs)
        x = Reshape((volumeSize[1], volumeSize[2], volumeSize[3]))(x)
        # We will loop over the filters again, but in reverse order
        for f in filters[::-1]:
            # In the decoder, we apply Conv2DTranspose, LeakyReLU and BatchNormalization layers
            x = Conv2DTranspose(f, (3, 3), strides=2,
                                padding="same")(x)
            x = LeakyReLU(alpha=0.2)(x)
            x = BatchNormalization(axis=chanDim)(x)
        # Now we want to recover the original depth of the image. For this, we apply a single Conv2DTranspose layer
        x = Conv2DTranspose(depth, (3, 3), padding="same")(x)
        outputs = Activation("sigmoid")(x)
        # Now build the decoder model
        decoder = Model(latentInputs, outputs, name="decoder")
        # Finally, the autoencoder is the encoder + decoder
        autoencoder = Model(inputs, decoder(encoder(inputs)),
                            name="autoencoder")
        # return a tuple of the encoder, decoder, and autoencoder models
        return (encoder, decoder, autoencoder)
Listing 1.3: Builder class to create autoencoder networks.
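Before training, it can help to confirm that the decoder's output shape matches the input shape. This is just a quick, optional sanity check using the height, width, and channel values defined earlier; it is not part of the original listing.

# Optional sanity check of the architecture
encoder, decoder, autoencoder = AutoencoderBuilder.build_ae(height, width, channel)
encoder.summary()      # dimensions shrink down to the latent vector
decoder.summary()      # dimensions grow back to (height, width, channel)
autoencoder.summary()  # encoder + decoder end to end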
The following code Listing 1.4 starts the autoencoder training.
import os

# initialize the number of epochs to train for; the batch size is set by the image iterators
EPOCHS = 300
MODEL_OUT_DIR = "ae_model_dir"
os.makedirs(MODEL_OUT_DIR, exist_ok=True)
# construct our convolutional autoencoder
print("[INFO] building autoencoder...")
(encoder, decoder, autoencoder) = AutoencoderBuilder().build_ae(height, width, channel)
opt = Adam(learning_rate=1e-3)
autoencoder.compile(loss="mse", optimizer=opt)
# train the convolutional autoencoder; the iterators already yield batches,
# so we do not pass batch_size to fit()
history = autoencoder.fit(
    train_it,
    validation_data=val_it,
    epochs=EPOCHS)
autoencoder.save(MODEL_OUT_DIR + "/ae_model.h5")
Listing 1.4: Training autoencoder model.
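For a run this long you may also want to checkpoint the best model and stop early when the validation loss stalls. Here is a minimal sketch using standard Keras callbacks; the file name ae_best.h5 and the patience value are illustrative choices, not part of the original code.

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# Optional: save the best model seen so far and stop early if validation loss stops improving.
callbacks = [
    ModelCheckpoint(MODEL_OUT_DIR + "/ae_best.h5", monitor="val_loss", save_best_only=True),
    EarlyStopping(monitor="val_loss", patience=20, restore_best_weights=True),
]
history = autoencoder.fit(
    train_it,
    validation_data=val_it,
    epochs=EPOCHS,
    callbacks=callbacks)

Note that if early stopping triggers, fewer than EPOCHS epochs are recorded, so the plotting code below should use len(history.history["loss"]) instead of EPOCHS for the x-axis range.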
Code Listing 1.5 shows how to display a graph of the training and validation loss per epoch. Figure 2 shows a sample output of Listing 1.5.
# set the matplotlib backend so figures can be saved in the background
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
# construct a plot that displays the training history
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, history.history["loss"], label="train_loss")
plt.plot(N, history.history["val_loss"], label="val_loss")
plt.title("Training and Validation Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="lower left")
# plt.savefig(plot)
plt.show(block=True)
Listing 1.5: Display a plot of training and validation loss vs. epochs.
Figure 2: Plot of training and validation loss vs. epoch.
Now that we have a trained autoencoder model, we will use it to make predictions. Code Listing 1.6 shows how to load the model from the directory where it was saved. We use the predict() function and pass in the validation image iterator that we created earlier. Ideally, we would have a separate image set for prediction and testing.
Here is the code to do the prediction and display.
from google.colab.patches import cv2_imshow

# use the convolutional autoencoder to make predictions on the
# validation images, then display the predicted images.
print("[INFO] making predictions...")
autoencoder_model = tf.keras.models.load_model(MODEL_OUT_DIR + "/ae_model.h5")
decoded = autoencoder_model.predict(val_it)
examples = 10
# loop over a few samples to display the predicted images
for i in range(0, examples):
    predicted = (decoded[i] * 255).astype("uint8")
    cv2_imshow(predicted)
Listing 1.6: Code to predict and display the images
In the above code listing, I have used the cv2_imshow function, which is specific to Google Colab. If you are using Jupyter or any other IDE, you can simply import the cv2 package and use the cv2.imshow() function to display the images.
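As an illustration, here is a minimal sketch of how that display loop might look in a local script outside Colab, assuming the decoded array and examples variable from Listing 1.6. Note that cv2.imshow() opens a desktop window, so it needs a display environment.

import cv2

# Display the reconstructions locally (outside Colab).
for i in range(examples):
    predicted = (decoded[i] * 255).astype("uint8")
    cv2.imshow("reconstruction", predicted)
    cv2.waitKey(0)  # wait for a key press before showing the next image
cv2.destroyAllWindows()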
In this tutorial, we built autoencoder models using our own images. We also explored how to save the model, load the saved model back, and make predictions. Finally, we displayed the predicted images.
FAQs
What are autoencoders in Keras?
Autoencoders are a class of unsupervised networks that consist of two major networks: encoders and decoders. An unsupervised network is a network that learns patterns from data without any training labels. The network finds patterns in the data without being told what the patterns should be.
What is an autoencoder for anomaly detection using TensorFlow/Keras?
An autoencoder is an unsupervised neural network model that uses reconstruction error to detect anomalies or outliers. The reconstruction error is the difference between the reconstructed data and the input data.
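As a rough illustration of that idea (not part of the original article), here is a minimal sketch that scores images by their per-image reconstruction error and flags the highest-error ones. It assumes a trained autoencoder like the one built above and an array called images; the percentile cut-off is an arbitrary example choice.

import numpy as np

# Hypothetical sketch: flag images whose reconstruction error is unusually high.
# Assumes `autoencoder` is trained and `images` has shape (n, height, width, channel), scaled to [0, 1].
reconstructed = autoencoder.predict(images)
errors = np.mean((images - reconstructed) ** 2, axis=(1, 2, 3))       # per-image MSE
threshold = np.percentile(errors, 95)                                 # assumed cut-off: top 5% flagged
anomalies = np.where(errors > threshold)[0]
print("Anomalous image indices:", anomalies)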
When should we not use autoencoders?
An autoencoder could misclassify input errors that are different from those in the training set, or changes in underlying relationships that a human would notice. Another drawback is that you may eliminate vital information in the input data.
Which optimizer is best for an autoencoder?
Training loss curves over 50 epochs show that autoencoders with the Lookahead optimizer (combined with Adam) perform better than their Adam-only counterparts.
What are autoencoders in TensorFlow?
An autoencoder is a special type of neural network that is trained to copy its input to its output. For example, given an image of a handwritten digit, an autoencoder first encodes the image into a lower-dimensional latent representation, then decodes the latent representation back into an image.
What is the purpose of using an autoencoder in deep learning?
Autoencoders provide a useful way to greatly reduce the noise of input data, making the creation of deep learning models much more efficient. They can be used to detect anomalies, tackle unsupervised learning problems, and eliminate complexity within datasets.
What are the three basic approaches to anomaly detection?
There are three main classes of anomaly detection techniques: unsupervised, semi-supervised, and supervised. Essentially, the correct anomaly detection method depends on the available labels in the dataset.
How can an autoencoder improve accuracy?
By training the autoencoder on the training sample, it will try to minimize the reconstruction error and learn the encoder and decoder network weights. Later, the decoder network can be cropped off and feature-extraction embeddings can be generated using the encoder network.
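Here is a minimal sketch of that idea, assuming the encoder returned by build_ae() has been trained as part of the autoencoder above and that images is an array of preprocessed inputs; neither name comes from this FAQ entry.

# Hypothetical sketch: use the trained encoder as a feature extractor.
# Assumes `images` has shape (n, height, width, channel), scaled to [0, 1].
embeddings = encoder.predict(images)   # shape: (n, latentDim)
print(embeddings.shape)
# These embeddings can then be fed to any downstream classifier or clustering algorithm.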
Which algorithm is best for anomaly detection?
Local outlier factor (LOF) is probably the most common technique for anomaly detection. This algorithm is based on the concept of local density: it compares the local density of an object with that of its neighbouring data points.
What is a disadvantage of autoencoders?
Autoencoders are not as efficient as Generative Adversarial Networks at reconstructing an image. As the complexity of the images increases, autoencoders struggle to keep up and the images start to get blurry.
How many layers should an autoencoder have?
In its simplest form, the autoencoder is a three-layer net, i.e. a neural net with one hidden layer. The input and output are the same, and we learn how to reconstruct the input, for example using the Adam optimizer and the mean-squared-error loss function.
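As an illustration (not from the original article), here is a minimal sketch of such a three-layer autoencoder; the input size of 784 and hidden size of 32 are arbitrary example values.

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Hypothetical minimal autoencoder: input layer, one hidden (bottleneck) layer, output layer.
input_dim, hidden_dim = 784, 32                         # example sizes, not from the article
inp = Input(shape=(input_dim,))
hidden = Dense(hidden_dim, activation="relu")(inp)      # the single hidden layer / bottleneck
out = Dense(input_dim, activation="sigmoid")(hidden)    # reconstruct the input
simple_ae = Model(inp, out)
simple_ae.compile(optimizer="adam", loss="mse")         # Adam optimizer, mean squared error loss
# simple_ae.fit(x_train, x_train, epochs=10)            # inputs double as targets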
Can autoencoders overfit?
Autoencoders (AEs) aim to reproduce their input at the output. They may hence tend to overfit towards learning the identity function between the input and output, i.e., they may predict each feature in the output from the same feature in the input.
What is the bottleneck in an autoencoder?
The bottleneck is the lower-dimensional hidden layer where the encoding is produced. The bottleneck layer has a smaller number of nodes, and the number of nodes in the bottleneck layer gives the dimension of the encoding of the input.
Do autoencoders need a lot of data?
Autoencoders are an unsupervised technique that learns from its own data rather than from labels created by humans. This often means that autoencoders need a considerable amount of clean data to generate useful results. They can deliver mixed results if the dataset is not large enough, is not clean, or is too noisy.
What is the accuracy of an autoencoder in Python?
Autoencoder: 99% accuracy without resampling.
What is a deep autoencoder?
A deep autoencoder is composed of two symmetrical deep-belief networks with four to five shallow layers each. One of the networks represents the encoding half of the net, and the second network makes up the decoding half.
Can autoencoders be used for unsupervised learning?
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs, i.e., it uses y(i) = x(i).
How many layers are in a deep autoencoder?
The major parts of the autoencoder network structure are the encoder function and the decoder function, which are used to reconstruct the data. The architecture of a basic autoencoder consists of three layers: the input, hidden, and output layers.
An autoencoder is a neural network architecture capable of discovering structure within data in order to develop a compressed representation of the input.
What are the main tasks that autoencoders are used for?
An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). The encoding is validated and refined by attempting to regenerate the input from the encoding.
What is anomaly detection vs. outlier detection?
Outliers are observations that are distant from the mean or location of a distribution. However, they don't necessarily represent abnormal behavior or behavior generated by a different process. Anomalies, on the other hand, are data patterns that are generated by different processes.
What is the best metric for anomaly detection?
Beyond accuracy, the most commonly used metrics when evaluating anomaly detection solutions are F1, precision, and recall. One can think about these metrics in the following way: recall answers the question, what proportion of true anomalies was identified?
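For illustration (not from the original article), here is a minimal sketch computing these metrics with scikit-learn on hypothetical binary anomaly labels, where 1 means anomaly and 0 means normal.

from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical example labels: 1 = anomaly, 0 = normal.
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]
print("Precision:", precision_score(y_true, y_pred))  # of flagged points, how many are true anomalies
print("Recall:", recall_score(y_true, y_pred))        # of true anomalies, how many were flagged
print("F1:", f1_score(y_true, y_pred))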
How can PCA be used for anomaly detection?
PCA-based anomaly detection analyzes the available features to determine what constitutes a "normal" class, then applies distance metrics to identify cases that represent anomalies. This approach lets you train a model using existing imbalanced data.
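A minimal sketch of that approach using scikit-learn follows (not part of the original article); the number of components, the percentile cut-off, and the data array X are arbitrary example choices.

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical sketch: score points by how poorly PCA reconstructs them.
# Assumes `X` is a 2-D array of shape (n_samples, n_features).
pca = PCA(n_components=2)                                     # example choice of components
X_proj = pca.fit_transform(X)
X_rec = pca.inverse_transform(X_proj)
errors = np.mean((X - X_rec) ** 2, axis=1)                    # per-sample reconstruction error
anomalies = np.where(errors > np.percentile(errors, 95))[0]   # assumed cut-off
print("Anomalous sample indices:", anomalies)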
How many epochs does it take to train an autoencoder?
The number of epochs contributes heavily to determining the learning parameters and affects the prediction accuracy. A typical example trains the network for 50 epochs. As discussed before, the autoencoder is divided into two parts: an encoder and a decoder.
Why does an autoencoder need a bottleneck?
A common belief in designing deep autoencoders (AEs), a type of unsupervised neural network, is that a bottleneck is required to prevent learning the identity function. Learning the identity function renders the AEs useless for anomaly detection.
Does an autoencoder reduce dimensionality?
In simple words, autoencoders are a specific type of deep learning architecture used for learning a representation of the data, typically for the purpose of dimensionality reduction. This is achieved by designing a deep learning architecture that aims to copy its input layer at its output layer.
Which deep learning algorithm is best for prediction?
- Convolutional Neural Networks (CNNs)
- Long Short-Term Memory networks (LSTMs)
- Recurrent Neural Networks (RNNs)
- Generative Adversarial Networks (GANs)
- Radial Basis Function Networks (RBFNs)
- Multilayer Perceptrons (MLPs)
- Self-Organizing Maps (SOMs)
Which algorithm is most popular for unsupervised anomaly detection?
Artificial neural networks (ANNs) are probably the most popular algorithm for implementing unsupervised anomaly detection. ANNs can be trained on large unlabeled datasets and, given their layered, non-linear learning, can be trusted to find intricate patterns to classify anomalies of a great variety.
Which algorithm is most robust to outliers?
Robust regression algorithms can be used for data with outliers in the input or target values.
What is a CNN autoencoder?
In an autoencoder, both the encoder and the decoder are made up of a combination of neural network layers, which help reduce the size of the input image by recreating it. In the case of a CNN autoencoder, these layers are CNN layers (convolution, max pooling, flattening, etc.).
What is an autoencoder in Python?
The idea of autoencoders is to allow a neural network to figure out how to best encode and decode certain data. The uses for autoencoders are really anything you can think of where encoding could be useful. Some examples are compressing the number of input features and noise reduction.
How are autoencoders different from CNNs?
Essentially, an autoencoder learns a clustering of the data. In contrast, the term CNN refers to a type of neural network which uses the convolution operator (often the 2D convolution when it is used for image processing tasks) to extract features from the data.