In this post we'll study the ResNet deep learning architecture, apply it to a simple lemon quality dataset, and classify some images. Let's rock.
Created by Kaiming He et al., ResNet (Residual Network) won the ILSVRC 2015 challenge with an astounding top-five error rate under 3.6%. The winning variant used an extremely deep CNN composed of 152 layers (other variants had 18, 34, 50, and 101 layers). It confirmed the general trend: models are getting deeper and deeper, with fewer and fewer parameters. The key to being able to train such a deep network is to use skip connections (also called shortcut connections): the signal feeding into a layer is also added to the output of a layer located a bit higher up the stack. Let's see why this is useful.
The aim when training a neural network is to make it model a target function h(x). If you add the input x to the network's output (that is, you add a skip connection), the network is forced to model f(x) = h(x) - x rather than h(x). This is called residual learning.
When initializing a traditional neural network, its weights are set close to zero, resulting in outputs close to zero. By adding a skip connection, the network outputs a duplicate of its inputs, effectively modeling the identity function. This can greatly accelerate training, particularly when the target function is close to the identity function, which is often the case.
Additionally, adding multiple skip connections allows the network to make progress even if some layers have not begun learning yet. The skip connections facilitate the flow of signals throughout the network. A deep residual network can be thought of as a stack of residual units (RUs), each consisting of a small network with a skip connection.
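To make the identity argument concrete, here is a toy sketch in plain Python (my own illustration, not Keras code). Each unit applies a small two-layer branch F to its input; the plain unit outputs relu(F(x)), while the residual unit outputs relu(x + F(x)). With all weights set to exactly zero (an idealization of "close to zero"), a stack of plain units outputs zeros, whereas a stack of residual units passes non-negative inputs through unchanged.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    # Multiply a matrix W (given as a list of rows) by a vector v
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def plain_unit(x, W1, W2):
    # Two layers, no skip connection: relu(W2 . relu(W1 . x))
    return relu(matvec(W2, relu(matvec(W1, x))))

def residual_unit(x, W1, W2):
    # Same two layers F(x), plus the skip connection: relu(x + F(x))
    branch = matvec(W2, relu(matvec(W1, x)))
    return relu([a + b for a, b in zip(x, branch)])

def zeros(n):
    return [[0.0] * n for _ in range(n)]

x = [0.5, 1.0, 2.0]
plain, residual = x, x
for _ in range(10):  # a stack of 10 units, all weights exactly zero
    plain = plain_unit(plain, zeros(3), zeros(3))
    residual = residual_unit(residual, zeros(3), zeros(3))

print(plain)     # [0.0, 0.0, 0.0]
print(residual)  # [0.5, 1.0, 2.0]
```

The residual stack starts out as the identity function, so training only has to learn the difference from it.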
The ResNet architecture is simple. It begins and ends like GoogLeNet (except without a dropout layer), and in between is just a very deep stack of simple residual units. Each residual unit is composed of two convolutional layers and no pooling layer, with batch normalization and ReLU activation, using 3x3 kernels and preserving spatial dimensions (stride 1, "same" padding).
Note that the number of feature maps is doubled every few residual units, at the same time as their height and width are halved (using a convolutional layer with stride 2). When this happens, the inputs cannot be added directly to the outputs of the residual unit because they don't have the same shape (this problem affects, for example, the skip connection represented by the dashed arrow in the last figure). To solve this problem, the inputs are passed through a 1x1 convolutional layer with stride 2 and the right number of output feature maps.
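The shape mismatch is easy to check with a little arithmetic. The sketch below assumes "same" padding, for which a layer with stride s maps a spatial size n to ceil(n / s). The concrete numbers (a 56x56x64 input entering the first unit with 128 maps) are what ResNet-34 sees with 224x224 images; `conv_output_shape` is a hypothetical helper, not a Keras function.

```python
import math

def conv_output_shape(height, width, channels_out, stride):
    # "same" padding: each spatial size becomes ceil(size / stride)
    return (math.ceil(height / stride), math.ceil(width / stride), channels_out)

x_shape = (56, 56, 64)                                     # input to the unit
main_out = conv_output_shape(56, 56, 128, stride=2)        # 3x3 conv, stride 2
skip_identity = x_shape                                    # plain skip: unchanged
skip_projected = conv_output_shape(56, 56, 128, stride=2)  # 1x1 conv, stride 2

print(main_out)                    # (28, 28, 128)
print(skip_identity == main_out)   # False: these tensors cannot be added
print(skip_projected == main_out)  # True: the 1x1 projection fixes the shape
```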
ResNet-34 is the ResNet with 34 layers (counting only the convolutional layers and the fully connected layer), containing 3 residual units (RUs) that output 64 feature maps, 4 RUs with 128 maps, 6 RUs with 256 maps, and 3 RUs with 512 maps. We'll implement this architecture later.
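Where does the "34" come from? A quick count, following the convention stated above (the 1x1 projection layers on the skip connections are not counted):

```python
# Residual units per stage in ResNet-34: {feature maps: number of RUs}
stages = {64: 3, 128: 4, 256: 6, 512: 3}

num_units = sum(stages.values())   # 16 residual units
conv_layers = 1 + 2 * num_units    # initial 7x7 conv + two 3x3 convs per unit
dense_layers = 1                   # final fully connected layer
print(num_units, conv_layers + dense_layers)  # 16 34
```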
ResNets deeper than that, such as ResNet-152, use slightly different residual units. Instead of two 3x3 convolutional layers with, say, 256 feature maps, they use three convolutional layers: first a 1x1 convolutional layer with just 64 feature maps (4 times fewer), which acts as a bottleneck layer, then a 3x3 layer with 64 feature maps, and finally another 1x1 convolutional layer with 256 feature maps (4 times 64) that restores the original depth. ResNet-152 contains 3 such RUs that output 256 maps, then 8 RUs with 512 maps, a whopping 36 RUs with 1,024 maps, and finally 3 RUs with 2,048 maps.
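To see why the bottleneck design helps, here is a rough weight count for the comparison described above: two 3x3 convolutions with 256 maps versus the 1x1/3x3/1x1 bottleneck. It counts convolution kernels only (no biases, since the layers use `use_bias=False`, and ignoring batch normalization), so treat it as an illustration rather than an exact parameter budget. The same arithmetic recovers the "152" in the name.

```python
def conv_params(kernel, in_maps, out_maps):
    # Number of weights in a Conv2D layer without bias
    return kernel * kernel * in_maps * out_maps

basic = 2 * conv_params(3, 256, 256)        # two 3x3 convs, 256 maps
bottleneck = (conv_params(1, 256, 64)       # 1x1: reduce depth to 64
              + conv_params(3, 64, 64)      # 3x3 on 64 maps
              + conv_params(1, 64, 256))    # 1x1: restore depth to 256
print(basic, bottleneck)  # 1179648 69632

# ResNet-152: 7x7 conv + 3 conv layers per bottleneck unit + dense layer
units_152 = 3 + 8 + 36 + 3
total_152 = 1 + 3 * units_152 + 1
print(total_152)  # 152
```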
One of the problems ResNets help with is the well-known vanishing gradient problem. When a network is very deep, the gradients propagated back from the loss function can easily shrink toward zero after many applications of the chain rule. As a result, the weights of the early layers barely change, and almost no learning takes place there. Because each residual unit adds its input to its output, the gradient can also flow back through the skip connections, which keeps it from vanishing so quickly.
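A toy scalar model shows the effect (a deliberate oversimplification of real backpropagation, which involves matrices). In a plain network the gradient reaching the first layer is a product of per-layer derivatives a_i; a residual unit computes x + F(x), so its local derivative is 1 + a_i. The value a = 0.01 and the depth of 20 are arbitrary assumptions chosen for illustration.

```python
depth = 20
a = 0.01  # assumed (small) local derivative of each layer's transformation

plain_grad = a ** depth           # product of the a_i
residual_grad = (1 + a) ** depth  # product of the (1 + a_i)

print(f"{plain_grad:.1e}")     # 1.0e-40
print(f"{residual_grad:.3f}")  # 1.220
```

The plain product collapses toward zero, while the residual product stays close to 1, so the early layers still receive a usable gradient.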
Our dataset is a Kaggle dataset called the Lemon Quality Dataset. It was prepared to investigate ways to tackle the problem of fruit quality control. It contains 2,533 images (300 x 300 pixels). The lemon images were taken on a concrete surface, and the dataset also includes empty images of this surface.
The dataset contains images of both bad-quality and good-quality lemons under slightly different lighting conditions (all under daylight) and sizes.

So, let's get to our code. The first thing to do is set up the Kaggle credentials and download our dataset. Everything in this post runs in a Google Colab environment. On kaggle.com you can download your API credentials as a kaggle.json file; do this and upload the file to your Colab session, because we'll use it to download the lemon-quality-dataset. Then execute the code below.
import os

# Read the Kaggle credentials used to download the dataset
os.environ['KAGGLE_CONFIG_DIR'] = "/content"
!chmod 600 /content/kaggle.json

# Download the dataset
!kaggle datasets download -d yusufemir/lemon-quality-dataset

# Unzip the dataset
!unzip /content/lemon-quality-dataset.zip -d /content/kaggle/
The next step is to import the libraries we'll use and list the files in our dataset.
import os
import warnings

import numpy as np
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt
import cv2
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

warnings.filterwarnings("ignore")

# List every file extracted from the archive
for dirname, _, filenames in os.walk('/content/kaggle/lemon_dataset'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
Now we'll create a training folder, copy the good-quality and bad-quality lemon classes into it (leaving out the empty-surface images), and collect the image paths of each class.
%mkdir ./training/
%cp -R /content/kaggle/lemon_dataset/good_quality ./training/good_quality
%cp -R /content/kaggle/lemon_dataset/bad_quality ./training/bad_quality

basePath = "/content/training"
images = {}  # class name -> list of image paths
for dirname, dirlist, filenames in os.walk(basePath):
    lemon_class = dirname.split('/')[-1]
    if dirname != basePath and lemon_class in ['good_quality', 'bad_quality']:
        print(f"Number of {lemon_class} images: {len(filenames)}")
        filePaths = []
        for filename in filenames:
            # dirname is already an absolute path under basePath
            filePaths.append(os.path.join(dirname, filename))
        images[lemon_class] = filePaths
We can plot some images of our dataset.
fig, ax = plt.subplots(2, 3, figsize=(18, 8))
for i in range(6):
    # Row i//3 selects the class, column i%3 selects an image of that class
    class_name = list(images.keys())[i // 3]
    img1 = cv2.imread(images[class_name][i % 3])
    img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
    ax[i // 3][i % 3].imshow(img1)
    ax[i // 3][i % 3].set_title(class_name)
    ax[i // 3][i % 3].axis('off')
plt.show()
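Now we'll use the Keras API to generate augmented images. This matters because we'll train a very deep neural network from scratch on a fairly small dataset, and randomly transformed copies of the training images help reduce overfitting. Let's check the Python code.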
TRAINING_DIR = "./training"
training_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest',
    validation_split=0.2)
Note that we'll randomly rotate, shift, shear, zoom, and flip the images, and set aside 20% of them for validation. The next figure shows some samples produced by this data augmentation.
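What do those numbers mean in practice? The sketch below samples one set of transformation parameters the way I understand the Keras documentation describes them: rotation in degrees, shifts as a fraction of the (already resized) 224-pixel image, zoom in [1 - 0.2, 1 + 0.2] independently per axis, and a coin flip for the horizontal flip. `sample_augmentation` is a hypothetical helper that only draws parameters; it does not transform any image. One thing it highlights: `shear_range` is documented as an angle in degrees, so 0.2 gives an almost invisible shear.

```python
import random

def sample_augmentation(size=224, seed=None):
    # Hypothetical helper: draws parameters matching the ranges used above
    rng = random.Random(seed)
    return {
        "rotation_deg": rng.uniform(-40, 40),         # rotation_range=40
        "shift_x_px": rng.uniform(-0.2, 0.2) * size,  # width_shift_range=0.2
        "shift_y_px": rng.uniform(-0.2, 0.2) * size,  # height_shift_range=0.2
        "shear_deg": rng.uniform(-0.2, 0.2),          # shear_range=0.2 (degrees)
        "zoom_x": rng.uniform(0.8, 1.2),              # zoom_range=0.2
        "zoom_y": rng.uniform(0.8, 1.2),
        "flip": rng.random() < 0.5,                   # horizontal_flip=True
    }

print(sample_augmentation(seed=0))
```

With 224x224 inputs, the shifts can therefore move a lemon by up to about 45 pixels in each direction.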
Now we need to split our dataset into training and validation sets.
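BATCH_SIZE = 8

print('Training Generator: ', end="")
train_generator = training_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(224, 224),
    class_mode='binary',
    batch_size=BATCH_SIZE,
    subset='training')

# Validation images are only rescaled, not augmented; with the same
# validation_split, flow_from_directory selects the same validation files
validation_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

print('Validation Generator: ', end="")
validation_generator = validation_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(224, 224),
    class_mode='binary',
    batch_size=BATCH_SIZE,
    subset='validation')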
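How does `subset` decide which images go where? Based on my reading of the Keras source, within each class folder the file names are sorted and the first `validation_split` fraction becomes the validation subset, while the rest is used for training. The pure-Python sketch below mimics that behavior for a single folder; `keras_like_split` is a hypothetical name, not part of Keras.

```python
def keras_like_split(filenames, validation_split):
    # Sort the files, then cut at int(validation_split * n):
    # the first part is validation, the remainder is training
    files = sorted(filenames)
    stop = int(validation_split * len(files))
    return files[stop:], files[:stop]  # (training, validation)

files = [f"lemon_{i:02d}.jpg" for i in range(10)]
train, val = keras_like_split(files, 0.2)
print(val)         # ['lemon_00.jpg', 'lemon_01.jpg']
print(len(train))  # 8
```

Because the split depends only on the sorted file names and `validation_split` (not on shuffling or augmentation), any `ImageDataGenerator` with the same `validation_split` selects the same validation files, and the two subsets never overlap.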
OK, the next step is the most important part of this post: creating our ResNet model. Most CNN architectures are fairly straightforward to implement (although in practice you would generally load a pretrained network instead). To illustrate the process, let's implement a ResNet-34 from scratch using Keras, starting with a ResidualUnit layer class.
class ResidualUnit(keras.layers.Layer):
    def __init__(self, filters, strides=1, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.strides = strides
        self.activation = keras.activations.get(activation)
        # Main path: two 3x3 convolutions with batch normalization
        self.main_layers = [
            keras.layers.Conv2D(filters, 3, strides=strides,
                                padding="same", use_bias=False),
            keras.layers.BatchNormalization(),
            self.activation,
            keras.layers.Conv2D(filters, 3, strides=1,
                                padding="same", use_bias=False),
            keras.layers.BatchNormalization()]
        # Skip path: a 1x1 convolution, only needed when the shape changes
        self.skip_layers = []
        if strides > 1:
            self.skip_layers = [
                keras.layers.Conv2D(filters, 1, strides=strides,
                                    padding="same", use_bias=False),
                keras.layers.BatchNormalization()]

    def get_config(self):
        # Store the constructor arguments so the layer can be re-created
        config = super().get_config()
        config.update({
            "filters": self.filters,
            "strides": self.strides,
            "activation": keras.activations.serialize(self.activation),
        })
        return config

    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        # Add both paths, then apply the activation
        return self.activation(Z + skip_Z)
In the constructor, we create all the layers we need: the main layers are the ones on the right side of the diagram, and the skip layers are the ones on the left (only needed if the stride is greater than 1). Then, in the call() method, we pass the inputs through the main layers and the skip layers (if any), add both outputs, and apply the activation function.
Next, we can build the ResNet-34 using a Sequential model, since it's really just a long sequence of layers (we can treat each residual unit as a single layer now that we have the ResidualUnit class):
model = keras.models.Sequential()
model.add(keras.layers.Conv2D(64, 7, strides=2, padding="same", use_bias=False,
                              input_shape=[224, 224, 3]))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Activation("relu"))
model.add(keras.layers.MaxPool2D(pool_size=3, strides=2, padding="same"))
prev_filters = 64
for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
    # Downsample (stride 2) whenever the number of filters changes
    strides = 1 if filters == prev_filters else 2
    model.add(ResidualUnit(filters, strides=strides))
    prev_filters = filters
model.add(keras.layers.GlobalAvgPool2D())
model.add(keras.layers.Flatten())
# A single sigmoid unit for the binary good/bad quality task
model.add(keras.layers.Dense(1, activation="sigmoid"))
model.summary()
The only slightly tricky part of this code is the loop that adds the ResidualUnit layers to the model: as explained earlier, the first 3 RUs have 64 filters, the next 4 RUs have 128 filters, and so on. We then set the stride to 1 when the number of filters is the same as in the previous RU, or else we set it to 2. Then we add the ResidualUnit, and finally we update prev_filters. Calling model.summary() lets us check our ResNet's architecture.
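To see what the stride logic does to the feature maps, the sketch below traces the spatial size through the same loop, without TensorFlow. It assumes "same" padding everywhere, including the first 7x7 convolution, so that a stride-s layer maps a size n to ceil(n / s); `resnet34_shapes` is a hypothetical helper written for this illustration.

```python
import math

def resnet34_shapes(size=224):
    shapes = []
    n = math.ceil(size / 2)  # 7x7 conv, stride 2
    n = math.ceil(n / 2)     # 3x3 max pooling, stride 2
    shapes.append(("stem", n, 64))
    prev_filters = 64
    for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
        strides = 1 if filters == prev_filters else 2  # same rule as the model
        n = math.ceil(n / strides)
        shapes.append(("ru", n, filters))
        prev_filters = filters
    return shapes

for name, n, maps in resnet34_shapes():
    print(name, f"{n}x{n}x{maps}")
```

The 224x224 input shrinks to 56x56 after the stem, then to 28, 14, and finally 7 pixels per side as the maps grow to 512; global average pooling then turns the 7x7x512 output into a 512-value vector for the dense layer.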
Now we can compile and train our model.
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy'])

# The batch size is set on the generators (flow_from_directory),
# so it is not passed to fit()
history_custom = model.fit(
    train_generator,
    epochs=20,
    validation_data=validation_generator,
    verbose=1)
Now we need to check our loss and accuracy curves.
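Besides eyeballing the curves, it helps to read the numbers directly. `history_custom.history` is a dict mapping metric names (here `loss`, `accuracy`, `val_loss`, `val_accuracy`) to per-epoch lists. The two hypothetical helpers below find the best validation epoch and the train/validation accuracy gap at the end of training; the example values are made up purely for illustration.

```python
def best_epoch(history, metric="val_accuracy"):
    # history: dict of per-epoch lists, like Keras' History.history
    values = history[metric]
    best = max(range(len(values)), key=values.__getitem__)
    return best + 1, values[best]  # 1-based epoch number and its value

def generalization_gap(history):
    # Training minus validation accuracy at the last epoch
    return history["accuracy"][-1] - history["val_accuracy"][-1]

# Made-up numbers, for illustration only
example = {"accuracy": [0.70, 0.85, 0.93], "val_accuracy": [0.68, 0.88, 0.86]}
print(best_epoch(example))                    # (2, 0.88)
print(round(generalization_gap(example), 2))  # 0.07
```

With the real run you would call `best_epoch(history_custom.history)`; the same lists can be passed to `plt.plot` to draw the curves. A large positive gap is a hint of overfitting.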
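It looks OK! Our model seems to perform well. But we also need to do some inference, so I downloaded a lemon image from the internet and made a prediction. Note that flow_from_directory assigns class indices in alphabetical order of the folder names, so bad quality is 0 and good quality is 1; the sigmoid output is the predicted probability of good quality.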
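!wget -O /content/test_lemon.jpg https://thumbs.dreamstime.com/b/lim%C3%A3o-podre-63225986.jpg

import numpy as np
from tensorflow.keras.preprocessing import image

test_image = image.load_img('/content/test_lemon.jpg', target_size=(224, 224))
test_image = image.img_to_array(test_image) / 255.0  # same rescaling as training
test_image = np.expand_dims(test_image, axis=0)      # add the batch dimension
prediction = model.predict(test_image)

print(train_generator.class_indices)  # {'bad_quality': 0, 'good_quality': 1}
print(prediction)                     # probability of good_quality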
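Turning the raw probability into a class name is a small but easy-to-get-wrong step. The hypothetical helper below inverts the `class_indices` mapping and applies a threshold; the 0.5 cut-off is an assumption, and you may want a different one if misclassifying a bad lemon is costlier than the reverse.

```python
def decode_prediction(probability, class_indices, threshold=0.5):
    # probability: sigmoid output, i.e. P(class with index 1)
    # class_indices: mapping like train_generator.class_indices
    index_to_class = {v: k for k, v in class_indices.items()}
    label = 1 if probability >= threshold else 0
    return index_to_class[label]

class_indices = {"bad_quality": 0, "good_quality": 1}
print(decode_prediction(0.12, class_indices))  # bad_quality
print(decode_prediction(0.91, class_indices))  # good_quality
```

In the notebook this would be called as `decode_prediction(float(prediction[0][0]), train_generator.class_indices)`.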
It is amazing that in fewer than 40 lines of code we can build ResNet-34, a member of the architecture family that won the ILSVRC 2015 challenge! This demonstrates both the elegance of the ResNet model and the expressiveness of the Keras API. Implementing the other CNN architectures is not much harder.
[1] https://visio.ai/deep-learning/resnet-residual-neural-network/
[2] Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow, 2nd Edition.
[3] Kaiming He et al., Deep Residual Learning for Image Recognition (article): https://arxiv.org/pdf/1512.03385.pdf
[4] https://keras.io/