
Levels of Supervision in Deep Learning

Dr. M. Siyamalan

Deep Learning (DL) is a Machine Learning technique, and both fall under the umbrella of Artificial Intelligence. DL is a hot research topic today because it gives state-of-the-art results for various applications in different domains, including Computer Vision, Medical Imaging, and Natural Language Processing. DL models are deep versions of Artificial Neural Networks, which are inspired by the way the human brain works. Before a DL model can be applied to a particular task, it must be trained using the available data. Different levels of supervision can be used when training a DL model, ranging from no supervision to full supervision. The type of supervision depends on the task and on the types of annotations available. This article summarizes these levels of supervision and gives examples of applications where each is used.

No Supervision

These approaches are known as unsupervised approaches. Here, a set of samples (or data) is given to the DL algorithm without tags (or labels) that specify what each sample contains. For example, in image analysis, if an image is tagged as “apple”, the image contains one or more apples, possibly among other content; in unsupervised learning, no such tags are available. Unsupervised learning approaches can be used for a variety of purposes, including clustering, association rule mining, data generation, dimensionality reduction, and anomaly detection.

Figure 1: An example of clustering. Left: unclustered items. Right: items grouped into clusters.

In clustering, the task is to group the given samples based on their similarity. Whenever a new sample is given, its category (or group) can be identified using the trained model. In many clustering algorithms, such as k-means, the number of groups (or categories) must be specified as an input. As a real-world example, assume that different types of potatoes are mixed in a bag and you are asked to sort them into three categories. You may group them based on their color, as shown in Figure 1.
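The clustering idea above can be sketched with k-means, one widely used algorithm that takes the number of groups k as an input; the article does not name a specific algorithm, so this choice is ours. The sketch assumes that each potato is described by two hypothetical color features scaled to [0, 1], and it seeds the centroids with a deterministic farthest-first heuristic instead of the random seeding that library implementations typically use.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding: repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: returns a cluster index for each point and the final centroids."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty group's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical (redness, lightness) features for nine potatoes of three kinds.
potatoes = [(0.90, 0.20), (0.85, 0.25), (0.88, 0.22),
            (0.20, 0.80), (0.25, 0.85), (0.22, 0.78),
            (0.50, 0.10), (0.55, 0.12), (0.52, 0.08)]
labels, centroids = kmeans(potatoes, k=3)

# A new (hypothetical) potato is assigned to the group with the nearest centroid.
new_potato = (0.24, 0.82)
group = min(range(3), key=lambda c: math.dist(new_potato, centroids[c]))
```

The last two lines cover the "new sample" case: once trained, the model places a new potato in the group whose centroid is closest.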

Association rule mining aims to find associations among different items in the given data. Assume that you own a supermarket and want to know which sets of items are frequently bought together. Knowing this could improve your business in different ways. For example, if bread and butter are often bought together, you can place them next to each other so that customers do not need to search for them, or you may even reduce the price of butter but increase the price of bread to increase overall sales. Figure 2 shows an example of an online bookselling platform that recommends books to increase sales.

Figure 2: An example book recommendation system.
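The supermarket example can be made concrete by counting how often pairs of items appear in the same basket. The sketch below is a simplified, pairs-only form of frequent-itemset mining (algorithms such as Apriori extend the same idea to larger itemsets). The baskets and the thresholds `min_support` and `min_confidence` are hypothetical: a rule A → B is kept when A and B appear together in at least `min_support` of all baskets, and B appears in at least `min_confidence` of the baskets that contain A.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5, min_confidence=0.7):
    """Return rules (A, B, support, confidence), read as 'customers who buy A also buy B'."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # fraction of baskets with lhs that also have rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"milk", "jam"},
]
rules = pair_rules(baskets)
# rules == [("bread", "butter", 0.75, 1.0), ("butter", "bread", 0.75, 1.0)]
```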

Different unsupervised approaches have been proposed for generating new data. Auto-Encoders (Baldi, 2012) and Generative Adversarial Networks (Goodfellow, 2014) are two popular approaches among them. Auto-Encoders embed the original data into a compressed latent feature space and then reconstruct the original data from this compressed representation. New data can be generated by manipulating this latent space. The compressed representation can also be used as a dimensionality-reduced version of the original data. Principal Component Analysis (PCA) is another popular approach for dimensionality reduction; it projects the data onto the directions along which the variance of the data is maximized. Other applications of unsupervised learning include anomaly (or outlier) detection and image reconstruction.
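As a minimal sketch of the PCA idea (maximizing the variance of the projected data), the code below finds only the first principal component by power iteration on the sample covariance matrix; practical libraries usually compute all components with a singular value decomposition instead. The input points are hypothetical and chosen to lie on a line, so a single coordinate describes them without loss.

```python
import math

def first_principal_component(data, iters=100):
    """Mean and top principal direction of `data` (equal-length tuples), via power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # converges to the direction of maximum variance
    return mean, v

def project(data, mean, v):
    """Reduce each sample to one number: its coordinate along direction v."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v))) for row in data]

# Hypothetical 2-D samples that lie on the line y = 2x.
samples = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, v = first_principal_component(samples)
coords = project(samples, mean, v)  # 2-D points reduced to 1-D coordinates
```

Here the recovered direction is (1, 2)/√5, and distances along the line are preserved in the 1-D coordinates.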

Full Supervision

In full supervision (Figures 3 and 4), each training sample is associated with a label (or class) provided by a human annotator, and the task is to predict the label of a new, unseen sample. For example, assume that we want to identify whether a given image contains a car. In this scenario, our labeled training dataset should contain two categories of images: each image in the first category contains one or more cars, and the images in the second category do not contain any car. Image classification, image segmentation, sentiment analysis of text documents, spam email detection, weather forecasting, and stock market trend prediction are some applications of supervised algorithms.

Figure 3: An illustration of fully supervised learning to classify whether a given image contains a car. Here each image in the training set is associated with an image-level label (car or no-car).

Figure 4: An illustration of fully supervised learning for the detection of cars. Here each image in the training set is associated with an image-level label (car or no-car), and region-level labels (bounding boxes around each car) are available for the images that contain a car.
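Full supervision can be illustrated with a logistic regression classifier, a much simpler model than a DL network but trained the same way: from (sample, human-provided label) pairs, by gradient descent on a loss. The sketch assumes that each image has already been reduced to a hypothetical two-number feature vector; a real DL model would learn such features from the pixels itself. The learning rate and number of epochs are illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(samples, labels, lr=1.0, epochs=1000):
    """Learn w, b so that sigmoid(w . x + b) estimates P(car | x) from labeled examples."""
    w, b, n = [0.0] * len(samples[0]), 0.0, len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the binary cross-entropy loss w.r.t. the logit
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        # Batch gradient descent step on the average loss.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """1 = car, 0 = no car."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical 2-number feature vectors per training image, with human-provided labels.
train_x = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.85), (0.1, 0.2), (0.2, 0.1), (0.15, 0.3)]
train_y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(train_x, train_y)
unseen_prediction = predict(w, b, (0.8, 0.85))  # a new, unlabeled (hypothetical) image
```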

Supervised approaches can be broadly divided into classification and regression. In classification, the labels are discrete, e.g., “it will rain in the next hour” or “it won’t rain in the next hour”. In regression, the labels are continuous, e.g., the amount of rain expected in the next hour, or the temperature. Deep neural networks can be used for either task by selecting an appropriate loss function to train the DL model.
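To illustrate how the loss function determines the task, the sketch below pairs the rain examples with two common choices: binary cross-entropy for a discrete label and squared error for a continuous one. Other losses exist for both tasks; the predicted values here are hypothetical network outputs.

```python
import math

def binary_cross_entropy(p, y):
    """Loss for a discrete label y in {0, 1}, given the predicted probability p that y = 1."""
    eps = 1e-12  # avoids log(0) for p = 0 or p = 1
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def squared_error(prediction, target):
    """Loss for a continuous target, e.g. millimeters of rain."""
    return (prediction - target) ** 2

# Classification: "will it rain in the next hour?" (1 = rain, 0 = no rain).
clf_loss = binary_cross_entropy(p=0.8, y=1)  # hypothetical predicted probability 0.8
# Regression: "how much rain (mm) will fall in the next hour?"
reg_loss = squared_error(prediction=2.5, target=3.0)  # hypothetical prediction of 2.5 mm
```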

Segmentation can also be considered a form of classification. In image segmentation, we are interested in identifying the regions (or pixels) corresponding to a target (e.g., a car) rather than classifying whole images. In contrast to image classification, where each image is assigned one of the predefined classes, in image segmentation each pixel of the training images must be labeled, and these pixel-level labels are used to train the DL model.
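Treating segmentation as per-pixel classification means the loss is computed at every pixel and then averaged. The sketch assumes a hypothetical 3×3 image whose annotated mask marks car pixels with 1, and a hypothetical map of predicted per-pixel probabilities; it uses binary cross-entropy, one common choice for two-class segmentation.

```python
import math

def pixelwise_bce(prob_map, mask):
    """Mean binary cross-entropy over all pixels.

    prob_map[r][c]: predicted probability that pixel (r, c) belongs to the target.
    mask[r][c]: annotated label for that pixel (1 = target, e.g. car; 0 = background).
    """
    eps = 1e-12
    losses = [
        -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        for prob_row, mask_row in zip(prob_map, mask)
        for p, y in zip(prob_row, mask_row)
    ]
    return sum(losses) / len(losses)

def threshold(prob_map, t=0.5):
    """Turn per-pixel probabilities into a predicted segmentation mask."""
    return [[1 if p >= t else 0 for p in row] for row in prob_map]

# Hypothetical annotation and prediction for a 3x3 image.
mask = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
prob = [[0.1, 0.9, 0.8], [0.2, 0.7, 0.9], [0.1, 0.2, 0.3]]
loss = pixelwise_bce(prob, mask)
predicted_mask = threshold(prob)
```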

Classification tasks can be further divided into “multi-class classification” and “multi-label classification”. In multi-class classification, each sample is assigned one and only one label; for example, an image may contain either an apple or an orange, but not both at the same time. Sometimes, however, the same image may contain both apples and oranges. Such tasks are called “multi-label” classification. Again, DL algorithms can support both types of task through careful selection of the loss function used for training.
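The difference between the two settings shows up in the output layer and the loss: a common choice is a softmax with categorical cross-entropy for multi-class problems, and an independent sigmoid per class with binary cross-entropy for multi-label problems. The sketch below shows only the output step, using hypothetical scores (logits) for one image and an assumed decision threshold of 0.5.

```python
import math

def softmax(logits):
    """Multi-class: probabilities over mutually exclusive classes, summing to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_per_class(logits):
    """Multi-label: an independent probability for each class (no sum-to-1 constraint)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

classes = ["apple", "orange"]
logits = [2.0, 1.5]  # hypothetical network outputs for one image

multi_class = softmax(logits)  # exactly one label: the most probable class
predicted_class = classes[multi_class.index(max(multi_class))]

multi_label = sigmoid_per_class(logits)  # every label whose probability >= 0.5
predicted_labels = [c for c, p in zip(classes, multi_label) if p >= 0.5]
```

With the same scores, the multi-class output picks only “apple”, while the multi-label output reports both fruits.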

Semi-Supervision

Unlabeled data alone is not enough to perform a classification task, and supervised approaches are usually the best choice for classification and segmentation. However, supervised approaches require labeled training samples, each of which must be annotated by a human expert, which is time-consuming and costly. To overcome this, semi-supervised approaches have been explored, which learn the DL model from both labeled and unlabeled data (Figure 5).

Self-training is one of the most popular semi-supervised approaches (Xie, 2020). First, a DL model is trained on the available labeled data, and the trained model is then used to predict the labels of the unlabeled data. The high-confidence samples (those with high predicted probabilities), together with their predicted labels, are then treated as if they were true labels and added to the original labeled data to retrain the DL model.

Figure 5: An illustration of semi-supervised learning for the classification of images into car vs. non-car. The training set is composed of labeled and unlabeled data.
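The self-training loop described above can be sketched as follows. To keep it short and self-contained, a nearest-centroid classifier stands in for the DL model, and the features, labels, and confidence threshold of 0.9 are hypothetical; Noisy Student (Xie, 2020) additionally trains a noised student model, which is not shown here.

```python
import math

def fit_centroids(samples, labels):
    """Stand-in 'model': the mean feature vector of each class."""
    model = {}
    for c in set(labels):
        members = [x for x, y in zip(samples, labels) if y == c]
        model[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return model

def predict_proba(model, x):
    """Class probabilities from a softmax over negative squared distances to the centroids."""
    scores = {c: -sum((a - m) ** 2 for a, m in zip(x, mu)) for c, mu in model.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_rounds=5):
    x, y, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    for _ in range(max_rounds):
        model = fit_centroids(x, y)  # (re)train on labeled + pseudo-labeled data
        confident, remaining = [], []
        for u in pool:
            label, p = max(predict_proba(model, u).items(), key=lambda kv: kv[1])
            if p >= threshold:
                confident.append((u, label))  # keep the prediction as a pseudo-label
            else:
                remaining.append(u)
        if not confident:
            break  # nothing left that the model is sure about
        x += [u for u, _ in confident]
        y += [label for _, label in confident]
        pool = remaining
    return fit_centroids(x, y), x, y, pool

# Hypothetical 2-D image features: one labeled example per class, three unlabeled ones.
model, x, y, still_unlabeled = self_train(
    labeled_x=[(10.0, 10.0), (0.0, 0.0)],
    labeled_y=["car", "no-car"],
    unlabeled_x=[(9.0, 8.0), (1.0, 2.0), (5.0, 5.0)],
)
```

The two clear-cut samples receive pseudo-labels, while the ambiguous sample (5.0, 5.0), equally close to both classes, stays unlabeled rather than being added with a possibly wrong label.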

Active Learning

Active learning aims to reduce the effort needed to annotate all of the training data. Similar to semi-supervised learning, in active learning the DL model is trained using both labeled and unlabeled data (Figure 5). However, unlike the other forms of supervision described in this article, active learning involves an annotator in the training process: the DL model interactively asks the annotator to label selected unlabeled samples. The newly labeled samples are then added to the original labeled data to improve the performance of the DL model. This labeling process can continue until enough labeled samples are obtained. Compared to annotating all the samples, active learning can significantly reduce the annotation effort because only the samples most informative for model training are labeled.

Figure 5: An illustration of active learning for the classification of images into car vs. non-car.
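One common way for the model to choose which samples to ask about is uncertainty sampling: query the unlabeled sample the current model is least sure of. The article does not specify a query strategy, so this is an assumption, as are the 1-D threshold model standing in for the DL model and the hypothetical `annotator` function that simulates the human.

```python
def fit_threshold(labeled):
    """Stand-in model for 1-D data: a threshold halfway between the two classes."""
    largest_negative = max(x for x, y in labeled if y == 0)
    smallest_positive = min(x for x, y in labeled if y == 1)
    return (largest_negative + smallest_positive) / 2

def active_learning(labeled, unlabeled, ask_annotator, budget):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(min(budget, len(unlabeled))):
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: query the sample closest to the current decision threshold,
        # i.e. the one the model is least sure about.
        query = min(unlabeled, key=lambda u: abs(u - threshold))
        unlabeled.remove(query)
        labeled.append((query, ask_annotator(query)))  # the annotator supplies the true label
    return fit_threshold(labeled), labeled

# Hypothetical simulated annotator: the true boundary (0.615) is hidden from the learner.
def annotator(x):
    return 1 if x >= 0.615 else 0

pool = [i / 100 for i in range(1, 100)]  # 99 unlabeled samples: 0.01 ... 0.99
threshold, labeled = active_learning([(0.0, 0), (1.0, 1)], pool, annotator, budget=10)
```

Because each query lands near the current decision boundary, the threshold is pinned down to within one grid step after at most seven queries, and only 10 of the 99 samples are ever sent to the annotator.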

Weak Supervision

We have already seen that fully supervised image segmentation requires pixel-level labels. Now assume that we need to identify the regions corresponding to a particular target (e.g., a car) in the given images, but only image-level labels are available, with no pixel-level or region-level labels; i.e., for the training images that contain a car, the location of the car is unknown. In this scenario, we aim to identify the regions corresponding to cars using only the image-level labels (Figure 6). This type of problem is referred to as weakly-supervised learning or multiple-instance learning (Cheplygina, 2019). Various approaches have been explored in the machine learning community for this purpose. In DL, the Class Activation Map (CAM) (Zhou, 2016) is one of the popular approaches for obtaining pixel-level labels from image-level ones. In CAM, the DL model is first trained using image-level labels; a map for the target class is then computed by weighting the feature maps of the last convolutional layer with that class's classifier weights, and high values in this map indicate the regions corresponding to the class.

Figure 6: An illustration of weakly supervised learning for the detection of cars. Here each image in the training set is associated with only image-level labels (car or no-car), and no region-level labels (bounding boxes around cars) are available for the training set.
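The CAM computation itself is a weighted sum of feature maps. In Zhou et al. (2016), the last convolutional feature maps f_k are globally average-pooled and fed to a linear layer with weights w_k^c for class c, so the map M_c = Σ_k w_k^c f_k highlights the locations that drive the class score. The feature maps, weights, and the 50%-of-peak threshold below are hypothetical; in practice the map is also upsampled to the input image size.

```python
def class_activation_map(feature_maps, class_weights):
    """M_c(i, j) = sum_k w_k^c * f_k(i, j), following the CAM formulation of Zhou et al. (2016).

    feature_maps: K maps f_k from the last convolutional layer, each an H x W grid.
    class_weights: the K weights w_k^c linking the globally average-pooled features to class c.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wk * fmap[i][j] for wk, fmap in zip(class_weights, feature_maps))
             for j in range(w)] for i in range(h)]

def localize(cam, rel_threshold=0.5):
    """Coarse pixel-level mask: keep locations with activation >= rel_threshold * peak."""
    peak = max(max(row) for row in cam)
    return [[1 if v >= rel_threshold * peak else 0 for v in row] for row in cam]

def spatial_mean(grid):
    return sum(map(sum, grid)) / (len(grid) * len(grid[0]))

# Hypothetical 3x3 feature maps: f1 responds to car-like parts, f2 to road-like texture.
f1 = [[0, 2, 3], [0, 1, 2], [0, 0, 0]]
f2 = [[0, 0, 0], [1, 0, 0], [2, 1, 0]]
car_weights = [1.0, -0.5]  # hypothetical learned weights for the "car" output

cam = class_activation_map([f1, f2], car_weights)
car_mask = localize(cam)
# The image-level car score (global average pooling, then the weights) equals the CAM's mean.
car_score = sum(wk * spatial_mean(f) for wk, f in zip(car_weights, [f1, f2]))
```

The last line shows why the trick works: because pooling and the final layer are both linear, the image-level score that was trained with image-level labels is exactly the average of the map, so the map distributes that score over locations.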

Reinforcement Learning

Reinforcement learning has a wide range of applications, including game playing, robot navigation, autonomous driving, and process planning. In supervised machine learning, each sample is associated with its own label, and the samples are treated as independent of each other. Now assume that we want to train a model to play a game. In this scenario, we do not have labels for each step of the game; instead, a single outcome (win or loss) is associated with the entire game. In reinforcement learning, the learning system is called the “agent”, and it learns by interacting with an environment to achieve a goal. The agent can perform a set of actions and receives rewards or punishments (a win or a loss in the case of a game) based on the actions it selects. The aim of learning is to select the actions that maximize the reward.

Figure 7: The basic idea of Reinforcement learning (image from (Chinnamgari, 2019)).
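As one concrete reinforcement learning algorithm (our choice; the article does not name one), the sketch below uses tabular Q-learning. The environment is hypothetical: a five-cell corridor in which the agent receives a reward of 1 only when it reaches the last cell, mirroring a game where only the final outcome is rewarded. The learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are illustrative values.

```python
import random

N_STATES = 5        # hypothetical corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Hypothetical environment: the only reward comes at the end, like winning a game."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))  # random tie-break
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward the reward plus the discounted value of the next state.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

After training, the greedy policy moves right in every non-terminal cell, even though no individual step was ever labeled as correct; the end-of-episode reward is propagated backwards through the Q-values.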

References

Baldi P. 2012. Autoencoders, Unsupervised Learning, and Deep Architectures. Proceedings of ICML Workshop on Unsupervised and Transfer Learning, Proceedings of Machine Learning Research, 27:37-49.

Cheplygina V., de Bruijne M., Pluim J.P.W. 2019. Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical Image Analysis, 54:280-296.

Chinnamgari S.K. 2019. R Machine Learning Projects. Packt Publishing Ltd.

Goodfellow I.J., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. 2014. Generative adversarial nets. Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 2, 2672-2680.

Xie Q., Luong M.-T., Hovy E.H., Le Q.V. 2020. Self-training with Noisy Student improves ImageNet classification. IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Zhou B., Khosla A., Lapedriza A., Oliva A., Torralba A. 2016. Learning Deep Features for Discriminative Localization. IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Author

Dr. Siyamalan Manivannan

Senior Lecturer, Department of Computer Science, University of Jaffna.
