Machine learning is a large field of study that overlaps with and inherits ideas from many related fields such as artificial intelligence.
The focus of the field is learning, that is, acquiring skills or knowledge from experience. Most commonly, this means synthesizing useful concepts from historical data.
As such, there are many different types of learning that you may encounter as a practitioner in the field of machine learning: from whole fields of study to specific techniques.
1. Supervised Learning
Supervised learning describes a class of problems that involves using a model to learn a mapping between input examples and the target variable.
Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems.
Models are fit on training data composed of inputs and outputs, and are then used to make predictions on a test set where only the inputs are provided. The predictions are compared with the withheld target variables to estimate the skill of the model.
Learning is a search through the space of possible hypotheses for one that will perform well, even on new examples beyond the training set. To measure the accuracy of a hypothesis we give it a test set of examples that are distinct from the training set.
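The train/test procedure described above can be sketched in a few lines. This is a minimal illustration, not a recommended pipeline: the toy dataset, the 70/30 split, and the 1-nearest-neighbor classifier are all assumptions chosen to keep the example short, and the helper names are hypothetical.

```python
import random

def make_dataset(n, seed=0):
    """Toy 1-D data (assumed): class 0 is centered at 0.0, class 1 at 3.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(3.0 * label, 0.5)
        data.append((x, label))
    return data

def train_test_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, x):
    """Predict the label of the closest training input."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, test):
    """Compare predictions with the withheld test labels."""
    correct = sum(1 for x, y in test if predict_1nn(train, x) == y)
    return correct / len(test)

data = make_dataset(200)
train, test = train_test_split(data)
test_accuracy = accuracy(train, test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The model only ever sees the test labels when it is being scored, which is what makes the resulting accuracy an estimate of performance on new examples.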
There are two main types of supervised learning problems: classification, which involves predicting a class label, and regression, which involves predicting a numerical value.
- Classification: Supervised learning problem that involves predicting a class label.
- Regression: Supervised learning problem that involves predicting a numerical value.
Both classification and regression problems may have one or more input variables, and those variables may be of any data type, such as numerical or categorical.
An example of a classification problem would be the MNIST handwritten digits dataset where the inputs are images of handwritten digits (pixel data) and the output is a class label for what digit the image represents (numbers 0 to 9).
An example of a regression problem would be the Boston house prices dataset where the inputs are variables that describe a neighborhood and the output is the median house price for that neighborhood in dollars.
Some machine learning algorithms are described as “supervised” machine learning algorithms as they are designed for supervised machine learning problems. Popular examples include decision trees, support vector machines, and many more.
Our goal is to find a useful approximation f̂(x) to the function f(x) that underlies the predictive relationship between the inputs and outputs.
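To make the idea of approximating an underlying function concrete, here is a small sketch of a regression fit. It assumes a known linear relationship (`true_f`, invented for this example) observed with Gaussian noise, and uses ordinary least squares for a single input variable. In a real problem the underlying function is unknown and only the noisy samples are available.

```python
import random

def true_f(x):
    """Assumed underlying relationship; unknown in a real problem."""
    return 2.0 * x + 1.0

def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Noisy observations of the underlying function.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [true_f(x) + rng.gauss(0, 1.0) for x in xs]

slope, intercept = fit_line(xs, ys)

def f_hat(x):
    """The learned approximation to true_f."""
    return slope * x + intercept

print(f"f_hat(x) = {slope:.2f} * x + {intercept:.2f}")
```

With enough samples the fitted slope and intercept should land close to the assumed values of 2.0 and 1.0, although they will not match exactly because of the noise.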
Algorithms are referred to as “supervised” because they learn by making predictions given examples of input data, and the models are supervised and corrected via an algorithm to better predict the expected target outputs in the training dataset.
The term supervised learning originates from the view of the target y being provided by an instructor or teacher who shows the machine learning system what to do.
Some algorithms may be specifically designed for classification (such as logistic regression) or regression (such as linear regression) and some may be used for both types of problems with minor modifications (such as artificial neural networks).
2. Unsupervised Learning
Unsupervised learning describes a class of problems that involves using a model to describe or extract relationships in data.
Compared to supervised learning, unsupervised learning operates upon only the input data without outputs or target variables. As such, unsupervised learning does not have a teacher correcting the model, as in the case of supervised learning.
In unsupervised learning, there is no instructor or teacher, and the algorithm must learn to make sense of the data without this guide.
There are many types of unsupervised learning, although two main problems are often encountered by a practitioner: clustering, which involves finding groups in the data, and density estimation, which involves summarizing the distribution of data.
- Clustering: Unsupervised learning problem that involves finding groups in data.
- Density Estimation: Unsupervised learning problem that involves summarizing the distribution of data.
An example of a clustering algorithm is k-Means, where k refers to the number of clusters to discover in the data. An example of a density estimation algorithm is Kernel Density Estimation, which estimates the density at a new point in the problem space from the data samples that lie close to it.
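As a rough illustration of k-Means, the sketch below implements the basic assign-then-update loop (Lloyd's algorithm) for 2-D points using only the standard library. The two synthetic blobs, the fixed iteration count, and the random initialization from the data are assumptions made for this example. Library implementations typically add smarter initialization and a convergence check.

```python
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Basic k-means for 2-D points; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters

# Two well-separated synthetic blobs (assumed data, no labels used).
rng = random.Random(1)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
points += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]

centroids, clusters = kmeans(points, k=2)
for center, members in zip(centroids, clusters):
    print(f"centroid ({center[0]:.2f}, {center[1]:.2f}) "
          f"with {len(members)} points")
```

Note that no target labels appear anywhere: the groups are recovered from the input data alone, and k has to be chosen by the practitioner.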
The most common unsupervised learning task is clustering: detecting potentially useful clusters of input examples. For example, a taxi agent might
Clustering and density estimation may be performed to learn about the patterns in the data.
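For density estimation, a one-dimensional Gaussian kernel density estimate can be sketched as follows. The sample data and the bandwidth of 0.4 are assumptions for this illustration. In practice the bandwidth strongly affects the result and is usually tuned.

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at a point x.

    Each sample contributes a Gaussian bump of width `bandwidth`;
    the estimate is the average of those bumps.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Samples drawn from a standard normal distribution (assumed data).
rng = random.Random(7)
samples = [rng.gauss(0, 1) for _ in range(300)]

density = gaussian_kde(samples, bandwidth=0.4)
print(f"estimated density at 0: {density(0.0):.3f}")
print(f"estimated density at 3: {density(3.0):.3f}")

# Riemann-sum check that the estimate integrates to about 1.
step = 0.05
total = sum(density(-8 + i * step) * step for i in range(int(16 / step)))
print(f"approximate integral: {total:.3f}")
```

The estimate is high where samples are dense and low where they are sparse, which is the sense in which it summarizes the distribution of the data.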
Additional unsupervised methods may also be used, such as visualization that involves graphing or plotting data in different ways and projection methods that involve reducing the dimensionality of the data.
- Visualization: Unsupervised learning problem that involves creating plots of data.
- Projection: Unsupervised learning problem that involves creating lower-dimensional representations of data.
An example of a visualization technique would be a scatter plot matrix, which creates one scatter plot for each pair of variables in the dataset. An example of a projection method would be Principal Component Analysis, which involves summarizing a dataset in terms of eigenvalues and eigenvectors, with linear dependencies removed.
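The eigenvalue and eigenvector view of Principal Component Analysis can be illustrated for the two-variable case, where the covariance matrix is 2x2 and its eigendecomposition has a closed form. This is a sketch under assumed synthetic data, and the function name `pca_2d` is hypothetical. Higher-dimensional data would normally be handled with a numerical linear algebra library.

```python
import math
import random

def pca_2d(points):
    """PCA for 2-D data via the closed-form eigendecomposition of the
    2x2 sample covariance matrix [[a, b], [b, c]]."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]

    a = sum(x * x for x, _ in centered) / (n - 1)  # variance of x
    c = sum(y * y for _, y in centered) / (n - 1)  # variance of y
    b = sum(x * y for x, y in centered) / (n - 1)  # covariance of x and y

    # Eigenvalues of a symmetric 2x2 matrix.
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + spread, mid - spread

    # Eigenvector belonging to the larger eigenvalue.
    if abs(b) > 1e-12:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    component = (vx / norm, vy / norm)

    # Project each centered point onto the first principal component.
    projected = [x * component[0] + y * component[1] for x, y in centered]
    return component, (lam1, lam2), projected

# Two strongly correlated variables (assumed data).
rng = random.Random(1)
points = []
for _ in range(200):
    x = rng.gauss(0, 2)
    points.append((x, x + rng.gauss(0, 0.3)))

component, (lam1, lam2), projected = pca_2d(points)
explained = lam1 / (lam1 + lam2)
print(f"first component: ({component[0]:.3f}, {component[1]:.3f})")
print(f"share of variance explained: {explained:.3f}")
```

Because the two variables are almost linearly dependent, a single projected coordinate retains nearly all of the variance, so the 2-D dataset can be summarized in one dimension with little loss.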
The goal in such unsupervised learning problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.