Top 7 Machine Learning Methods Every Data Scientist Must Know

In this digital era, many manual tasks are being automated. Machine learning algorithms are helping computers perform surgeries and play chess, and they are getting smarter and more personal every day.
We live in a world of constant technological progress, and by looking at how computing has advanced day after day, we can also predict what is to come in the days ahead.
A machine learning algorithm, also called a model, is a mathematical expression that represents data in the context of a particular problem, often a business problem. The main aim is to go from data to insight. For instance, if an eCommerce retailer wants to anticipate sales for the next quarter, they might use a machine learning algorithm to predict those sales based on past sales and other relevant data. In the same way, a windmill manufacturer may visually monitor important equipment and feed the video data through machine learning algorithms trained to identify dangerous cracks. That's why many businesses use machine learning when getting mobile app development solutions.
To demystify machine learning and provide a learning path for people who are new to these concepts, let's look at 7 different, popular methods, including visualizations, simple descriptions, and examples for each one.
Top 7 Machine Learning Methods
The 7 methods below offer an overview, and a foundation you can build on as you hone your machine learning skills and knowledge:
- Classification;
- Clustering;
- Regression;
- Dimensionality Reduction;
- Ensemble Methods;
- Neural Nets and Deep Learning; &
- Transfer Learning.
Do You Know?
Everyone is exposed to machine learning or AI every single day. Do you use the iPhone's Siri? Amazon Echo? Google? Netflix? If you said no to Google, we know you are lying.
Supervised vs Unsupervised Machine Learning
Before jumping in, let's distinguish between two machine learning categories: supervised and unsupervised.
Supervised Machine Learning
We apply supervised machine learning techniques when we have a piece of data that we want to explain or predict. We do so by using previous input and output data to predict an output for a new input. For instance, you could use supervised machine learning techniques to help a service business predict the number of new users who will sign up for its service next month.
Unsupervised Machine Learning
By contrast, unsupervised machine learning looks at ways to group and relate data points without using a target variable to predict. In other words, it evaluates data in terms of traits and uses those traits to form clusters of items that are similar to each other. Many machine learning companies use this method.
For instance, you could use unsupervised machine learning techniques to help a retailer that wants to segment products with similar characteristics, without specifying in advance which characteristics to use.
Do You Know?
Netflix saved approximately $1 billion in 2018 as a result of its machine learning algorithm, which recommends personalized movies and TV shows to subscribers.
Let's start with the top machine learning methods:
1. Classification

A class of supervised machine learning, classification methods explain or predict a class value.

For instance, they can help predict whether an online customer will buy a product or not. The output can be yes or no: buyer or non-buyer. But classification methods are not limited to 2 classes. For instance, a classification method could help evaluate whether a given image contains a truck or a car.
In that case, the result will be one of 3 different values:
1) The image contains a car,
2) The image contains a truck,
3) The image contains neither a car nor a truck.
The simplest classification algorithm is logistic regression, which sounds like a regression method but is not. Logistic regression estimates the probability of an event occurring based on one or more inputs.
For example, logistic regression can take a student's two exam grades to estimate the likelihood that the student will be admitted to a specific college. Since the estimate is a probability, the output is a number between 0 and 1, where 1 represents complete certainty. For the student, if the estimated probability is greater than 0.5, we predict that they will be admitted; if it is less than 0.5, we predict that they will be rejected.
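The exam-grades example can be sketched with scikit-learn. The scores and labels below are made up for illustration; the 0.5 threshold follows the rule described above.

```python
# Toy sketch: logistic regression on two exam grades (made-up data).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [exam 1 grade, exam 2 grade]; label 1 = admitted, 0 = rejected
X = np.array([[45, 50], [55, 60], [60, 55], [70, 80],
              [80, 75], [85, 90], [90, 85], [95, 92]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Probability of admission for a new student, then the 0.5-threshold decision
p = model.predict_proba([[75, 70]])[0, 1]
print(f"P(admitted) = {p:.2f} -> {'admitted' if p > 0.5 else 'rejected'}")
```

Note that `predict_proba` returns the probability itself, while `predict` applies the 0.5 cutoff for you.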
The table below traces the scores of previous students and indicates whether they were admitted. Logistic regression allows us to draw a line that represents the decision boundary.
Since logistic regression is one of the simplest classification models, it is a good starting point for classification. As you progress, you can immerse yourself in non-linear classifiers such as decision trees, random forests, support vector machines, and neural networks, to name a few.
2. Clustering

With clustering methods, we enter the category of unsupervised machine learning, because their purpose is to group observations that have similar characteristics.
Clustering methods do not use output information for training; instead, they let the algorithm define the output. In clustering, we can only use visualizations to inspect the quality of the final solution.
K-Means is the most popular clustering method, where "K" represents the number of clusters the user chooses to create. (Note that there are various techniques for choosing the value of K, such as the elbow method.)
Basically, what K-Means does with the data points:

1) Randomly choose K centers in the data.
2) Assign each data point to the nearest center.
3) Recalculate the center of each cluster.
4) If the centers do not change, the process is over. Otherwise, return to step 2. (To avoid ending up in an infinite loop if the centers keep changing, set a maximum number of iterations in advance.)
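The steps above can be sketched in plain NumPy. This is a minimal illustration of the loop, not a production implementation (scikit-learn's `KMeans` is the usual choice); the two-blob dataset is invented for the example.

```python
# Minimal K-Means sketch following the four steps described above.
import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: randomly choose K centers among the data points
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):  # cap iterations to avoid an infinite loop
        # Step 2: assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate each cluster's center as the mean of its points
        new_centers = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        # Step 4: stop when the centers no longer move; otherwise loop again
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two obvious blobs: with K = 2, the algorithm should separate them
data = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                 [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]])
centers, labels = k_means(data, k=2)
```

The empty-cluster guard in step 3 keeps a center in place if no points were assigned to it, a small practical detail the textbook description glosses over.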
The graphic below applies K-Means to a building dataset. Each column of the graph shows the efficiency of each building. The 4 measurements concern air conditioning, connected equipment (refrigerators, microwaves, etc.), heating gas, and domestic gas. We chose K = 2 for clustering, which makes it easy to interpret one cluster as a group of efficient buildings and the other as a group of inefficient buildings. On the left, you can see the locations of the buildings; on the right, you can see two of the four dimensions we used as inputs: heating gas and connected equipment.
As you explore clustering, you will encounter some very useful algorithms, such as density-based spatial clustering of applications with noise (DBSCAN), hierarchical clustering, mean-shift clustering, and expectation-maximization clustering using Gaussian mixture models, among others.
3. Regression

Regression methods fall into the category of supervised machine learning. They help explain or predict a particular numeric value based on a previous dataset, for instance, predicting the price of a property based on past pricing data for similar properties.
The simplest method is linear regression, where we use the mathematical equation of a line (y = m*x + b) to model a dataset. We train a linear regression model with many data pairs (x, y) by calculating the slope and position of the line that minimizes the total distance between the line and all the data points. In other words, we calculate the slope (m) and the intercept (b) of the line that best approximates the observations in the data.
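The slope and intercept have a closed-form least-squares solution, which a few lines of NumPy make concrete. The (x, y) pairs below are toy data, roughly following y = 2x + 1.

```python
# Sketch: fitting y = m*x + b by least squares on toy data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])  # roughly y = 2x + 1

# Closed-form least-squares estimates of slope (m) and intercept (b)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()
print(f"y = {m:.2f}x + {b:.2f}")
```

The fitted m and b come out close to 2 and 1, as expected from how the toy data was generated.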
Consider a more concrete example of linear regression. I have used linear regression to predict the energy consumption (in kWh) of buildings by aggregating the building's age, square footage, number of floors, and number of plugged-in wall fixtures. Since there was more than one input (square footage, age, etc.), I used multivariate linear regression. The principle is the same as in simple one-variable linear regression, but in this case the fitted "line" lives in a multidimensional space whose size depends on the number of variables.
The graph below shows how well the linear regression model fits the buildings' energy consumption. Now imagine that you have access to a building's characteristics (square footage, age, etc.) but do not know its energy consumption. In that case, we can use the fitted line to estimate the energy consumption of the building in question.
Note that you can also use linear regression to estimate the weight of each factor that contributes to the final prediction of energy consumed. For instance, once you have the formula, you can easily determine whether age or size matters most.
Regression techniques range from simple (like linear regression) to complex (like regularized linear regression, polynomial regression, decision trees, random forest regression, and neural networks, among others). But do not get bogged down: start by studying simple linear regression, master that technique, and keep going from there.
4. Dimensionality Reduction
As the name suggests, we use dimensionality reduction to remove the least crucial information (sometimes redundant columns) from a dataset. In practice, datasets often contain hundreds or even thousands of columns (known as features), so it is crucial to reduce their total number. For instance, images may include thousands of pixels, not all of which matter to your analysis. Or, when testing chips in a manufacturing process, you might apply thousands of tests and measurements to each chip, many of which provide redundant information. In these cases, you need dimensionality reduction algorithms to make the dataset manageable.
One of the most common dimensionality reduction methods is Principal Component Analysis (PCA), which reduces the dimensionality of the feature space by finding new vectors that maximize the linear variation of the data. PCA can significantly reduce the size of the data without losing too much information when the linear correlations in the data are strong. (And in fact, you can measure the actual extent of the information loss and adjust accordingly.)
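A small scikit-learn sketch makes this concrete. The dataset here is synthetic: four columns generated from two underlying signals, so strong linear correlations exist and PCA should recover almost all the variance in two components.

```python
# Sketch: PCA reducing 4 correlated features to 2 components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
base = rng.normal(size=(100, 2))            # 2 underlying signals
noise = 0.05 * rng.normal(size=(100, 4))
X = base @ rng.normal(size=(2, 4)) + noise  # 4 strongly correlated columns

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                      # one 2-D row per original row
print(pca.explained_variance_ratio_.sum())  # close to 1: little information lost
```

The `explained_variance_ratio_` attribute is exactly the "measure the information loss" knob mentioned above: the closer its sum is to 1, the less you lost.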
Another popular method is t-distributed stochastic neighbor embedding (t-SNE), which performs non-linear dimensionality reduction. People typically use t-SNE for data visualization, but you can also use it for other machine learning tasks, such as reducing the feature space before clustering, to name just one.
The following graph shows an analysis of the MNIST database of handwritten digits. MNIST contains thousands of images of digits from 0 to 9, which researchers widely use to test their classification algorithms. Each row of the dataset is a vectorized version of an original image (size 28 x 28 = 784) plus a label for each image (zero, one, two, ..., nine). Note that we are reducing the dimensionality from 784 (pixels) to 2 (dimensions). Projecting to two dimensions allows us to visualize this high-dimensional dataset.
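A similar projection can be sketched with scikit-learn's built-in 8x8 digits dataset (64 pixels per image rather than MNIST's 784); a small sample keeps t-SNE fast.

```python
# Sketch: t-SNE projecting a sample of scikit-learn's digits to 2 dimensions.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
X, y = digits.data[:200], digits.target[:200]  # small sample for speed

# 64 pixel features -> 2 dimensions, ready to scatter-plot colored by y
embedding = TSNE(n_components=2, random_state=0).fit_transform(X)
print(embedding.shape)
```

Each row of `embedding` is the 2-D position of one image; plotting them colored by label typically shows the ten digits forming distinct islands.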
5. Ensemble Methods
Imagine that you have decided to build a bicycle because you are not content with the options available online and in stores. You might start by finding the best of each part you need. Once you have assembled all these great parts, the resulting bike will outperform all the other options.
Ensemble methods use this same idea of combining several predictive models (supervised machine learning) to obtain higher-quality predictions than each model could provide on its own. For instance, the random forest algorithm is an ensemble method that combines many decision trees trained on different samples of the dataset. As a result, the predictions of a random forest are usually of higher quality than the predictions of a single decision tree.
You can think of ensemble methods as a way to reduce the variance and bias of a single machine learning model. This is crucial because a given model may be accurate under certain conditions but inaccurate under others, while for another model the situation may be reversed. By combining the two models, the quality of the predictions is balanced out.
The great majority of Kaggle competition winners use ensemble methods. The most popular ensemble algorithms are Random Forest, XGBoost, and LightGBM.
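The single-tree-versus-forest comparison is easy to reproduce with scikit-learn on a synthetic dataset; on held-out data the forest usually matches or beats the lone tree.

```python
# Sketch: one decision tree vs. a random forest on a toy dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Held-out accuracy: averaging 100 trees smooths out each tree's mistakes
print(f"tree:   {tree.score(X_te, y_te):.2f}")
print(f"forest: {forest.score(X_te, y_te):.2f}")
```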
6. Neural Networks and Deep Learning
Unlike linear and logistic regression, which are considered linear models, the goal of neural networks is to capture non-linear patterns in data by adding layers of parameters to the model. In the image below, the simple neural network has 4 inputs, a single hidden layer with 5 parameters, and an output layer.
In fact, the structure of neural networks is flexible enough to reproduce our well-known linear and logistic regression. The term deep learning comes from a neural network with many hidden layers (see the following figure) and encompasses a wide array of architectures.
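The small network in the figure (4 inputs, one hidden layer of 5 units, one output) can be written out as a forward pass in a few lines of NumPy. The weights below are random stand-ins; training would adjust them.

```python
# Sketch: forward pass of a 4 -> 5 -> 1 neural network with random weights.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)   # hidden layer -> output

def forward(x):
    hidden = np.tanh(x @ W1 + b1)                 # the non-linearity is the point
    return 1 / (1 + np.exp(-(hidden @ W2 + b2)))  # sigmoid output in (0, 1)

x = np.array([0.5, -1.0, 2.0, 0.1])  # one example with 4 features
out = forward(x)
print(out)
```

With the sigmoid on the output, this network is essentially logistic regression stacked on top of a learned non-linear transformation, which is the flexibility the paragraph above refers to.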
It is particularly difficult to keep track of developments in deep learning, in part because the research and industry communities have doubled down on their deep learning efforts, creating new methodologies every day.
For the best performance, deep learning techniques require a large amount of data and great computing power, since the method self-adjusts many parameters within gigantic architectures. It is easy to see why deep learning practitioners need very powerful computers equipped with GPUs (graphics processing units).
In particular, deep learning techniques have been extremely successful in the areas of vision (image classification), text, audio, and video. The most common software packages for deep learning are TensorFlow and PyTorch.
7. Transfer Learning
Suppose you are a computer scientist working in the retail sector. You have spent months or even years building a high-quality model to classify images as t-shirts, shirts, and polos. Your new task is to build a similar model to classify images of pants into categories such as cargo, jeans, casual pants, and dress pants. Can you transfer the knowledge built into the first model and apply it to the second? Yes, you can, by using transfer learning.
Transfer learning consists of reusing part of a previously trained neural network and adapting it to a new but similar task. Specifically, once you have trained a neural network using data for one task, you can transfer a fraction of the trained layers and combine them with a few new layers that you train on the data of the new task. By adding a few layers, the neural network can learn and adapt quickly to the new task.
The main advantage of transfer learning is that it requires less data to train the neural network, which is particularly important because training deep learning algorithms is expensive in both time and money (computing resources), and of course it is often very difficult to find enough labelled data for training.
Let us return to our example and assume that for the shirt model you used a neural network with 20 hidden layers. After a few experiments, you realize that you can transfer 18 of the shirt model's layers and combine them with one new layer of parameters to train on the images of pants. The pants model would then have 19 hidden layers. The inputs and outputs of the two tasks are different, but the reusable layers can summarize information relevant to both, for instance, aspects of the fabric.
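The frozen-layers-plus-new-head idea can be illustrated without any deep learning framework. In this sketch a fixed random projection stands in for the reused, pretrained layers (they are never retrained), and only a new logistic regression head is fitted on the new task's data; the dataset and its labeling rule are invented for the example.

```python
# Sketch of transfer learning: frozen feature extractor + newly trained head.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(20, 10))  # stand-in for the reused, frozen layers

def features(X):
    # The "pretrained" part: applied as-is, never updated for the new task
    return np.tanh(X @ W_frozen)

# Small labeled dataset for the *new* task (toy data with a simple rule)
X_new = rng.normal(size=(100, 20))
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)

# Only the new head is trained; the frozen layers are untouched
head = LogisticRegression().fit(features(X_new), y_new)
print(f"new-task accuracy: {head.score(features(X_new), y_new):.2f}")
```

In a real setting the frozen weights would come from the shirt model rather than a random generator, but the division of labor is the same: reused layers extract features, and only the small new head needs data and training time.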
Transfer learning has become widely popular, and there are now many robust pre-trained models available for common deep learning tasks like image and text classification.