Machine Learning (ML) Introduction

What is Machine Learning (ML)?

While artificial intelligence (AI) is the broad science of mimicking human abilities, machine learning is a specific subset of AI that trains a machine on how to learn. In other words, machine learning (ML) is the ability to learn and improve from previous experience without being explicitly programmed. It is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns and inference instead.

The process of learning begins with observations or data collection, such as examples, direct experience, or instruction. The collected data is then studied and analyzed to search for patterns. This pattern recognition, or data analysis, is the basis of machine learning, and it helps the system make better decisions in the future.

The primary aim of machine learning is to allow computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly.

Machine learning began with pattern recognition and the theory that computers can learn without being programmed to perform specific tasks. Researchers were interested to see whether computers could learn from data. So machine learning is not a new science, but one that has gained fresh momentum.

Different Types of machine learning:

The general workflow of machine learning is to collect data and use it to train an algorithm, producing a model that is capable of making accurate predictions. Within this general workflow there are different types of machine learning methods, which are broadly classified into the following four categories:

  • Supervised Learning
  • Unsupervised Learning
  • Semi-supervised Learning
  • Reinforcement Learning

Supervised learning:

As the name suggests, supervised learning is a type of machine learning in which humans act as a guide, teaching the algorithm what conclusions it should reach. Think of a teacher teaching students. In supervised learning, the training dataset is already labeled with the correct answers.

In this learning method, the algorithm repeatedly predicts results for the training data and is corrected whenever a prediction differs from the known label, which plays the role of the teacher. Learning continues until the algorithm achieves an acceptable level of performance or accuracy.

Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim, etc.
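
As a minimal sketch of learning from labeled examples, the snippet below classifies a new transaction by copying the label of its nearest labeled neighbor. The feature values, the labels, and the function name `predict_nearest` are made up for illustration; a real fraud detection system would use far richer data and models.

```python
import math

# Hypothetical labeled training data: (amount in dollars, hour of day) -> label.
training_data = [
    ((12.0, 14), "legitimate"),
    ((25.0, 10), "legitimate"),
    ((40.0, 18), "legitimate"),
    ((950.0, 3), "fraudulent"),
    ((1200.0, 2), "fraudulent"),
]


def predict_nearest(features, labeled_examples):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        labeled_examples,
        key=lambda example: math.dist(features, example[0]),
    )
    return nearest_label


print(predict_nearest((1100.0, 4), training_data))
print(predict_nearest((30.0, 12), training_data))
```

The labels in `training_data` are the "correct answers" supplied by a human; the algorithm only generalizes from them.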

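Supervised learning is commonly divided into the following tasks and techniques:

  • Classification
  • Regression
  • Prediction
  • Gradient boosting (strictly an ensemble technique used for classification and regression rather than a separate task)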

The most widely used supervised learning algorithms are:

Unsupervised learning:

In unsupervised learning, the algorithm is given data that is not labeled, categorized, or tagged with correct answers. The algorithm has to discover hidden patterns or similarities within the data and group it accordingly. Unlike supervised learning, no human guidance is given to the machine, so it has to learn by itself.

In short, unsupervised learning means discovering hidden patterns or similarities in a dataset and grouping or labeling them without any human assistance.

For example, suppose the machine is given pictures of many types of fruit. The algorithm has to analyze the dataset and categorize it, perhaps on the basis of shape, color, or size. Unsupervised learning is classified into the following categories:

  • Clustering: Here we want to discover the inherent groupings in the dataset, such as grouping customers by their purchasing behavior.
  • Association: Here we want to discover rules that describe large portions of the data, such as the rule that someone who purchases item X also tends to buy item Y.
  • Anomaly detection: Here we want to find unusual data points in the dataset. This is useful for pinpointing fraudulent transactions, discovering faulty pieces of hardware, or identifying an outlier caused by human error during data entry.
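
To make the clustering idea concrete, here is a small sketch of k-means on one-dimensional data. The spending figures, the starting centers, and the function name `k_means_1d` are assumptions chosen for illustration, and k-means is only one of several clustering methods.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given starting centers (plain k-means)."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(cluster) / len(cluster) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical monthly spending per customer, with no labels attached.
spending = [12, 15, 18, 20, 210, 220, 250, 260]
centers, clusters = k_means_1d(spending, centers=[0, 100])
print(centers)
print(clusters)
```

No labels are supplied anywhere: the two groups (low spenders and high spenders) emerge from the data alone.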

Some of the popular unsupervised learning algorithms are:

Semi-supervised learning:

This machine learning approach uses both labeled and unlabeled data for training, typically a small amount of labeled data in conjunction with a large amount of unlabeled data (because unlabeled data is less expensive and takes less effort to acquire).

This type of learning can be used with methods such as classification, regression, and prediction. Semi-supervised learning is useful when the cost of labeling is too high to allow for a fully labeled training process. An early example is identifying a person’s face on a webcam.
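
One simple way to combine a few labels with many unlabeled points is pseudo-labeling (a basic form of self-training). The sketch below is only an illustration under that assumption: the data, the label names, and the helper functions `label_means` and `self_train` are hypothetical.

```python
def label_means(labeled):
    """Average feature value for each label seen so far."""
    groups = {}
    for value, label in labeled:
        groups.setdefault(label, []).append(value)
    return {label: sum(values) / len(values) for label, values in groups.items()}


def self_train(labeled, unlabeled):
    """Pseudo-label each unlabeled value with the label whose mean is nearest."""
    labeled = list(labeled)
    for value in unlabeled:
        means = label_means(labeled)
        guess = min(means, key=lambda label: abs(value - means[label]))
        labeled.append((value, guess))  # treat the guess as a new training label
    return labeled


# Hypothetical data: two hand-labeled measurements and four unlabeled ones.
labeled_examples = [(1.0, "small"), (9.0, "large")]
unlabeled_values = [2.0, 8.0, 3.0, 7.5]

pseudo_labeled = self_train(labeled_examples, unlabeled_values)
print(pseudo_labeled)
```

Only two examples were labeled by hand; the remaining four receive labels inferred by the algorithm itself.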

Now that we know the basic machine learning methods, we can differentiate between them with an analogy:

  • Supervised learning: The student is under the supervision of a teacher, both at home and at school, who guides him.
  • Unsupervised learning: The student has to figure out a concept himself, without the guidance of a teacher.
  • Semi-supervised learning: The teacher teaches a few concepts in class and gives homework questions that are based on similar concepts.

Reinforcement learning:

In reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision-maker), the environment (everything the agent interacts with) and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy. It is often used for robotics, gaming, and navigation.
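
The trial-and-error loop can be sketched with an epsilon-greedy agent, one of the simplest reinforcement learning strategies. Everything here is an assumption made for illustration: the action names, the fixed rewards in `REWARDS`, and the function `run_agent`. Real environments usually have state and noisy rewards.

```python
import random

# Hypothetical environment: each action always pays a fixed reward.
REWARDS = {"left": 1.0, "stay": 5.0, "right": 2.0}


def run_agent(steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    estimates = {action: 0.0 for action in actions}  # estimated reward per action
    counts = {action: 0 for action in actions}
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)              # explore a random action
        else:
            action = max(actions, key=estimates.get)  # exploit the best estimate
        reward = REWARDS[action]                      # the environment responds
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    best_action = max(actions, key=estimates.get)
    return best_action, total_reward


best_action, total_reward = run_agent()
print(best_action, total_reward)
```

The learned policy here is simply "pick the action with the highest estimated reward"; the occasional random action is what lets the agent discover that a better action exists.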

Steps of Machine Learning:

Now the question that arises is: how do we start? What are the steps involved in machine learning?

Suppose you have been asked to design a system that is capable of detecting whether there is an animal in a picture shown to it. The animal detection system you are going to design is called a “model”, and this model will be created via a process called “training”. During the training phase, our main aim will be to create a model that gives us the most accurate results.

One thing to notice here is that the model will only yield good results when it is trained with good data. Machine learning starts with data gathering and ends with a fully trained model that produces accurate results, at least most of the time.

The purpose of machine learning is to create a system (Model) that answers a particular question via the process of training.


1. Gathering Data for machine learning:

Once we have our problem statement, we come to the first real step of machine learning: data gathering. As said earlier, this step is very important because the quality of the data gathered directly determines how good the predictive model will turn out to be. The collected data is then tabulated and called the training data.

2. Data Preparation:

Cleaning and filtering of data: The next step of machine learning is data preparation. In this step, data is loaded into a suitable place and then prepared for training. The collected data may contain errors, duplicate entries, missing values, and so on, so data preparation involves data cleaning and filtering. Other adjustments, such as normalization and error correction, also take place at this step.
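
A rough sketch of these cleaning tasks is shown below. The records, the column meanings, and the helper names `clean` and `min_max_normalize` are hypothetical, and min-max scaling is just one common normalization choice.

```python
# Hypothetical raw records: (age, income); None marks a missing value.
raw_rows = [
    (25, 30000.0),
    (25, 30000.0),   # duplicate entry
    (40, None),      # missing income
    (35, 60000.0),
    (50, 90000.0),
]


def clean(rows):
    """Drop rows with missing values and remove exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def min_max_normalize(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        tuple(
            (value - low) / (high - low) if high != low else 0.0
            for value, low, high in zip(row, lows, highs)
        )
        for row in rows
    ]


cleaned_rows = clean(raw_rows)
normalized_rows = min_max_normalize(cleaned_rows)
print(cleaned_rows)
print(normalized_rows)
```

Dropping incomplete rows is the bluntest way to handle missing values; depending on the dataset, filling them in may be a better choice.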

Visualizing Data: This is also a good time to visualize the data. Visualization helps us see whether there are relevant relationships between the different variables and how we can take advantage of them, and it also shows whether the data is imbalanced.

Splitting Data: We now have clean, prepared data, so the next step is to split it into two parts. The first part (the majority of the dataset) is used for training the model, and the second part is used for evaluating the trained model’s performance. A commonly used split is 80% of the data for training and 20% for evaluation.
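
The 80/20 split can be sketched in a few lines. The function name `train_test_split`, the fixed seed, and the stand-in dataset are assumptions for illustration; shuffling before splitting is a common precaution so that both parts resemble the whole dataset.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of `rows` and split it into training and evaluation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


dataset = list(range(100))  # stand-in for 100 prepared examples
train_rows, eval_rows = train_test_split(dataset)
print(len(train_rows), len(eval_rows))
```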

3. Choosing a model:

Different algorithms are available for different machine learning tasks, so for model selection we need a clear understanding of the problem statement and the data we have.

During model selection, along with the problem statement, we have to consider a few other factors such as the accuracy, interpretability, complexity, and scalability of the model, as well as the time it takes to build, train, and test the model.

4. Training:

The aim of this step is to create a model that is capable of predicting correct answers.

We now have a model and the data needed for training. During the training process, the data is used to incrementally improve the model’s ability to predict. Each of these iterations is part of the model’s training.

The training process involves initializing the model’s values, say A and B, with some random numbers, predicting the output with those values, comparing the model’s predictions with the correct answers in the training data, and then adjusting the values so that the predictions move closer to the correct answers.

This process then repeats and each cycle of updating is called one training step.
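
As an illustration of such training steps, the sketch below fits a straight line y = a * x + b with gradient descent, where `a` and `b` play the role of the values A and B. The data, the learning rate, and the number of steps are assumptions; the starting values are fixed at zero rather than random so that the run is repeatable.

```python
# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def train(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = a * x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0  # starting values (fixed here; often chosen at random)
    n = len(xs)
    for _ in range(steps):  # each pass through this loop is one training step
        # Compare the current predictions with the correct answers.
        errors = [(a * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to a and b.
        grad_a = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Nudge the values in the direction that reduces the error.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


a, b = train(xs, ys)
print(round(a, 3), round(b, 3))
```

With this data the learned values should end up very close to 2 and 1, the slope and intercept used to generate it.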

5. Evaluation:

At this point, we have completed the training, so the next step is to check the accuracy of the trained model.

This is where the dataset we set aside for evaluation comes into play. The evaluation step tests the model against data that it has never seen during training. This testing data is meant to be representative of how the model might perform in the real world.
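
A simple way to score a classifier on held-out data is accuracy, the fraction of correct predictions. In the sketch below, `threshold_model` is a hypothetical stand-in for a trained model, and the evaluation examples are invented; accuracy is only one of several possible metrics.

```python
def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


# Hypothetical trained model: flags a transaction as fraud above a threshold.
def threshold_model(amount):
    return "fraud" if amount > 500 else "ok"


# Held-out evaluation examples that were not used during training.
eval_examples = [(20, "ok"), (700, "fraud"), (450, "ok"), (900, "fraud"), (650, "ok")]

print(accuracy(threshold_model, eval_examples))
```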

6. Parameter Tuning:

During evaluation, we may find room for improvement. Further improvement may be possible by tuning the parameters of the model and of the training process. This step helps improve the accuracy of the model.

Since there are many considerations at this phase of training, it’s important that you define what makes a model good. The adjustment or tuning of these parameters depends on the dataset, model, and the training process. Once you are done with these parameters and are satisfied you can move on to the last step.
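
One basic tuning strategy is to try several candidate values of a setting and keep the one that scores best on held-out data. The sketch below does this for a single hypothetical setting, a decision threshold; the candidate values, the validation examples, and the function names are assumptions for illustration.

```python
def accuracy(threshold, examples):
    """Accuracy of the rule 'flag as fraud when amount > threshold'."""
    correct = sum(
        1 for amount, is_fraud in examples if (amount > threshold) == is_fraud
    )
    return correct / len(examples)


def tune(candidates, examples):
    """Score every candidate value and return the best one with all scores."""
    scores = {candidate: accuracy(candidate, examples) for candidate in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical validation examples: (transaction amount, is it fraud?).
validation_examples = [(20, False), (450, False), (650, False), (700, True), (900, True)]
candidate_thresholds = [100, 300, 500, 680, 800]

best_threshold, scores = tune(candidate_thresholds, validation_examples)
print(best_threshold, scores)
```

Here "good" is defined as accuracy on the validation examples; a different definition of a good model could favor a different setting.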

7. Prediction:

Machine learning is basically using data to answer questions, so this is the final step, where you get to answer them. This is the point where the value of machine learning is realized: you can finally use your model to predict the outcome you are interested in.

The steps above take you from creating a model to predicting its output, and together they act as a learning path.

Who is using machine learning?

From detecting skin cancer to sorting cucumbers to detecting escalators in need of repair, machine learning has granted computer systems entirely new abilities. It can be used in many fields that involve large-scale data. Here are a few examples of fields that use machine learning:

  • Financial Services and insurance:
    • Banks and other institutions in the finance and insurance domain generate a huge amount of data on a daily basis. Today, machine learning is used in the finance industry for two key purposes:
      • To identify important insights in data. These insights can identify investment opportunities or help investors know when to trade.
      • To prevent fraud. Data mining can identify clients with high-risk profiles, or use cyber surveillance to pinpoint warning signs of fraud.
  • Government Authorities:
    • Government agencies such as public safety and utilities have a particular need for machine learning, since they have multiple sources of data that can be mined for insights. Analyzing sensor data, for example, can identify ways to increase efficiency and save money. Machine learning can also help detect fraud and minimize identity theft.
  • Healthcare:
    • Machine learning is a fast-growing trend in the health care industry, thanks to the advent of wearable devices and sensors that can use data to assess a patient’s health in real-time. The technology can also help medical experts analyze data to identify trends or red flags that may lead to improved diagnoses and treatment.
    • Machine learning in healthcare helps analyze thousands of different data points to suggest outcomes, provide timely risk scores, and allocate resources precisely, among many other applications. These can cover anything from cancers that are tough to catch in their initial stages to genetic diseases.
  • Ecommerce:
    • Websites recommending items you might like based on previous purchases are using machine learning to analyze your buying history. Retailers rely on machine learning to capture data, analyze it, and use it to personalize the shopping experience, run marketing campaigns, optimize prices, plan merchandise supply, and gain customer insights.
