What is unsupervised learning?
“The primary aim of machine learning is to allow computers to learn automatically without human intervention or assistance and adjust actions accordingly.” Today’s topic, unsupervised learning, fits this description closely.
Unsupervised learning is a machine learning technique in which the algorithm is given data that is not labeled, categorized, or tagged with the correct answers. The algorithm has to discover hidden patterns or similarities within the data and group it accordingly.
Unlike supervised learning, the machine receives no human guidance, so it has to learn by itself.
For example, suppose the machine is given pictures of many different types of fruit, but without their names (an unlabeled dataset). During the unsupervised learning process, the algorithm has to analyze all the fruit images (the dataset) and find common characteristics among them, such as shape, color, size, or color pattern. The images can then be grouped based on these common characteristics.
In machine learning, this grouping of an unlabeled dataset is known as clustering.
Supervised machine learning
Machine learning is a subset of AI (artificial intelligence) that trains a machine how to learn. It gives a system the ability to learn and improve from previous experience without being explicitly programmed.
As the name suggests, supervised learning is a type of machine learning that happens under human supervision and assistance. Humans act as a guide, teaching the algorithm what conclusions it should reach, much like a teacher instructing students. The training dataset is already labeled with the correct answers.
In supervised learning, the algorithm predicts a result from the training data, and each prediction is corrected against the known answer. The learning process continues until the algorithm reaches an acceptable level of performance or accuracy.
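The predict-and-correct loop described above can be sketched with a perceptron, one of the simplest supervised learners. This is only an illustrative sketch: the dataset values, the function names, the learning rate, and the epoch limit are all assumptions made for the example, and the labels stand in for the "teacher".

```python
# Toy labeled dataset (hypothetical values): two features per item, plus the
# correct answer (0 or 1) that a human has already attached to it.
training_data = [
    ((1.0, 1.0), 0), ((2.0, 1.5), 0), ((1.5, 2.0), 0),
    ((4.0, 4.5), 1), ((5.0, 4.0), 1), ((4.5, 5.0), 1),
]

def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train_perceptron(data, learning_rate=1.0, max_epochs=100):
    """Repeat predict-and-correct passes until a full pass has no mistakes."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in data:
            # The label plays the role of the teacher: a non-zero error
            # means the prediction was wrong and must be corrected.
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # acceptable performance on the training data
            break
    return weights, bias

weights, bias = train_perceptron(training_data)
accuracy = sum(predict(weights, bias, f) == y
               for f, y in training_data) / len(training_data)
print("training accuracy:", accuracy)
print("prediction for new input (5.0, 5.0):", predict(weights, bias, (5.0, 5.0)))
```

Stopping when a whole pass produces no corrections is one simple way to express "an acceptable level of performance"; real projects usually measure accuracy on separate held-out data instead.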
Difference between supervised and unsupervised learning
The basic difference between supervised and unsupervised learning is that supervised learning is used to predict the result for a new input, whereas unsupervised learning is used to find hidden patterns within an existing dataset.
| Characteristic | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Learning goal | Predict the result for a new input. | Discover hidden patterns in a dataset. |
| Dataset used | Algorithms are trained on a labeled dataset. | Algorithms are trained on an unlabeled dataset. |
| Human assistance | The learning process happens under human supervision and assistance. | The learning process happens without human supervision. |
| Basic types | Two types: classification and regression. | Two basic types: clustering and association. |
| Output | Predicts a result. | Finds hidden relationships and patterns. |
| Accuracy | Generally produces more accurate results, since predictions can be checked against known labels. | Results are generally less accurate than those of supervised learning. |
Types of unsupervised machine learning
Unsupervised learning is classified into two main categories: clustering and association.
Clustering
Grouping an unlabeled dataset into smaller groups is called clustering. Because no external label is attached to the dataset items in unsupervised learning, the algorithm has to discover the natural groupings in the dataset.
Clustering splits the dataset into small groups (clusters) based on common characteristics.
For example, when creating song playlists, you can build a playlist around any common attribute, such as singer, genre, recording year, or language. Each playlist you create is effectively a cluster.
In machine learning, however, the algorithm has to learn the features and patterns by itself, without any given input-output mapping. It draws inferences from the nature of the data objects and then creates distinct groups for them.
Typical uses of clustering include:
- Market segmentation: Grouping customers based on their purchasing behavior. This kind of clustering helps with running advertising campaigns, making product suggestions, etc.
- Library planning: Creating clusters of books based on topic, author, etc.
Types of clustering algorithms
- K-means clustering: Splits the dataset into a chosen number (K) of mutually exclusive clusters.
- Hierarchical clustering: Organizes dataset entities into nested parent and child clusters. For example, splitting customer information by age, then by gender, etc.
- DBSCAN: A density-based clustering algorithm that groups points lying in dense regions and treats isolated points as noise.
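To make the K-means idea from the list above concrete, here is a minimal pure-Python sketch. The points, the function names, and the choice of the first K points as starting centroids are assumptions made for illustration; real implementations usually pick starting centroids randomly or with a smarter scheme such as k-means++.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, initial_centroids, max_iters=100):
    """Alternate assignment and update steps until assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:  # nothing moved, so the clustering is stable
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels, centroids

# Hypothetical unlabeled data: two features per item, no answers attached.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, initial_centroids=points[:2])
print("cluster labels:", labels)
print("centroids:", centroids)
```

Each point ends up in exactly one cluster, which is what "mutually exclusive" means here. The cluster numbers themselves carry no meaning; only the grouping does.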
Association
This unsupervised learning technique is used to find hidden relationships between items in large datasets.
It identifies associations (relationships or dependencies) between dataset items. For example, a person who has recently purchased a new home is more likely to buy furniture as well. This relationship information is very useful for decision making.
- Market basket data analysis: The buying patterns of customers are analyzed to find associations, such as items that are frequently bought together. These associations are helpful in many places, such as:
- Product recommendation: Recommending a product to a specific group of customers who are more likely to buy it.
- Advertisement campaign design: Showing a specific advertisement only to a specific group of customers. Grouping the data makes it easy to identify the target audience for the campaign.
- Inventory management: Knowing which brands are frequently bought together helps in planning the inventory.
- Catalog design: When designing a product catalog or arranging items in a shop, items that are frequently bought together can be grouped.
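"Frequently bought together" is usually quantified with two standard measures: support (how often an itemset appears across all baskets) and confidence (how often the consequent appears in baskets that already contain the antecedent). The sketch below computes both; the basket contents and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical shopping baskets; each set is one customer's purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also
    contain the consequent."""
    with_antecedent = count_containing(antecedent, transactions)
    if with_antecedent == 0:
        return 0.0
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / with_antecedent

print("support(bread, butter):", support({"bread", "butter"}, baskets))
print("confidence(bread -> butter):", confidence({"bread"}, {"butter"}, baskets))
print("confidence(butter -> bread):", confidence({"butter"}, {"bread"}, baskets))
```

Note that confidence is directional: in this toy data, every basket with butter also contains bread, but not every basket with bread contains butter.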
Types of Association algorithms
- Apriori algorithm
- FP-growth algorithm
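As a rough illustration of how Apriori works, the sketch below finds frequent itemsets level by level: it counts itemsets of size k, keeps those that meet a minimum count, and builds candidates of size k+1 only from the survivors. It is a simplified teaching version, not an optimized implementation; the baskets, the `min_count` parameter, and the function name are assumptions made for this example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least
    min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # size-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep only the frequent ones.
        level = {}
        for candidate in current:
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                level[candidate] = count
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune
        # any candidate that has an infrequent k-item subset (the Apriori
        # property: every subset of a frequent itemset is frequent).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]
frequent = apriori(baskets, min_count=2)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search manageable: the three-item candidate bread, butter, jam is never counted, because the pair butter, jam is already known to be infrequent.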
Applications of unsupervised learning
Unsupervised learning can be used to perform a variety of tasks, such as:
Anomaly detection
This is a type of unsupervised learning used to find unusual data points in a dataset. It identifies rare items, events, or observations that differ from the majority of the data.
The word anomaly is a synonym for exception, irregularity, or deviation. Unsupervised anomaly detection tries to find such odd data points in an unlabeled dataset. It is useful in problems like:
- Finding fraudulent bank transactions.
- Discovering faulty pieces of hardware.
- Identifying errors introduced during the data entry phase.
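One simple way to flag unusual points without any labels is a z-score check: measure how many standard deviations each value lies from the mean and flag the ones beyond a cutoff. The sketch below applies this to made-up transaction amounts. The data, the function name, and the cutoff of 2.0 are assumptions for illustration; practical fraud detection relies on far richer features and methods.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:          # all values identical, so nothing stands out
        return []
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts, with one value that clearly stands out.
amounts = [52.0, 48.0, 50.0, 51.0, 49.0, 50.0, 47.0, 53.0, 500.0, 50.0]
print(find_anomalies(amounts))
```

Because an extreme value also inflates the mean and standard deviation, this check becomes less sensitive when several outliers are present; measures based on the median are a common, more robust alternative.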
Clustering
Clustering automatically splits the dataset into groups based on similarities. This grouping is useful in many domains, such as:
- E-commerce: Identifying potential buyers, making product recommendations, etc.
- Healthcare: Clustering the data of existing patients who have the same disease can help in diagnosing new patients.
- Banking and insurance: Identifying potential policy buyers.
Association
It identifies sets of items or events that frequently occur together in a dataset. As discussed earlier, retailers often use it for basket analysis because it lets analysts discover goods that are often purchased at the same time and develop more effective marketing strategies.
Summary
- Unsupervised learning uses an unlabeled dataset for learning.
- The algorithm has to find the hidden patterns or similarities in the unlabeled dataset.
- After finding similarities, it groups the items and splits the large dataset into small clusters.
- The whole learning process happens without human assistance.
- Unsupervised learning is classified into two basic types:
- Clustering: Grouping the dataset items into small groups or clusters.
- Association: Finding the hidden relationships between dataset items and grouping them based on those relationships.