Understanding Unsupervised Learning

Prajakta Sathe
May 31, 2022


Photo by Bruce Warrington on Unsplash

As you might've guessed from the title, this article is about unsupervised learning. Before we get into it, I'd recommend reading a little about supervised learning as well!

What is Unsupervised Learning?

According to Wikipedia, unsupervised learning is a type of algorithm that learns patterns from untagged (unlabeled) data.

In simple words, as the name suggests, the machine learns without being explicitly taught. Unlike in supervised learning, there is no teacher to guide the machine, that is, no labeled examples showing the correct answer.

When training data is given as input to the machine, it tries to find underlying patterns in it and categorize the data accordingly on its own.

Supervised learning is used to make predictions or classifications. Unsupervised learning is used instead to uncover the underlying structure of a dataset without anyone labeling the data first, for example by grouping the data into clusters.

Let's take an example:

As a child, you came across many things in your surroundings: people, cars, dogs, and so on. You were probably able to group all the people around you into a 'family' cluster, all the cars into a 'cars' cluster, and so on. Why could you do this? Because your brain found similarities, or patterns, among all the people and among all the cars, and so it could put similar items into a single cluster. Machines do the same thing with data.
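To make this concrete, below is a minimal plain-Python sketch of how a machine can do this kind of grouping. It uses k-means, one of the clustering algorithms listed later in this article. This is an illustration under simple assumptions, not a library implementation: the points are 2-D, and the function names `kmeans` and `nearest`, like the sample data, are hypothetical. Note that the input contains no labels. The algorithm finds the two groups by itself, by alternately assigning points to the nearest centroid and moving each centroid to the mean of its points.

```python
import math
import random

Point = tuple[float, float]


def nearest(point: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Group unlabeled 2-D points into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize with k points picked at random from the data.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    # Final assignment against the final centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical data: two obvious groups, but no labels are given to the algorithm.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(data, k=2)
print(labels)  # the first three points share one label, the last three the other
```

In practice you would more likely use a tested library such as scikit-learn's `sklearn.cluster.KMeans`. By default it also uses a smarter initialization (k-means++) than picking random points.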

But why use unsupervised learning?

You might wonder: isn't it better to use supervised learning, where the model is properly trained and tested, rather than unsupervised learning, where the data isn't even tagged?

You are right, to a certain extent. But think of situations where we don't know exactly what we are looking for, or where labeled data is not available. In such scenarios, unsupervised learning proves very useful.

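Also, because nobody has to label the data by hand, unsupervised learning avoids the cost of manual labeling and the human errors or biases that labeling can introduce. Humans still choose the features, the algorithm, and settings such as the number of clusters, so the process is not entirely free of human judgment.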

Types of unsupervised learning:

Clustering: grouping similar items together.

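Examples of clustering algorithms:

  1. K-means clustering: partitions the data into k clusters by assigning each point to the cluster with the nearest centroid (cluster mean), so that points in the same cluster have similar properties.
  2. Hierarchical clustering: builds a hierarchy of clusters in the form of a tree (dendrogram). There are two types: agglomerative (bottom-up, repeatedly merging the closest clusters) and divisive (top-down, repeatedly splitting clusters).

Dimensionality reduction: reducing the number of variables in a dataset while keeping as much of its information as possible. These methods are not clustering algorithms, but they are often applied before clustering.

Examples of dimensionality reduction methods:

  1. Principal component analysis (PCA): transforms a large set of variables into a smaller set of uncorrelated variables (principal components) that still captures most of the variance in the dataset.
  2. Singular Value Decomposition (SVD): decomposes a rectangular matrix A into three matrices, A = UΣVᵀ, where U and V have orthonormal columns and Σ is a diagonal matrix of non-negative singular values.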
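PCA and SVD above are closely related. The first principal component of a dataset is the top eigenvector of its covariance matrix. It is also the first right singular vector (the first column of V) in the SVD of the centered data matrix, because XᵀX = VΣ²Vᵀ. The plain-Python sketch below finds only that first component, using power iteration. It assumes numeric rows of equal length and data whose points are not all identical; otherwise the covariance is zero and the normalization would divide by zero. The function names are hypothetical.

```python
import math
import random


def first_principal_component(rows, iterations=100, seed=0):
    """Return (mean, direction): the data mean and the unit vector of maximum variance."""
    n, d = len(rows), len(rows[0])
    # 1. Center the data: PCA looks at deviations from the mean.
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # 2. Sample covariance matrix C = X^T X / (n - 1) of the centered data X.
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # 3. Power iteration: repeatedly applying C to a vector pulls it toward C's top
    #    eigenvector, i.e. the first principal component. This is also the first
    #    right singular vector of X in the SVD X = U S V^T.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start, almost surely not orthogonal
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Reduce each row to a single number: its coordinate along `direction`."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(r))) for r in rows]


# Hypothetical 2-D data lying exactly on the line y = 2x.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean, pc1 = first_principal_component(data)
print(pc1)                       # proportional to (1, 2) / sqrt(5), up to sign
print(project(data, mean, pc1))  # each 2-D point reduced to one number
```

In practice you would more likely call a library routine that computes all components at once, such as NumPy's `numpy.linalg.svd` or scikit-learn's `sklearn.decomposition.PCA`.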

Association: a rule-based learning method for discovering interesting relations between variables in large datasets (according to Wikipedia).
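Association rules are commonly scored with two measures. Support is the fraction of transactions that contain an itemset. Confidence, for a rule {A} → {B}, is support(A ∪ B) / support(A), an estimate of how often B appears when A does. Below is a minimal Python sketch of both, using a hypothetical list of shopping baskets. The function names are illustrative, and `confidence` assumes the antecedent appears in at least one transaction; otherwise it would divide by zero.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A ∪ B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))       # 0.5: 2 of the 4 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.5 / 0.75, about 0.67
```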

Example of an association algorithm:

  1. Apriori algorithm: finds the frequent itemsets in a dataset (sets of items that often appear together) and derives association rules from them.
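The following is a minimal, unoptimized Python sketch of the frequent-itemset part of Apriori. It assumes transactions are sets of hashable items and `min_support` is a fraction between 0 and 1. The function name and the basket data are hypothetical. It shows the algorithm's key idea: candidates of size k are built only from frequent itemsets of size k − 1, and any candidate with an infrequent subset is discarded without counting it. Association rules would then be derived from these itemsets by keeping rules A → B whose confidence, support(A ∪ B) / support(A), exceeds a chosen threshold. This sketch stops at the itemsets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        prev = set(level)
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (the Apriori property): a k-itemset can only be frequent
        # if all of its (k-1)-item subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
    return frequent


# Hypothetical shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
result = apriori(baskets, min_support=0.5)
for itemset, s in result.items():
    print(sorted(itemset), s)
```

With `min_support=0.5`, the three single items qualify, as do the pairs {bread, milk} and {bread, butter}. The pair {milk, butter} appears in only one of the four baskets, so the three-item candidate is pruned without being counted. For real workloads, libraries such as mlxtend provide a tested implementation (`mlxtend.frequent_patterns.apriori`).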

Where is unsupervised learning used?

Recommendation systems: recommending products based on customers' preferences, purchase history, watch history, etc.

Customer segmentation: identifying different customer groups in order to develop marketing or business strategies.

Genetics: analyzing patterns in DNA.

That's all for this article! If you learned something, please clap for the article or leave a comment!

Thanks for reading!
