# Lecture 8

## Unsupervised Learning

So far we have always known the correct classification for every member of the training set, this is not always the case.

Patterns are often created by processes we are not able to describe well. In these cases we can try to build a classifier based on inherent properties of the patterns.

This is Unsupervised Learning

example:

If we take speech signals and transform them to the frequency domain we get a speech spectrogram.

The spectrogram is a two dimensional signal showing how the frequencies change over time.

The spectrogram above shows the phrase, "In nineteen ninety two"
For human speech we find that speech consists of two of three main bands called formants. We could try to classify speech by using the position of the formants. (this is not easy because there is a wide variation in way the formants change for a particular sound).

Since there is some inherent structure in the signal we can use unsupervised learning to reduce the quantity of information in the formants.

We need to construct a classifier that minimises not the error between output and target, but the error in the fit of a particular model to the data.

If we use a very complex model it will be able to fit the data quite well.

An example model is a set of points in the pattern space, each point represents the centre of a cluster. An algorithm must be devised to move the points so that an error is reduced.

### The K means algorithm:

Assume that there are K clusters in the data, i.e. K classes.
1. choose small random values for each cluster mean
• m1(0),m2(0),m3(0)....mK(0) where mi(0)=(x1,x2,....xn)
2. Use the training set to assign a cluster to each training pattern.

3. Given a training pattern y, y is in cluster i at time t  - Ci(t) if
• Forall j,d(y,mi(t))=min(d(y,mj(t))
where d is some distance metric.
i.e. y is in cluster i, if i has the closest mean.
4. Calculate new values for m1(t)...mK(t)
• mi(t+1)=min(Sum(d(y,mi(t)))) where y is in Ci(t)
For the Euclidean distance this is just the mean of all the patterns assigned to class i.

Repeat from 2 until

• Forall i, mi(t+1)-mi(t) < e
i.e. the clusters are not moving much.
Clustering is applicable to problems where
• We have no labels for classes or we want to find inherent properties of data.
• We know the number of classes.
• We know very little about the process which has created the patterns.

### ISODATA

ISODATA is an algorithm based on K means that does not need to know the number of clusters. It starts with one cluster and splits/merges clusters according to certain parameters.
1. Minimum distance between two cluster centres below which they are merged.
2. Maximum standard deviation for a cluster above which it is split into two clusters.
3. Minimum proportion of training patterns in a cluster.
Amongst many others.

To modify the K mans algorithm we need to add an extra step 4 which can split or merge clusters.

Clustering algorithms perform dimensionality reduction. They do this because they take a high dimensional pattern space and produce a lower dimensional space. Hopefully the lower dimensional space still contains most of the information from the original space.