^{So far we have always known
the correct classification for every member of the training set, this is
not always the case.}

^{Patterns are often created
by processes we are not able to describe well. In these cases we can try
to build a recogniser based on inherent properties of the patterns.}

^{Unsupervised Learning}

^{example:}

^{If we take speech signals and
transform them to the frequency domain we get a speech spectrogram.}

^{The spectrogram is a two dimensional
signal showing how the frequencies change over time.}

^{For human speech we find that
speech consists of two of three bands called formants. We could try to
classify speech by using the position of the formants. (this is not easy
because there is a wide variation in way the formants change for a particular
sound).}

^{Since there is some inherent
structure in the signal we can use unsupervised learning to reduce the
quantity of information in the formants.}

^{We need to construct a classifier
that minimises not the error between output and target, but the error in
the fit of a particular model to the data.}

^{If we use a very complex model
it will be able to fit the data quite well.}

^{An example model is a set of
points on the pattern space, each point represents the centre of a cluster.
An algorithm must be devised to move the points so that an error is reduced.}

^{The K means algorithm:}

^{Assume that there are K clusters
in the data, i.e. K classes.}

^{choose small random values for each cluster mean}- m1(0),m2(0),m3(0)....mK(0) where mi(0)=(x1,x2,....xn)
- Use the training set to assign a cluster to each training
pattern.

Given a training pattern y, y is in cluster i (Ci(t)) if - Forall j,d(y,mi(t))=min(d(y,mj(t))
- Calculate new values for m1(t)...mK(t)
- mi(t+1)=min(Sum(d(y,mi(t)))) where y is in Ci(t)
- Forall i, mi(t+1)-mi(t) < e

where d is some distance metric.

i.e. y is in cluster i, if i has the closest mean.

For the Euclidean distance this is just the mean of all the patterns assigned to class i.

Repeat from 2 until

i.e. the clusters are not moving much.

Clustering is applicable to problems where

- We have no labels for classes or we want to find inherent properties of data.
- We know the number of classes.
- We know very little about the process which has created the patterns.

ISODATA

ISODATA is an algorithm based on K means that does not need to know the number of clusters. It starts with one cluster and splits/merges clusters according to certain parameters.

- Minimum distance between two cluster centres below which they are merged.
- Maximum standard deviation for a cluster above which it is split into two clusters.
- Minimum proportion of training patterns in a cluster.

Amongst many others. To modify the K mans algorithm we need to add an extra step 4 which can split or merge clusters.

Clustering algorithms perform dimensionality reduction. They do this because they take a high dimensional pattern space and produce a lower dimensional space. Hopefully the lower dimensional space still contains most of the information from the original space.