# Lecture 6

## Neural Networks

Consider the plane shown above; this can be used to make decisions about patterns in the x-y plane. For every point (x, y) in the x-y plane there is a corresponding value of z such that (x, y, z) lies on the angled plane. If z is negative then the point lies above the dotted line; otherwise it is below it.

The equation for z is

z = wx + vy + c
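This decision rule is easy to sketch in code; the values of w, v and c below are illustrative choices, not taken from the lecture.

```python
def z(x, y, w=1.0, v=-2.0, c=0.5):
    # Height of the plane over the point (x, y); w, v and c are
    # illustrative values for this sketch.
    return w * x + v * y + c

def side(x, y):
    # The sign of z decides which side of the dividing line (x, y) is on.
    return "negative" if z(x, y) < 0 else "positive"
```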

We can generalise this for more dimensions

o = w1*i1 + w2*i2 + w3*i3 + ...

o = sum[j=1..n](wj * ij)

Here o is the output, the wj are the weights and the ij are the inputs.
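The weighted sum is a one-line helper; this is a sketch, not code from the lecture.

```python
def weighted_sum(weights, inputs):
    # o = sum over j of wj * ij
    return sum(w * i for w, i in zip(weights, inputs))
```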

This can be drawn as shown below.

Generally we don't care about the size of the output; it will be a yes/no answer for a class, so we pass the weighted sum through a squashing function:

y = f(sum[i](wi * xi) + theta)

The function in the box is called the sigmoid function and is:

f(x) = 1/(1+exp(-ax))
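The sigmoid is straightforward to implement; defaulting the slope parameter a to 1 is an assumption of this sketch.

```python
import math

def sigmoid(x, a=1.0):
    # f(x) = 1 / (1 + exp(-a * x)); a controls the steepness.
    return 1.0 / (1.0 + math.exp(-a * x))
```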

If t is the expected output then there will be an error between t and y. To train the system we need to minimise this error. Over a set of training patterns indexed by k, the total error is the sum of the individual errors:

E = (sum[k](tk - yk)^2) / 2

E = (sum[k](tk - f(sum[i](wi * xik) + theta))^2) / 2
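The total error over a batch of targets and outputs follows directly from the first form of E; this helper is a sketch.

```python
def total_error(targets, outputs):
    # E = (sum over k of (tk - yk)^2) / 2
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / 2.0
```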

To train the classifier, we just change each weight wi so that E becomes smaller.

Now, differentiating E with respect to wi:

dE/dwi = -sum[k]((tk - yk) * dyk/dwi)

dE/dwi = -sum[k]((tk - yk) * f'(sum[i](wi * xik) + theta) * xik)

dE/dwi = -sum[k](dk * xik)

Where

dk = (tk - yk) * f'(sum[i](wi * xik) + theta)

f(x) = 1/(1 + exp(-ax))

therefore (in general f'(x) = a * f(x)(1 - f(x)); taking a = 1)

f'(x) = f(x)(1 - f(x))

so

dk = (tk - yk) * yk * (1 - yk)
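The identity f'(x) = f(x)(1 - f(x)) (with a = 1) can be checked numerically with a central finite difference; the test point x = 0.7 is arbitrary.

```python
import math

def f(x):
    # Sigmoid with a = 1.
    return 1.0 / (1.0 + math.exp(-x))

# Central finite difference versus the analytic form f(x)(1 - f(x)).
x, h = 0.7, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)
analytic = f(x) * (1 - f(x))
```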

Weight updating rule:

dwi = n * sum[k](dk * xik)

where n is the learning rate, a parameter that controls the speed of gradient descent. Since we want to reduce the error we step against the gradient, so the minus sign disappears.

For theta:

dE/dtheta = -sum[k](dk)

therefore

dtheta = n * sum[k](dk)

This device is sometimes called a perceptron. The training rule is called the delta rule.

### Perceptron Learning Algorithm

```
repeat {
    set dw1 = dw2 = ... = dwn = dtheta = 0;
    for every pattern xk do {
        calculate yk;
        calculate dk;
        add n * dk * xik to dwi, for i = 1..n;
        add n * dk to dtheta;
    }
    add dwi to wi, for i = 1..n;
    add dtheta to theta;
} until the error E is small enough
```
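The algorithm above can be turned into a runnable sketch. The single sigmoid unit and the delta rule follow the derivation in the text; the OR training set, the learning rate, the epoch count and the zero initialisation are illustrative choices, not from the lecture.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train(patterns, targets, eta=0.5, epochs=5000):
    # Batch delta-rule training of a single sigmoid unit.
    n = len(patterns[0])
    w = [0.0] * n            # weights w1..wn, zero-initialised
    theta = 0.0              # bias term
    for _ in range(epochs):
        dw = [0.0] * n
        dtheta = 0.0
        for xk, t in zip(patterns, targets):
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, xk)) + theta)
            d = (t - y) * y * (1 - y)        # dk = (tk - yk) yk (1 - yk)
            for i in range(n):
                dw[i] += eta * d * xk[i]     # accumulate n * dk * xik
            dtheta += eta * d                # accumulate n * dk
        w = [wi + dwi for wi, dwi in zip(w, dw)]
        theta += dtheta
    return w, theta

# Learn the OR function; it is linearly separable, so one unit suffices.
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]
w, theta = train(patterns, targets)
outputs = [sigmoid(sum(wi * xi for wi, xi in zip(w, xk)) + theta)
           for xk in patterns]
```

After training, the output for (0, 0) falls below 0.5 while the other three rise above it, matching the OR targets.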