Cache memories

Large memories, built from DRAM (dynamic RAM), are slow: ~100ns,

compared to our pipeline time of ~10ns.

Small memories, built from SRAM (static RAM), are fast: ~10ns,

but are ~10 times as expensive.

Memory is therefore built as a hierarchy:

disk being the slowest and cheapest,

then DRAM,

then SRAM, the fastest and most expensive.

A library is a good example of a memory hierarchy.

It is very important to have access to all books in a library, but at any one time you will only need a few books on one particular subject, and of those books maybe a few pages in each.

If you leave the books open, on your desk, at these particular pages, you have very fast access to the information.

If you are prepared to turn the pages you get slower access to other information in the book.

If you are prepared to go to the library you get even slower access to lots more information.

The same principle applies to memory.

The Principle of Locality states that programs access a relatively small portion of their address space at any instant of time.

Temporal locality - if an item is referenced, it will tend to be referenced again soon

Spatial locality - if an item is referenced, items whose addresses are close by will tend to be referenced soon.
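Both kinds of locality appear in a loop as simple as summing an array; a minimal Python sketch (illustrative, not from the notes):

```python
def sum_array(a):
    total = 0              # 'total' is reused on every iteration: temporal locality
    for i in range(len(a)):
        total += a[i]      # a[0], a[1], ... have adjacent addresses: spatial locality
    return total
```

A cache exploits both: `total` stays resident (temporal), and fetching one block brings in several neighbouring elements of `a` at once (spatial).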

A Hit is when the processor requests data that is in the faster memory (cache).

Hit Rate is the percentage of accesses that are found in the cache.

Hit Time is the time taken to access the cache.

A Miss is when the processor requests data that is not in the cache.

Miss Rate is (1 - Hit Rate).

Miss Penalty is the time taken to move data from memory to cache.

The Block Size is the amount of data that is transferred from memory to cache when a miss occurs.

Typical values are:

Block size 4 to 128 bytes

Hit time 1 cycle

Miss Penalty 8 cycles

Miss Rate 10%

Cache Size 256KB
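These figures combine into an average memory access time, AMAT = hit time + miss rate x miss penalty (a standard figure of merit, not named explicitly in the notes); a quick check in Python:

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time, in cycles.
    return hit_time + miss_rate * miss_penalty

# With the typical values above: 1 + 0.10 * 8 = 1.8 cycles per access,
# far better than paying the full memory latency on every access.
```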

When the processor wants to access a data item x5:

It first checks whether x5 is already in the cache.

If it isn't, it retrieves x5 from memory and inserts it into the cache.

How does the processor know if the data is in the cache?

How does the processor know its position in the cache?

The cache is ordered so that there is a relation between memory addresses and cache structure.

Direct mapping

The position or index of a data item in the cache is determined directly from its memory address.

Consider a cache of 8 entries, each entry 32 bits wide.

Direct mapping will place data from memory into locations in the cache at positions corresponding to the low-order 3 bits of their memory address:

location   data   cache index

000000     23     000
000001     44     001
000010     13     010
000011     48     011
000100     17     100
000101     56     101
000110     44     110
000111     22     111
001000     39     000
...
010101     29     101
...
111111     33     111

The 3 low-order bits give the position (index) in the cache.
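Extracting the index is just a mask of the low-order address bits; a small Python sketch (the function name is illustrative):

```python
def cache_index(address, index_bits=3):
    # Keep only the low-order index_bits bits of the address.
    return address & ((1 << index_bits) - 1)
```

So addresses 000000 and 001000 both map to cache index 000, which is why a tag is needed to tell them apart.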

When the processor is first switched on, the cache will not contain valid data. So a valid bit is associated with each cache location, and is set when data is written.

Lots of memory locations are mapped to the same cache location.

We also need to know which memory location is currently stored in the cache.

We do this with cache tags.

The cache tag is the remaining bits of the address:

cache tag   cache index   cache data   corresponding memory

000         000           23           000000
100         001           28           100001
011         010           34           011010
000         011           48           000011
000         100           17           000100
010         101           29           010101
100         110           92           100110
111         111           33           111111

To discover whether a data reference hits or misses the cache:

1. find the cache index of the address

2. match the cache tag with the cache tag part of the address


Example: address 100110

cache index is 110 -> stored cache tag is 100

cache tag part of 100110 is 100

100 == 100 -> hit

Example: address 010011

cache index is 011 -> stored cache tag is 000

cache tag part of 010011 is 010

000 != 010 -> miss
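The two steps above can be sketched in Python, modelling the cache as a dict from index to a (valid, tag) pair (the names and the dict model are illustrative):

```python
def split_address(address, index_bits=3):
    # Split an address into (tag, index): low bits are the index, the rest the tag.
    index = address & ((1 << index_bits) - 1)
    tag = address >> index_bits
    return tag, index

def lookup(cache, address, index_bits=3):
    # cache: dict mapping index -> (valid, tag). Returns True on a hit.
    tag, index = split_address(address, index_bits)
    valid, stored_tag = cache.get(index, (False, None))
    return valid and stored_tag == tag
```

With the table above loaded, a lookup of address 100110 hits (stored tag 100 matches) and 010011 misses (stored tag 000, address tag 010).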

To increase the size of the cache we can either increase the number of positions, or increase the size of the data stored at those positions.

Example: a cache with 32 entries, each entry holding 32 bytes of data.

Increasing the number of bytes for each cache entry involves a penalty when there is a miss. More bytes need to be copied from memory into the cache. Spatial locality means that this is not necessarily a bad thing!
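For this 32-entry, 32-byte example the address fields work out as follows (a sketch; the byte-offset-below-index layout is an assumption, since the notes do not spell it out):

```python
block_size = 32                               # bytes per cache entry
entries = 32                                  # number of cache entries

offset_bits = (block_size - 1).bit_length()   # 5 bits select a byte within the block
index_bits = (entries - 1).bit_length()       # 5 bits select the cache entry
cache_bytes = block_size * entries            # 1024 bytes of data in total
```

The remaining high-order address bits form the tag.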

Using the cache index to find the correct data line before comparing the cache tag involves two levels of computation.

A fully associative cache combines the cache index with the cache tag.

To find out whether the data is in the cache:

compare the cache tag part of the memory address with all the cache tags in the cache.

This matching is done in parallel in a fully associative cache.
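A software model has to scan the tags one by one where the hardware compares them all at once; a sketch (the entry format is illustrative):

```python
def fa_lookup(entries, address_tag):
    # entries: list of (valid, tag) pairs, one per cache slot.
    # Hardware compares every stored tag against address_tag in parallel;
    # this loop performs the same comparisons sequentially.
    return any(valid and tag == address_tag for valid, tag in entries)
```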

Cache replacement:

In a directly mapped cache:

If there is a cache miss then the new entry replaces the old entry at that index; there is no choice to make.

In a fully associative cache:

If there is a cache miss then the new entry can be placed anywhere.

Where do we put it?

Using the Least Recently Used (LRU) algorithm: hardware keeps track of which cache entry has not been accessed for the longest time.

This can be tricky to implement!

A simple pseudo LRU algorithm is:

Keep an index.

If the cache entry at that index is accessed, add one to the index (wrapping around).

If a replacement is needed, choose the cache entry at the index.
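One reading of those three rules in Python (the wrap-around at the end of the cache is an assumption the notes leave implicit):

```python
class PseudoLRU:
    def __init__(self, n_entries):
        self.n = n_entries
        self.index = 0          # the single index the scheme maintains

    def accessed(self, entry):
        # If the entry the index points at was just used, move the index on.
        if entry == self.index:
            self.index = (self.index + 1) % self.n

    def victim(self):
        # On a replacement, evict the entry the index points at.
        return self.index
```

The index can only rest on an entry that has not been touched since the index arrived there, which crudely approximates "least recently used".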

Cache writes:

Reading the cache is straightforward. Writing to the cache is not.

How do we keep cache data and memory data the same?

Write Back policy:

Keep an extra bit in each cache entry to mark whether the entry has been written to.

When the cache entry is replaced, write it back to memory if it has changed.

Write Through policy:

Always write to memory as well as to cache.

Use a write buffer between the cache and memory so the processor does not have to wait for the slow memory write to complete.
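The two policies can be contrasted in a toy sketch (the class names, the (tag, index) memory addressing, and the omission of the write buffer are all simplifications, not the notes' design):

```python
class WriteBackCache:
    def __init__(self):
        self.entries = {}   # index -> (tag, data, dirty)

    def write(self, index, tag, data, memory):
        old = self.entries.get(index)
        if old is not None and old[2] and old[0] != tag:
            # Evicting a dirty entry: only now does memory see the old data.
            memory[(old[0], index)] = old[1]
        self.entries[index] = (tag, data, True)   # mark the new entry dirty

class WriteThroughCache:
    def __init__(self):
        self.entries = {}   # index -> (tag, data)

    def write(self, index, tag, data, memory):
        self.entries[index] = (tag, data)
        memory[(tag, index)] = data   # every write also goes to memory
```

Write back saves memory traffic when the same entry is written repeatedly; write through keeps memory always up to date.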

159.233 Hardware 20 - 1