This text contains only the questions and omits the instructions and
other details, so make sure you read the actual exercise3.html text,
not just this one.

## Q1 L1 cache size

> Increase the elements until the number of L1 cache misses rises
> significantly.  At what size does this happen?  What is your estimate
> for the L1 cache size?

## Q2 Cache line size

> Switch to linear mode, and perform measurements with larger numbers of
> elements.  You will see a plateau in the number of L1 cache misses.
> At this plateau: Given 100M accesses with a stride of 8, you get an L1
> cache miss every n Bytes.  What is n?
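
The arithmetic behind this question can be sketched as follows. This is a minimal helper, not part of the exercise; the function name `bytes_per_miss` and the example miss count are assumptions for illustration, so substitute the miss count you actually measure at the plateau.

```python
def bytes_per_miss(accesses: int, stride_bytes: int, l1_misses: int) -> float:
    """Average distance in bytes between two consecutive L1 misses
    when walking memory linearly with a fixed stride."""
    if l1_misses <= 0:
        raise ValueError("need a positive miss count")
    # Linear mode covers accesses * stride bytes in total.
    return accesses * stride_bytes / l1_misses


# Hypothetical miss count, NOT a measurement: replace with your plateau value.
example_misses = 25_000_000
print(bytes_per_miss(100_000_000, 8, example_misses))  # 32.0
```

In linear mode every cache line is fetched once, so the distance between misses should correspond to the cache line size.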

## Q3 L2 cache size

> Design an experiment for determining the L2 cache size and use it to
> answer the following question: What is the L2 cache size?

## Q4 L2 cache latency

> Design an experiment that has 100M L1 cache misses and ~0 TLB misses
> and ~0 L2 cache misses ("~0" typically means <100_000 in this
> exercise).  What is the latency of an L2 cache access with linear
> accesses (i.e., with the prefetcher helping)?  What is the latency of
> an L2 cache access in random mode (where the prefetcher cannot help)?

## Q5 Main memory latency

> Use random mode, 1000000 elements and stride 64 to measure main memory
> latency (plus TLB miss cost).  How many cycles per access do you
> measure on your first run?  How many ns (user time) per access do you
> measure on your first run?  How many times is an L1 cache hit faster
> than a main memory access in your first run (in user time)?
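
The conversions asked for here can be sketched as below. The helper names and all numbers are hypothetical, and the access count of 100M is an assumption carried over from Q2; use the cycle count, user time and access count reported by your own run.

```python
def cycles_per_access(cycles: float, accesses: int) -> float:
    """Total cycles divided by the number of memory accesses."""
    return cycles / accesses


def ns_per_access(user_seconds: float, accesses: int) -> float:
    """User time converted to nanoseconds per memory access."""
    return user_seconds * 1e9 / accesses


def speedup(slow_ns: float, fast_ns: float) -> float:
    """How many times faster the fast access is than the slow one."""
    return slow_ns / fast_ns


# Hypothetical values, NOT measurements.
accesses = 100_000_000          # assumed from Q2
print(cycles_per_access(30e9, accesses))   # 300.0
print(ns_per_access(10.0, accesses))       # 100.0
print(speedup(100.0, 2.0))                 # 50.0
```

The same two per-access helpers apply to the latency questions (Q4, Q9, Q11), since each of them asks for a cost per access.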

## Q6 Main memory accesses with prefetcher help

> Switch to linear mode such that the prefetcher helps you and TLB
> misses will be reduced. How many cycles per access do you measure on
> your first run?  How many ns per access do you measure on your first
> run?  What is the effect on the cache misses?  What is the bandwidth
> (in GB/s user time) of the main memory accesses, if you count, for
> each access, the whole 64 bytes of a cache line that is actually
> transferred from main memory?
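
A minimal sketch of the bandwidth calculation, counting 64 bytes per access as the question states. The function name and the example timing are assumptions for illustration. One byte per nanosecond equals 10^9 bytes per second, so bytes/ns is already GB/s (decimal gigabytes).

```python
def bandwidth_gb_per_s(ns_per_access: float, line_bytes: int = 64) -> float:
    """Bandwidth in GB/s (10**9 bytes per second) when each access
    is charged a full cache line."""
    if ns_per_access <= 0:
        raise ValueError("need a positive time per access")
    # bytes per nanosecond == GB/s
    return line_bytes / ns_per_access


# Hypothetical timing, NOT a measurement: replace with your own ns per access.
print(bandwidth_gb_per_s(4.0))  # 16.0
```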

## Q7 Conflict misses

> Use stride=8192.  Which is the last number of elements where the number
> of L1 cache misses is ~0?  Where do you see the difference for stride=4096?
> Where for stride=2048?  What is the associativity (number of ways) in
> the L1 cache?  How much capacity per way does the L1 cache have (L1
> size/number of ways)?  What is the number of sets in the L1 cache
> (capacity per way/cache line size)?
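
The two formulas in this question, and the reason large power-of-two strides provoke conflict misses, can be sketched as follows. The geometry used in the example (64 KiB, 4 ways, 64-byte lines) is purely hypothetical and chosen for illustration; plug in your own estimates from Q1, Q2 and the experiments above. The set-index formula assumes a simple modulo-indexed cache.

```python
def l1_geometry(l1_size_bytes: int, ways: int, line_bytes: int) -> tuple:
    """Return (capacity per way, number of sets) for the given cache."""
    capacity_per_way = l1_size_bytes // ways
    sets = capacity_per_way // line_bytes
    return capacity_per_way, sets


def set_index(address: int, line_bytes: int, sets: int) -> int:
    """Set an address maps to, assuming simple modulo indexing."""
    return (address // line_bytes) % sets


# Hypothetical geometry, NOT a measured result.
per_way, sets = l1_geometry(64 * 1024, ways=4, line_bytes=64)
print(per_way, sets)  # 16384 256

# With a stride equal to the capacity per way, every element maps to the
# same set, so only `ways` elements fit before misses start.
print({set_index(i * per_way, 64, sets) for i in range(8)})  # {0}
```

Under this model, once the stride is a multiple of the capacity per way, the last miss-free element count equals the number of ways; halving the stride below that point doubles the number of sets in use.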

## Q8 L1 TLB entries

> Use stride=131136.  At which number of elements is the number of L1
> TLB misses no longer ~0?  Repeat the same experiment with
> stride=65600, stride=32832, stride=16448, stride=8256, stride=4160,
> stride=2112.  How many entries does the L1 TLB have?  How much memory
> does each entry cover (it's a power of 2, the strides above are chosen
> to avoid conflict misses in the L1 cache)?
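
Each stride listed above is a power of two plus 64, which the following sketch makes explicit. It also counts how many distinct pages a run touches for a given page size. The helper names and the page size of 4096 bytes are assumptions for illustration only; the page size is what the question asks you to determine, so try your own candidates.

```python
STRIDES = [131136, 65600, 32832, 16448, 8256, 4160, 2112]


def split_stride(stride: int, offset: int = 64) -> tuple:
    """Return (stride - offset, whether that remainder is a power of two)."""
    base = stride - offset
    is_pow2 = base > 0 and (base & (base - 1)) == 0
    return base, is_pow2


def pages_touched(elements: int, stride: int, page_bytes: int) -> int:
    """Number of distinct pages hit by `elements` accesses at `stride`."""
    return len({(i * stride) // page_bytes for i in range(elements)})


for s in STRIDES:
    print(s, split_stride(s))

# Hypothetical page size of 4096 bytes, NOT a confirmed value.
print(pages_touched(16, 4160, 4096))  # 16
print(pages_touched(16, 2112, 4096))  # 8
```

The extra 64 bytes shift each successive element by one 64-byte step, which spreads the accesses over different L1 sets instead of piling them into one. When the stride drops below the page size, several elements share a page, so the element count at which TLB misses start goes up.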

## Q9 L1 TLB miss penalty

> Choose parameters such that the L1 TLB misses are close to 100M, but
> the L1 cache misses and L2 TLB misses are ~0.  How much does
> each access cost?

## Q10 L2 TLB entries

> Use stride=65600.  Which is the last number of elements where the L2
> TLB miss rate is ~0?

## Q11 L2 TLB miss penalty

> Choose the parameters such that the number of L2 cache misses stays
> <1M, but the number of L2 TLB misses rises to >90M.  What is the number
> of cycles per memory access?