This text contains only the questions and omits the instructions etc., so make sure you read the actual exercise3.html text, not just this one.

## Q1 L1 cache size

> Increase the number of elements until the number of L1 cache misses
> rises significantly. At what size does this happen? What is your
> estimate for the L1 cache size?

## Q2 Cache line size

> Switch to linear mode, and perform measurements with larger numbers of
> elements. You will see a plateau in the number of L1 cache misses.
> At this plateau: Given 100M accesses with a stride of 8, you get an L1
> cache miss every n bytes. What is n?

## Q3 L2 cache size

> Design an experiment for determining the L2 cache size and use it to
> answer the following question: What is the L2 cache size?

## Q4 L2 cache latency

> Design an experiment that has 100M L1 cache misses and ~0 TLB misses
> and ~0 L2 cache misses ("~0" typically means <100_000 in this
> exercise). What is the latency of an L2 cache access with linear
> accesses (i.e., with the prefetcher helping)? What is the latency of
> an L2 cache access in random mode (where the prefetcher cannot help)?

## Q5 Main memory latency

> Use random mode, 1000000 elements and stride 64 to measure main memory
> latency (plus TLB miss cost). How many cycles per access do you
> measure on your first run? How many ns (user time) per access do you
> measure on your first run? How many times faster is an L1 cache hit
> than a main memory access in your first run (in user time)?

## Q6 Main memory accesses with prefetcher help

> Switch to linear mode such that the prefetcher helps you and TLB
> misses will be reduced. How many cycles per access do you measure on
> your first run? How many ns per access do you measure on your first
> run? What is the effect on the cache misses? What is the bandwidth
> (in GB/s user time) of the main memory accesses, if you count, for
> each access, the whole 64 bytes of a cache line that is actually
> transferred from main memory?

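
The unit conversions asked for in Q5 and Q6 can be sketched as follows. This is only an illustration of the arithmetic: the function names are made up for this example, the 100M access count is assumed to carry over from Q2, and the 0.5 s user time is a placeholder, not a measurement.

```python
def ns_per_access(user_time_s: float, accesses: int = 100_000_000) -> float:
    """Convert total user time in seconds to nanoseconds per access.

    The default of 100M accesses is an assumption taken from Q2.
    """
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return user_time_s * 1e9 / accesses


def bandwidth_gb_per_s(ns_per_acc: float, bytes_per_access: int = 64) -> float:
    """Bandwidth when every access counts as one whole cache line.

    One byte per nanosecond equals 1 GB/s (10^9 bytes per 10^9 ns),
    so the result is simply bytes divided by nanoseconds.
    """
    if ns_per_acc <= 0:
        raise ValueError("ns_per_acc must be positive")
    return bytes_per_access / ns_per_acc


if __name__ == "__main__":
    # Placeholder value: 0.5 s of user time for 100M accesses.
    t = ns_per_access(0.5)
    print(t, "ns per access")
    print(bandwidth_gb_per_s(t), "GB/s")
```

The same `ns_per_access` helper applied to an L1-hit run and to the Q5 run gives the two numbers whose ratio Q5 asks for.
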
## Q7 Conflict misses

> Use stride=8192. Which is the last number of elements where the number
> of L1 cache misses is ~0? Where do you see the difference for
> stride=4096? Where for stride=2048? What is the associativity (number
> of ways) in the L1 cache? How much capacity per way does the L1 cache
> have (L1 size/number of ways)? What is the number of sets in the L1
> cache (capacity per way/cache line size)?

## Q8 L1 TLB entries

> Use stride=131136. At which number of elements is the number of L1
> TLB misses no longer ~0? Repeat the same experiment with
> stride=65600, stride=32832, stride=16448, stride=8256, stride=4160,
> stride=2112. How many entries does the L1 TLB have? How much memory
> does each entry cover (it's a power of 2; the strides above are chosen
> to avoid conflict misses in the L1 cache)?

## Q9 L1 TLB miss penalty

> Choose parameters such that the L1 TLB misses are close to 100M, but
> the L1 cache misses and L2 TLB misses are ~0. How much does
> each access cost?

## Q10 L2 TLB entries

> Use stride=65600. Which is the last number of elements where the miss
> rate is ~0?

## Q11 L2 TLB miss penalty

> Choose the parameters such that the number of L2 cache misses stays
> <1M, but the number of L2 TLB misses rises to >90M. What is the number
> of cycles per memory access?
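
The two relations stated in Q7 (capacity per way = L1 size/number of ways, number of sets = capacity per way/cache line size) can be written down as a small sketch. The function name and the example numbers are hypothetical and are not the values of the machine measured in this exercise.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int]:
    """Return (capacity per way in bytes, number of sets).

    Implements the two divisions named in Q7. All three inputs must be
    positive and must divide evenly, as they do for real caches.
    """
    if size_bytes <= 0 or ways <= 0 or line_bytes <= 0:
        raise ValueError("all parameters must be positive")
    if size_bytes % ways != 0:
        raise ValueError("cache size must be a multiple of the number of ways")
    per_way = size_bytes // ways
    if per_way % line_bytes != 0:
        raise ValueError("capacity per way must be a multiple of the line size")
    return per_way, per_way // line_bytes


if __name__ == "__main__":
    # Hypothetical cache: 64 KiB, 4 ways, 64-byte lines.
    per_way, sets = cache_geometry(64 * 1024, 4, 64)
    print(per_way, "bytes per way,", sets, "sets")
```
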