Assignment #4
(1) Check the possibility of extending MESI to work with two main memory modules
(that can be accessed in parallel by different cores), each holding a different
part of the address space:
Describe the extension's architecture and how it works.
How, if at all, can MESI be extended to work with these two memories (BusRd, BusRdX, ...)?
Give a formal argument as to the correctness of the extended MESI.
Give an example of code executed by several cores showing the advantage of the extended MESI
over the single-memory MESI (or prove that MESI cannot use two parallel memory modules).
Helpful notes on SC
New tip: There are three necessary and sufficient conditions for sequential consistency (book: Parallel Computer Architecture: A Hardware/Software Approach {Culler97}) which you can use here:
1- Cores issue memory operations in program order.
2- Each core waits for every store operation it has started to complete before issuing another operation.
3- A load operation $L$ by $core_i$ that returns the value written by a store $S$ performed by $core_j$ cannot terminate before
$core_j$'s store has terminated.
(2) Find an algorithm wherein threads frequently read and modify a set of shared variables such that
MESI significantly slows the execution.
Consider algorithms such as n-body simulations or extended Peterson (please get my approval by email for the algorithm
you have selected).
Program this algorithm in OpenMP or ParC and measure its execution time compared to a sequential version.
Compare the execution times to a modified code where most of the accesses are made to
separate local variables that are not shared.
Use VTune or OProfile to show a clear difference in the number of cache misses between the two versions.