Assignment #4

(1) Check the possibility of extending MESI to work with two main memory modules, each holding a different part of the address space, that can be accessed in parallel by different cores:

  • Describe the extension's architecture and how it works.
  • How, if at all, can MESI be extended to work with these two memories (BusRd, BusRdX, ...)? Give a formal argument for the correctness of the extended MESI.
  • Give an example of code executed by several cores that shows the advantage of the extended MESI over single-memory MESI (or prove that MESI cannot use two parallel memory modules).
  • Helpful notes on SC (sequential consistency)
  • New tip: There are three sufficient conditions for sequential consistency (book: Parallel Computer Architecture: A Hardware/Software Approach {Culler97}) that you can use here:

    1- Cores issue memory operations in program order.

    2- Each core waits for every store operation it has started to complete before issuing another operation.

    3- A load operation $L$ by $core_i$ that returns the value of a store $S$ performed by $core_j$ cannot complete before $core_j$'s store has completed.
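
To make the three conditions concrete, the following is a minimal sketch (not part of the required solution) that enumerates every interleaving of the classic store-buffering litmus test: core 0 runs `x = 1; r0 = y;` and core 1 runs `y = 1; r1 = x;`. It assumes an idealized machine in which each operation completes atomically before the next one is issued, which is what conditions 1 to 3 enforce. The names `State`, `step`, `explore`, and `sc_outcomes` are hypothetical and chosen only for this illustration.

```c
#include <string.h>

enum { OPS_PER_CORE = 2 };

/* Shared memory (x, y) plus the register each core loads into (r0, r1). */
typedef struct { int x, y, r0, r1; } State;

/* Execute operation `idx` of `core` atomically and return the new state.
 *   core 0: op 0 is "x = 1", op 1 is "r0 = y"
 *   core 1: op 0 is "y = 1", op 1 is "r1 = x" */
static State step(State s, int core, int idx)
{
    if (core == 0) {
        if (idx == 0) s.x = 1; else s.r0 = s.y;
    } else {
        if (idx == 0) s.y = 1; else s.r1 = s.x;
    }
    return s;
}

/* Try every interleaving that preserves each core's program order.
 * pc0 and pc1 are the per-core program counters. */
static void explore(State s, int pc0, int pc1, int counts[2][2])
{
    if (pc0 == OPS_PER_CORE && pc1 == OPS_PER_CORE) {
        counts[s.r0][s.r1]++;   /* record the final (r0, r1) outcome */
        return;
    }
    if (pc0 < OPS_PER_CORE)
        explore(step(s, 0, pc0), pc0 + 1, pc1, counts);
    if (pc1 < OPS_PER_CORE)
        explore(step(s, 1, pc1), pc0, pc1 + 1, counts);
}

/* Fill counts[r0][r1] with the number of interleavings giving that outcome. */
void sc_outcomes(int counts[2][2])
{
    State init = {0, 0, 0, 0};
    memset(counts, 0, sizeof(int[2][2]));
    explore(init, 0, 0, counts);
}
```

Calling `sc_outcomes` on an `int counts[2][2]` array shows that `counts[0][0]` stays zero: no interleaving that respects the conditions yields `r0 == 0` and `r1 == 0`. A design in which a core issues its load before its own store has completed (violating condition 2) could produce that outcome, so this test is one way to probe an extended protocol.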

(2) Find an algorithm in which threads frequently read and modify a set of shared variables, so that MESI significantly slows the execution. Consider algorithms such as n-body simulations or extended Peterson (please get my approval by email for the algorithm you have selected).

  • Program this algorithm in OpenMP or ParC and measure its execution time compared to a sequential version.
  • Compare the execution times to those of a modified code in which most of the accesses are made to separate local variables that are not shared.
  • Use VTune or OProfile to show a clear difference in the number of cache misses between the two versions.
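
The shared-versus-local comparison described above can be sketched in OpenMP as follows. This is only an illustrative pattern under assumed names (`sum_shared` and `sum_local` are hypothetical), using a plain array sum rather than the algorithm you will select. In the first version every iteration performs a read-modify-write on one shared variable; in the second, each thread accumulates into a private local and touches the shared variable once.

```c
#include <stdint.h>

/* Version A: every iteration updates the shared variable `total`, so the
 * cache line holding it is repeatedly invalidated and transferred
 * between cores. */
uint64_t sum_shared(const uint32_t *a, long n)
{
    uint64_t total = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp atomic
        total += a[i];
    }
    return total;
}

/* Version B: each thread accumulates into its own private `local`
 * and performs a single shared update when its share of the loop ends. */
uint64_t sum_local(const uint32_t *a, long n)
{
    uint64_t total = 0;
    #pragma omp parallel
    {
        uint64_t local = 0;          /* private to each thread */
        #pragma omp for nowait
        for (long i = 0; i < n; i++)
            local += a[i];
        #pragma omp atomic
        total += local;              /* one shared update per thread */
    }
    return total;
}
```

Both functions return the same sum; only their memory access pattern differs. With GCC the OpenMP pragmas are enabled by the `-fopenmp` flag, and each call can be timed with `omp_get_wtime()` from `<omp.h>` before profiling the two versions.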