butterfly network slides
Shared memory Project: a Verilog module and test bench implementing a multiport memory module that supports
concurrent read/write operations to a set of memory modules acting as a shared memory.
This sub-project was discussed in class, so its main concepts should already be familiar;
what follows is only a short list of requirements.
You are given an existing multi-stage design from 2008, which you should modify so that:
- The resulting module is completely combinational, i.e., it uses no clock and no registers.
This means you should remove from the switch module all the logic that refers to
the register that holds delayed packets.
- Note that in the test module the next address is generated only when the previous memory request has been cleared
(e.g., sout[0] == 0).
- Modify the test in the test module so that it works for a general, predefined number of processors, rather than only
the four built-in processors (c0, c1, c2, c3).
- The test module should use more types of addresses: if possible, random addresses, or addresses selected at random from a fixed set of rows.
- Note that the switch module as given has no internal register for holding delayed packets.
Instead, it has a "pref" mechanism, which you should turn into a wire parameter (an input wire) passed to each switch.
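A minimal sketch of what a purely combinational switch could look like once the preference is a wire rather than internal state. Everything here is an assumption, not the 2008 design's interface: the module name bf_switch2, the port names, a W-bit packet whose bit RB is the destination bit this level routes on, and the policy that the loser of a conflict is simply not forwarded (there is no register to delay it), with in0_ok/in1_ok telling each requester whether it got through.

```verilog
// Hypothetical 2x2 butterfly switch, purely combinational (no clock, no regs).
// Bit RB of a packet selects the output: 0 -> out0, 1 -> out1.
// When both inputs want the same output, the 'pref' wire picks the winner;
// the loser is not forwarded and its *_ok flag is 0, so it must be retried.
module bf_switch2 #(
    parameter W  = 8,   // packet width (assumed: address [+ data] bits)
    parameter RB = 0    // index of the routing bit used at this level
) (
    input  wire         in0_v,   // input 0 carries a valid request
    input  wire [W-1:0] in0_p,
    input  wire         in1_v,   // input 1 carries a valid request
    input  wire [W-1:0] in1_p,
    input  wire         pref,    // 0: input 0 wins a conflict, 1: input 1 wins
    output wire         out0_v,
    output wire [W-1:0] out0_p,
    output wire         out1_v,
    output wire [W-1:0] out1_p,
    output wire         in0_ok,  // input 0 was forwarded by this switch
    output wire         in1_ok   // input 1 was forwarded by this switch
);
    wire d0 = in0_p[RB];
    wire d1 = in1_p[RB];

    // Both inputs valid and heading to the same output port.
    wire conflict = in0_v & in1_v & (d0 == d1);

    // An input is forwarded if it is valid and either there is no
    // conflict or 'pref' selects it.
    wire win0 = in0_v & (~conflict | ~pref);
    wire win1 = in1_v & (~conflict |  pref);

    // At most one winner targets each output, so a 2:1 mux suffices.
    assign out0_v = (win0 & ~d0) | (win1 & ~d1);
    assign out0_p = (win0 & ~d0) ? in0_p : in1_p;
    assign out1_v = (win0 &  d0) | (win1 &  d1);
    assign out1_p = (win0 &  d0) ? in0_p : in1_p;

    assign in0_ok = win0;
    assign in1_ok = win1;
endmodule
```

In a multi-stage network each level would presumably route on a different address bit (a different RB), and a request completes only if it is forwarded at every level on its path. Driving pref from outside (for example alternating it, or deriving it from the processor index) is one way to avoid one input always losing conflicts.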
NOTE-1: The sizes of the butterfly (BF) network have been reduced to k=2 and lvls=3 so that it passes synthesis quickly.
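The test-module requirements in the list above can be sketched as a parameterized stimulus skeleton: the number of processors comes from parameters instead of the hard-wired c0..c3, addresses alternate between fully random values and a random pick from a fixed set of rows, and processor i issues its next address only after sout[i] has returned to 0. This assumes one processor per network input (NPROC = K**LVLS, i.e., 8 for k=2, lvls=3). The names tb_stimulus, next_addr, NROWS, NREQ and the random stub that clears sout are hypothetical stand-ins; in the real test module sout would come from the shared-memory module.

```verilog
// Hypothetical stimulus skeleton, not the course's test module.
// 'sout' is driven by a random stub here instead of the real network.
module tb_stimulus;
    parameter K     = 2;
    parameter LVLS  = 3;
    parameter NPROC = K ** LVLS;  // assumption: one processor per network input
    parameter AW    = 8;          // address width (assumed)
    parameter NROWS = 4;          // size of the fixed set of row addresses
    parameter NREQ  = 5;          // requests issued by each processor

    reg clk = 1'b0;
    always #5 clk = ~clk;

    reg [NPROC-1:0] sout = 0;     // bit i = 1 while processor i's request is pending
    reg [AW-1:0]    rows [0:NROWS-1];
    integer issued = 0;
    integer done   = 0;
    integer r;

    initial for (r = 0; r < NROWS; r = r + 1) rows[r] = r * 16; // hypothetical row bases

    // mode 0: fully random address; mode 1: random pick from the fixed rows.
    function [AW-1:0] next_addr(input integer mode);
        begin
            if (mode == 0) next_addr = $random;
            else           next_addr = rows[{$random} % NROWS];
        end
    endfunction

    // Stub for the shared memory: each pending request clears with
    // probability 1/2 per cycle (stands in for conflicts and retries).
    always @(negedge clk) sout <= sout & $random;

    genvar i;
    generate
        for (i = 0; i < NPROC; i = i + 1) begin : g_proc
            reg [AW-1:0] addr;    // would drive processor i's address input on the DUT
            integer n;
            initial begin
                for (n = 0; n < NREQ; n = n + 1) begin
                    wait (sout[i] == 1'b0);   // previous request has been cleared
                    @(posedge clk);
                    addr    = next_addr(n % 2);
                    sout[i] = 1'b1;           // new request is now pending
                    issued  = issued + 1;
                end
                wait (sout[i] == 1'b0);
                done = done + 1;
            end
        end
    endgenerate

    initial begin
        wait (done == NPROC);
        $display("%0d processors issued %0d requests in total", NPROC, issued);
        #1 $finish;   // small delay lets other monitors observe the end state
    end
endmodule
```

Overriding K and LVLS at instantiation, e.g. tb_stimulus #(.K(2), .LVLS(2)) for 4 processors, changes the processor count without editing the test body.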
SECOND part: CONNECT SEVERAL CORES to the shared memory module created in the first part
In this part we create a small multicore system by connecting several cores to the shared memory created in the first part.
- Select a simple application, such as matrix multiplication or transitive closure, as described in class.
- Implement separate memory modules containing the instructions for each core, so that the cores run independently
of each other. Similarly, assume (and implement) that the input data is stored in the shared memory at specific addresses.
- First, run a single core executing your application code and check how it works.
- Next, connect several cores to the shared memory (SM) unit so that each core is blocked until the SM completes all the memory operations
issued by that core in its last clock cycle. Assume that each core has a separate clock, so that all cores work asynchronously.
Note that this implies that a memory request that fails to complete is retried until it succeeds.
Thus, a failed load/store can block only the core that issued the request.
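One way to get the blocking-and-retry behaviour described above is a small per-core wrapper between the core and the SM. The sketch assumes that the combinational SM reports, in the same cycle, which of the core's requests completed (sm_ok) together with their read data, that the core is held through a clock enable (enable = ~stall) so its requests stay stable while blocked, and that a core may issue up to M requests per cycle. The module mem_retry and all its ports are hypothetical. Requests that already completed are remembered in done and not re-issued (so a store is not repeated), and their read data is buffered until the core advances.

```verilog
// Hypothetical per-core retry wrapper (not part of the given design).
module mem_retry #(
    parameter M  = 2,   // memory requests a core may issue per cycle (assumed)
    parameter DW = 8    // data width (assumed)
) (
    input  wire            clk,        // this core's own free-running clock
    input  wire            rst,
    input  wire [M-1:0]    core_req,   // current instruction's requests, held while stalled
    input  wire [M-1:0]    sm_ok,      // from SM: request j completed in this cycle
    input  wire [M*DW-1:0] sm_rdata,   // from SM: read data, valid when sm_ok[j]
    output wire [M-1:0]    sm_req,     // to SM: requests still to be (re)tried
    output wire [M*DW-1:0] core_rdata, // to core: read data for each request
    output wire            stall       // 1: hold the core (clock enable = ~stall)
);
    reg [M-1:0] done;   // requests of the current instruction that already completed

    assign sm_req = core_req & ~done;                // retry only unfinished requests
    wire [M-1:0] finished = done | (sm_req & sm_ok);
    assign stall  = |(core_req & ~finished);         // anything still outstanding?

    genvar j;
    generate
        for (j = 0; j < M; j = j + 1) begin : g_port
            reg [DW-1:0] buf_q;  // read data of request j, captured when it completed
            always @(posedge clk or posedge rst)
                if (rst)                       buf_q <= 0;
                else if (sm_req[j] & sm_ok[j]) buf_q <= sm_rdata[j*DW +: DW];
            // Requests that finished in an earlier cycle return the captured data.
            assign core_rdata[j*DW +: DW] = done[j] ? buf_q : sm_rdata[j*DW +: DW];
        end
    endgenerate

    always @(posedge clk or posedge rst)
        if (rst)        done <= 0;
        else if (stall) done <= finished;  // keep partial progress while blocked
        else            done <= 0;         // core advances: next instruction starts clean
endmodule
```

With M = 1 this reduces to stall = core_req & ~sm_ok: the core simply re-presents the same request every cycle until it succeeds, and only that core is blocked.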
- Instructions on how to execute a PACOBLAZE program
- PACOBLAZE, a clone of PICOBLAZE (for projects that use the PICOBLAZE CPU)
- PACOBLAZE testbench