Proj 2011: Butterfly FFT (deadline: end of the semester vacation)

Extend the Butterfly network (BN) in the multi-core project to execute 1D-FFT using the BN-FFT algorithm.
This is a well-known algorithm (learned in the second year of the BSc), in which each switch in the BN performs a butterfly operation.
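To make the switch operation concrete, here is a minimal C99 sketch of what one switch of the FFT-BN computes. It assumes a radix-2 decimation-in-time butterfly and double-precision complex numbers, and the names bn_pair and bn_butterfly are hypothetical. With inputs $a$, $b$ and unity root $w$, the switch outputs $a + wb$ and $a - wb$, so it needs one complex multiplication and two complex additions.

```c
#include <assert.h>
#include <complex.h>

/* The two outputs of one switch (hypothetical type). */
typedef struct {
    double complex upper; /* a + w*b */
    double complex lower; /* a - w*b */
} bn_pair;

/* Hypothetical reference model of one FFT-BN switch: a radix-2
 * decimation-in-time butterfly with unity root w. */
static bn_pair bn_butterfly(double complex a, double complex b, double complex w)
{
    double complex t = w * b;  /* the single complex multiplication */
    bn_pair out = { a + t, a - t };
    return out;
}

/* Quick sanity check: with w = 1 the switch outputs the sum and the difference. */
static void bn_butterfly_selfcheck(void)
{
    bn_pair p = bn_butterfly(3.0, 1.0, 1.0);
    assert(creal(p.upper) == 4.0 && cimag(p.upper) == 0.0);
    assert(creal(p.lower) == 2.0 && cimag(p.lower) == 0.0);
}
```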
Extend the set of PicoBlaze instructions with two FFT commands, FFTput and FFTget. FFTput loads an FFT input value into the FFT-BN, and FFTget returns the corresponding output from the FFT-BN to a designated register in each core. The FFT-BN starts working on a new set of inputs only after all cores have executed FFTget and retrieved the outputs of the previous FFT computation, so values from different FFT computations are never mixed. Consequently, a core cannot issue another FFTput before all other cores have executed FFTget for the previous FFT computation, and a core may be blocked on FFTput if not all cores have completed their FFTget.
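The FFTput/FFTget rules amount to a two-phase barrier. The following C sketch models them for a hypothetical system of NCORES = 4 cores. It assumes the FFT result is available as soon as every core has issued FFTput, so the real network latency is ignored. The names fft_put and fft_get are illustrative, and a false return value means the issuing core must stall and retry.

```c
#include <assert.h>
#include <stdbool.h>

#define NCORES 4  /* hypothetical number of cores */

/* COLLECT: waiting for every core's FFTput; DRAIN: results ready for FFTget. */
typedef enum { FFT_COLLECT = 0, FFT_DRAIN } fft_phase;

typedef struct {
    fft_phase phase;
    bool put_done[NCORES];  /* core issued FFTput in the current round */
    bool get_done[NCORES];  /* core issued FFTget in the current round */
    int puts, gets;
} fft_bn_state;

/* Returns true if the FFTput is accepted, false if the core must stall. */
static bool fft_put(fft_bn_state *s, int core)
{
    assert(core >= 0 && core < NCORES);
    if (s->phase != FFT_COLLECT || s->put_done[core])
        return false;          /* previous results still being read, or input already given */
    s->put_done[core] = true;
    if (++s->puts == NCORES)
        s->phase = FFT_DRAIN;  /* all inputs present; network latency not modelled */
    return true;
}

/* Returns true if the FFTget is served, false if the core must stall. */
static bool fft_get(fft_bn_state *s, int core)
{
    assert(core >= 0 && core < NCORES);
    if (s->phase != FFT_DRAIN || s->get_done[core])
        return false;          /* results not ready, or this core already read its output */
    s->get_done[core] = true;
    if (++s->gets == NCORES) { /* the last reader frees the FFT-BN for a new round */
        s->phase = FFT_COLLECT;
        s->puts = s->gets = 0;
        for (int i = 0; i < NCORES; i++)
            s->put_done[i] = s->get_done[i] = false;
    }
    return true;
}
```

In hardware the same rule would become a stall signal: a core whose FFTput or FFTget is refused simply repeats the instruction on a later cycle.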
Note that the FFT operates on complex numbers (pairs of real numbers), so you must use a floating-point or fixed-point representation, including conversion from integers to real numbers and back, and real multiplication and addition (in every switch). You can also use other solutions, such as scaling to integers, provided that you get approval to use them.
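Below is a minimal fixed-point sketch of the arithmetic each switch needs. It assumes a hypothetical format of 32-bit signed values with 14 fractional bits, so 1.0 is stored as 16384. It covers conversion from and to integers, a real multiplication, and a complex multiplication built from four real multiplications and two additions. The format and all names are assumptions, and the hardware would likely use narrower words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point format: 32-bit signed, FX_FRAC fractional bits. */
#define FX_FRAC 14
#define FX_ONE  ((int32_t)1 << FX_FRAC)

typedef int32_t fx;
typedef struct { fx re, im; } cfx;   /* a complex number as two fixed-point reals */

/* Integer -> fixed point (exact for |v| < 2^17). */
static fx fx_from_int(int v)
{
    assert(v > -(1 << 17) && v < (1 << 17));
    return (fx)v * FX_ONE;
}

/* Fixed point -> integer, rounding to nearest (halves away from zero). */
static int fx_to_int(fx x)
{
    return x >= 0 ? (int)((x + FX_ONE / 2) / FX_ONE)
                  : -(int)((-x + FX_ONE / 2) / FX_ONE);
}

/* Real multiply: 64-bit product, then drop FX_FRAC bits.
 * C division truncates toward zero; an arithmetic right shift in hardware
 * rounds toward minus infinity, so negative results may differ by one LSB. */
static fx fx_mul(fx a, fx b)
{
    return (fx)(((int64_t)a * b) / FX_ONE);
}

/* Complex multiply: 4 real multiplications and 2 additions. */
static cfx cfx_mul(cfx a, cfx b)
{
    cfx r = { fx_mul(a.re, b.re) - fx_mul(a.im, b.im),
              fx_mul(a.re, b.im) + fx_mul(a.im, b.re) };
    return r;
}
```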
BN in Wikipedia
The FFT coefficients are real-number constants; they should be generated by a generic C program and included in the Verilog modules.
Do the project in stages.
- Stage-0: write the C program that generates a Verilog 2D array of unity roots, indexed by $(i, j)$ where $i$ is the level and $j$ is the switch position in the BN, for any given value of $n$. At a later stage, extend this to a 3D array that allows an FFT of $m > n$ values to be computed on a smaller set of $n$ cores.
- Stage-1: Verify that the BN of the given multi-core design has the correct shape for FFT (otherwise, modify it).
- Stage-2: Duplicate the BN scaffold (the generate-for loops and the switches) to create a separate FFT-BN network, and make every switch in that network compute an integer FFT operation. Write a test module that checks that this FFT-BN works stand-alone with integer values. Note that the current BN is inverted compared to what you need, so either use the second BN in the SharedMemory or modify the wire indexes as needed.
- Stage-3: Combine the stand-alone FFT-BN with the PicoBlaze cores so that it automatically performs this simplified integer FFT on a set of registers from each core and stores the output in another set of registers.
- Stage-4: Extend the PicoBlaze assembler with FFT instructions (start-FFT, get_fft_value,...) that allow a program to start an FFT computation and use the results. Write and test a small assembler program to verify that this stage works correctly.
- Stage-5: Extend the FFT-BN to work with real numbers (fixed- or floating-point operations) and measure how much this slows down the network. The main problem is how to initialize with real numbers: although Verilog has a real data type, it is not synthesizable in ISE, so you need to include a suitable library (or write the required multiply and add routines yourself). I assume the best way is to represent each real number/constant by two integer values (exponent, and mantissa plus sign) and to have each switch perform its floating-point arithmetic on these numbers.
- Stage-6: Add integer-to-real conversion before, and real-to-integer conversion after, the FFT-BN is used (either in hardware or as instructions).
- Stage-7: Write an assembler program that demonstrates the use of your FFT. Special bonus for multiplying big numbers (this requires the inverse FFT).
Note that the FFT will be slower than the regular load/store BN, so its output is produced only after several clock cycles, during which each core can do other useful computations. Since each load/store can delay some of the cores, a get_fft_value instruction should be devised that halts the core until its FFT value is ready. You may need to consider how to synchronize the cores to make sure that all cores have extracted their FFT values and that it is safe to start a new FFT.
It is not hard to see that the FFT-BN can simulate FFTs of size $m = 2^k$ larger than the current number of cores $n$; this may require repeated operation of the FFT-BN with a different (next) set of unity roots. Thus, your generic C program should create a suitable array of unity roots to be used.
You can find floating-/fixed-point circuits at this link, and even a complete BN-FFT, which may be of help. Floating point unit in OpenCores: COPROCESSOR section, project called FPU (link). It has all the needed calculations, done separately for the exponent and the significand, etc.
Some additional material on the FFT and on multiplying complex numbers can be found here.
Instructions on how to execute a PacoBlaze program
PACOBLAZE, a clone of PICOBLAZE (for projects that use the PicoBlaze CPU)
PACOBLAZE testbench
presentation on FFT
presentation on FFT