Proj 2011 Butterfly FFT (deadline end of the semester vecation)
Psudo C-code as to how to create a butterfly network
Source files for the project (need a password)
Guide to picoblazeSM)
Picoblaze compiler)
- Extend the Butterfly network (BN) in the multi-core project to execute 1D-FFT using the BN-FFT algorithm.
- This is a well known algorithm (learned in BSc second year where each switch in the BN perform
switch operation
- Extend the set of the Picoblaze instructions to include two FFT commands FFTput and FFTget.
The FFTput instruction loads an FFT input value to the FFT-BN and the FFTget instruction returns the suitable output
from the FFT-BN to a designated register in each core. The FFT-BN will work on a new set of inputs only after all
cores executed FFTget and retrieved the outputs of the prevouse FFT computation. Thus values from diffrent FFT computations
should not be mixed. Thus, a core can not issue another FFTput before all other cores have executed FFTget of the prevous
FFT computation. Thus a core may be blocked on FFTput if not all cores completed their FFTget.
- Note that FFT use complex real numbers so either use floating-point/fix-point representation including: conversion from integers
to real numbers and back and real mull and add (in every switch).
You can also use other solutions such as scaling to integers provided that you get an approval to use it.
- BN in wikipedia
- The FFT coefficient are real number constants should be generated by a generic c-program and included in the Verilog modules.
- Do the project in stages.
- Stage-0: write the C-program that generate a verilog 2D-array of unity-roots (following the $i=level,j=switch_position$
of the BN) for any given value of n. At a later stage extend this to a 3D array that will allow using FFT of $m > n$ values
over a smaller set of $n$ cores.
- Stage-1: Verify that the BN of the given multi-core is in the correct shape for FFT (otherwise you need to modify it).
- Stage-2: Duplicate the BN scaffold (the generate-for and switches) to create a seperate FFT-BN network and meke every switch
in that network to compute an integer-FFT operation.
Write a test module just to check that this FFT-BN works stand-alone with integer values.
Note that the current BN is inverted compared to what you need so use the second BN in the SharedMemory
or modify the indexes of the wires as needed.
- Stage-3: Combine the stand-alone FFT-BN with the picoblaze cores such that it will automaticaly perform this simplified
integer FFT on a set of regiters from each core and storing the output a another set of registers.
- Stage-4: Extended the picoblaze assembler with FFT (start-FFT, get_fft_value,...) instructions
allowing a program to activate FFT compuattion and use the results.
write and test a small assembler program to see that this stage works o.k.
- Stage-5: Extend the FFT-BN to work with real-numbers (Fix/floating point operations)
see by how-musch this is slowing the network.
The main problem is how to initialize with real numbers,
Though you can use real numbers in Verilog it is not synthesizable on ISE
you need to include a suitable library (or write the mull and add routines that are needed yourself).
I assume that the best way is that each real number/constant will be represented
by two integers values (exponent and Mantissa +sign) and that each switch will do the
its floating-point arithmetic operations on these numbers).
- Stage-6: Add integer-to-real/real-to-integer conversion befor/after the FFT-BN is used (eiter in hardware or as nstructions).
- Stage-7: Write an assembler program to demonstrate the use of you FFT. Special bunus for multiplying
big-numbers (requires inverse FFT)
- Note that the FFT will work slower than the regular load/store-BN thus the output of the FFT will be produced after several clock
cycles wherein each core can do other usefull computations. Since each load/store can delay some of the cores then
a get_fft_value instruction should be devise that will halt the core until its FFT value is ready.
You may need to consider how to synchoronize the cores to make sure that all cores have extracted their FFT-values and
it is safe to start a new FFT.
- It is not hard to see that the FFT-BN can simulate larger $n=2^k$ FFTs than the current number of cores, this may require
repeated operation of the FFT-BN with a different (next) set of unity-roots. Thus you generic C program should create a sutable
ayyay of unity-roots to be used.
- you can find floating/fix point circuits in this link and even complete BN-FFT which may be of help
Floating point unit in opencores, COPROCESSOR section, project called FPU.
link
It has all the needed calculation separately for exponent and significand, e.t.c.
- some additional material regarding FFT and multiplying complex number can be found here
- Instructions how to execute pacoblaze program
- PACOBLAZE a clone of PICOBLAZE (for projects that uses picoblaze cpu)
- PACOBLAZE testbench
- presentation on FFT
- presentation on FFT