This is a short guide for quick use of the system.

a- Extract the executable "t" and the examples to
b- Execute ./t -f t.t
c- You will see the following screen, with source code containing two sequential loops.
c- Note that each loop has a loop-carried dependence, A(i) <--> A(i-3) and B(j) <--> B(j-3), and thus cannot be parallelized as written.

     1: Entry
     1:   inout integer A(0:100)
     2:   inout integer B(0:100)
     3:   in integer n, m
     4:   inout integer t1, t2, t3, t4, t5, t6
     5:   in integer i, j
     6:   for i = 3,n do
     7:     t1 = A(i-3)
     8:     t2 = t1*t1+t1
     9:     A(i) = t2
     6:   endfor
    11:   for j = 3,m do
    12:     t3 = B(j-3)
    13:     t4 = t3*t3+t3
    14:     B(j) = t4
    11:   endfor
    16: Exit

    Parsed t.t, 46 dependences, 0 others filtered
    *Browse File Parse CalcDD Restor System Trans Write Msgs Quit

d- Press B, then L, then R, then N, then F to browse to the first loop and apply loop fusion.
e- You will get the following screen showing the fused loop.

     1: Entry
     1:   inout integer A(0:100)
     2:   inout integer B(0:100)
     3:   in integer n
     4:   inout integer t1, t2, t3, t4
     5:   for i = 3,n do
    10:     t1 = A(i-3)
    11:     t2 = t1*t1+t1
    12:     A(i) = t2
    13:     t3 = B(i-3)
    14:     t4 = t3*t3+t3
    15:     B(i) = t4
     5:   endfor
    21: Exit

    anti 15: t3 --> 14: t3 (+)
    Brwse COpts DD Loop Optns *Restr See Undo Var Msgs Xcape

f- Press B, then L, then R, then N, then S to apply source-level modulo scheduling (SLMS).
g- The result is a new loop that can be parallelized, as indicated by the Par-Block syntax.

     1: Entry
     1:   in integer reg0
     2:   inout integer A(0:100)
     3:   inout integer B(0:100)
     4:   in integer n
     5:   inout integer t1, t2, t3, t4
     6:   !/Start SLMS Loop/
     7:   !/Par Block/
     8:   t1 = A(3-3)
     9:   !/Par Block/
    10:   reg0 = t1*t1+t1
    11:   !/Par Block/
    12:   t2 = reg0
    13:   t1 = A(3+1-3)
    14:   !/Par Block/
    15:   A(3) = t2
    16:   reg0 = t1*t1+t1
    17:   !/Par Block/
    18:   t3 = B(3-3)
    19:   t2 = reg0
    20:   t1 = A(3+2-3)
    21:   for i = 3,n-3 do
    22:     !/Par Block/
    23:     t4 = t3*t3+t3
    24:     A(i+1) = t2
    25:     reg0 = t1*t1+t1
    26:     !/Par Block/
    27:     B(i) = t4
    28:     t3 = B(i+1-3)
    29:     t2 = reg0
    30:     t1 = A(i+3-3)
    21:   endfor
    33:   !/Par Block/
    34:   t4 = t3*t3+t3
    35:   A(i) = t2
    36:   reg0 = t1*t1+t1

    MII = 2, performed SLMS successfully
    Brwse COpts DD Loop Optns *Restr See Undo Var Msgs Xcape

h- Press Q to exit Htiny. The synthesis result (Verilog) is written to the file SLMS_v.t.
h- This file contains a state machine that executes the original loop at one clock cycle per loop iteration.
h- In addition, the file contains a suitable memory module and a test module.
h- The main part of SLMS_v.t follows. As can be seen, it also reuses arithmetic operations.

    assign add_s_port_0 = counter==6 ? A+i+1 : counter==7 ? B+i : add_s_port_0;
    assign w_t4 = counter==6 ? w_Mul1_Res+val_l_port_1 : t4;
    assign w_t1 = counter==6 ? val_l_port_2 : t1;
    assign w_reg0 = counter==6 ? w_Mul2_Res+val_l_port_2 : reg0;
    assign w_counter = first_load==1 ? 6 : counter==7 && i < n-3 ? 6 : 8;
    assign add_l_port_1 = counter==7 ? B+i+1-3 : add_l_port_1;
    assign w_t3 = counter==6 ? val_l_port_1 : t3;
    assign w_i = first_load==1 ? 3 : counter==7 ? i + 1 : i;
    assign add_l_port_2 = counter==7 ? A+i+3-3 : add_l_port_2;
    assign w_t2 = counter==7 ? reg0 : t2;
    assign val_s_port_0 = counter==6 ? t2 : counter==7 ? t4 : val_s_port_0;
    assign w_Mul1_Right = counter==6 ? val_l_port_1 : 0;
    assign w_Mul2_Left = counter==6 ? val_l_port_2 : 0;
    assign w_Mul1_Res = w_Mul1_Left*w_Mul1_Right;
    assign w_Mul1_Left = counter==6 ? val_l_port_1 : 0;
    assign w_Mul2_Right = counter==6 ? val_l_port_2 : 0;
    assign w_Mul2_Res = w_Mul2_Left*w_Mul2_Right;

    always @(posedge clkk)
    begin
      t4 <= w_t4;
      t1 <= w_t1;
      reg0 <= w_reg0;
      counter <= w_counter;
      t3 <= w_t3;
      i <= w_i;
      t2 <= w_t2;
    end
    endmodule

i- Many new transformations have been added to Htiny (loop unrolling, loop fusion, if-conversion, accumulator expansion); all are available in the NEW REST menu.
i- A different form of synthesis, using list scheduling rather than modulo scheduling, can be activated as well.
i- Use the Trans option in the main menu, then H, then B.
i- This may be problematic, since you need to open a suitable sub-directory.
i- In this form you can synthesize a set of loops
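
The loop-fusion step (d- and e-) can be checked outside the tool with a short sketch. The Python below is only an illustrative model of the two loops in t.t and of the fused loop, not Htiny output. The function names, the sample array contents and the bound n = m = 12 are assumptions made for this example. Note that fusing the loops under a single bound n, as on the screen in step e-, matches the original only when m equals n.

```python
# Illustrative model of the loops in t.t (not produced by Htiny).
# Function names, array contents and bounds are assumptions for this sketch.

def separate_loops(A, B, n, m):
    """Run the two original loops one after the other on copies of A and B."""
    A, B = list(A), list(B)
    for i in range(3, n + 1):        # for i = 3,n (upper bound inclusive)
        t1 = A[i - 3]                # reads the value written 3 iterations ago
        A[i] = t1 * t1 + t1
    for j in range(3, m + 1):        # for j = 3,m
        t3 = B[j - 3]
        B[j] = t3 * t3 + t3
    return A, B

def fused_loop(A, B, n):
    """Run the fused loop, which handles A and B in the same iteration."""
    A, B = list(A), list(B)
    for i in range(3, n + 1):
        t1 = A[i - 3]
        A[i] = t1 * t1 + t1
        t3 = B[i - 3]
        B[i] = t3 * t3 + t3
    return A, B

# Small bound on purpose: the values grow quickly under t*t+t.
A0 = [1, 2, 3] + [0] * 98            # arrays indexed 0..100, as in t.t
B0 = [3, 2, 1] + [0] * 98
assert separate_loops(A0, B0, 12, 12) == fused_loop(A0, B0, 12)
```

The two loops touch disjoint arrays, so interleaving their bodies does not change any value that either body reads. The distance-3 dependence on A and on B is still present inside the fused loop.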