===================================
= Peer Review Feedback 26/09-2019 =
===================================

Hi!
I enjoyed having a look at your designs and picked up a few useful techniques!
It also helped me turn an extra critical eye to my own designs.
I think you structure your designs clearly and in an organized fashion, which makes them easy to follow and the signals relatively easy to trace.
This also makes it easier to extend and reuse design code.
I also appreciate your use of FSMs. This makes it much easier to break up the design into sequential stages.
I do however think you should comment more, especially on the intended behaviour of the design and the "big view", in Lars' words. (this mainly applies to the serial MAC)
A brief summary or demonstration of the intended function of each structure (or even the whole design) helps where things are not clear from context. (eg. how many cycles to complete a computation? when are combinational processes triggered? what are the various states intended to do?)

As you can see, I have a few comments 
I've tried to divide them up into different types of critique.
I most likely have a bit more experience in the language than most of the class, so forgive me if I may have gone a bit overboard. :]

Best of luck in the remaining labs!
--JC




=====================================================================================================
Direct MAC
==========
Style
--------
* Constants are more easily identifiable by CAPITAL names (or something else to show they are not assignable).
* "tap" could have been declared a constant array with initially assigned values, and replaced "params". 
* Interesting use of an "empty" constant, but rather defeats the purpose of unconstrained aggregate assignments (ie. others), if you then constrain it to a constant size.
   To clear an entire array you can use (others => (others => '0')), which is indirectly what you've written in L78. In L61, empty(0) essentially becomes constant '0', which strikes me as unnecesary. 
   By it's naming, it is also not clear that it is a constant, which I've mentioned previously. At first, I thought it might be a VHDL keyword!
* In the case structure, in each state you don't need to explicitly retain the old state (ie. L85). It may look like latch behaviour to only have one assignment, but since the process is clocked, a latch will not be inferred. It's good to be cautious, but it becomes rather hard to read. (I now see you removed this in the serial MAC).
* "test" signal not annotated or used. Will be removed by compiler, but may be confusing to human readers.
* "delay_s": good naming scheme appending the "_s", though not clear that it is an array (especially in L78, where "others=>empty" implies a vector to me, unless I explicitly check the definition of "empty". See previous critique.).
* "n2bit_array": 2*n bits, or n*2-bit :) I understand that names cannot begin with a number, but this is somewhat unintuitive.
* L76 (delay_s <= delay_s;) is unnecesary as far as I know. As previously mentioned, it is fine with a single assignment within a clocked process. No need to explicitly retain old values, though some might find it easier to understand as a "default" value that gets conditionally overwritten. 
* "state_machine" is a rather ambiguous name. Simply "state" is much less confusing.

Edge cases
----------
* You have no exception case in your FSM! Remember, 0/1 are not the only values of std_logic, you also have X,U,Z,H,L etc... that you must account for. Otherwise you may get unpredictable behaviour.
  It might look like:
   when others => state_machine <= IDLE; -- reset on exception
* You don't reset "state machine" on reset.  If in "init" state and reset, it will remain in "init".  


(Potential) Improvements
------------------------
* Your first two GENERATE statements use the same indicies and could therefore be combined. 
   You could then merge the contained statements into a single statement (thereby eliminating "mult_res_s")
      ie. mult_res_cor_s(i) <= (delay_s(i) * tap(i))(2*WIDTH-2 downto 0) & empty(0);
   or keep them separate if the intermediate value is used elsewhere.
* Your third GENERATE (L68) uses a subset of the indicies of the previous two. You could therefore use an IF-statement inside a single GENERATE to cover all three. 
* numeric_std includes a SHIFT_LEFT() function. Nothing wrong with your method, but a function makes the intent clearer (and imo is generally less error prone). That said, the function does not seem to cover arrays, so your method is appropriate.
* There is no "big view" explanation. A reader may not know what algorithm/behaviour this implements. A brief "black-box" description is important in verifying the design and/or using it as a component.



=====================================================================================================
Serial MAC
==========
Style
--------
* Constants "one", "zero" and "empty" are puzzling (L43). I see no advantage as they are not clearer or more compact. Furthermore, since VHDL is strongly typed, you can never assume that a signal is going to be compatible. A direct, literal assignment automatically adapts to the type of the assigned signal; a typed constant does not. 
   Hiding complexity is obviously a tradeoff, but I think it is unproductive in this case.
* "y_int_s" and "finished_int_s" not annotated. They are also redundant as you've shorted them to their corresponding output ports. Also, they are not in the reset clause, meaning that outputs are not cleared on reset. This may be intended, however, you haven't specified.
* Signal initial assignments (eg. L46) are discouraged as they are non-synthesizable.
* Not really a criticism: VHDL integers are by default declared as 32-bit. To save hardware, they can be constrained to an appropriate size. eg. "signal cnt:INTEGER range 0 to N-1;" Sometimes the synthesizer does this, other times not :/ 
   It may also help you to spot unintended behaviour if you set your valid data range at design-time. (eg. an N-tap filter should never count to N+1, therefore the counter can be constrained to this range. The simulator will then throw an error if the bounds are exceeded.).

"Big View"
----------
* You don't wait for "start" to be deasserted before changing state/starting the computation. I think this is a requirement. (also, you had the correct behaviour in the previous design..).
* The purpose of the serial MAC is to trade-off speed for lower hardware usage. You have kept all the arrays from the direct MAC though, yielding no improvement in area (unless the synthesizer optimizes the design, which I am certain it does). The only signals that need arrays are "delay_s" and the taps, according to the schematic in the labPM.
* The accumulator/adder (L66) itself is very confusing. You are storing the partial sum in a new register each iteration and using the result of the previous, adjacent register. One register would have sufficed and reduced complexity significantly.
* The arrays are also quite confusing when trying to trace signals. In the concurrent statements below (L61-67), the indexed arrays "mult_res_s(counter_s)" and "mult_res_cor_s(counter_s)" are common factors and simply constitute a direct link between the statements. They can be replaced by any other signal of the base element type, which the compiler probably does as part of it's optimization. 

   This is the original:
   
   mult_res_s(counter_s) <= (delay_s(counter_s) * tap(counter_s));
   mult_res_cor_s(counter_s) <= mult_res_s(counter_s)(2*WIDTH-2 downto 0) & empty(0);
   add_res_s(counter_s) <= mult_res_cor_s(counter_s) WHEN (counter_s = N-1) ELSE
                           add_res_s(counter_s+1) + mult_res_cor_s(counter_s);
                           
   
   The following will boil down to the same thing, where x1 and x2 are of vector type SIGNED(2*WIDTH-1 downto 0); :
   
   x1  <= delay_s(counter_s) * tap(counter_s);
   x2  <= SHIFT_LEFT(x1, 1);
   res(counter_s) <= x2 when   ...   etc...                   

   Assigning a value concurrently to a signal indexed by a clocked signal (ie. add_res_s(counter_s) ) is also a problem (as well as conceptually confusing), which I get into below.

Edge cases
----------
* No exception case in FSM.

(Potential) Problems (& Improvements)
------------------------
* Retained values (L74-76) in process not inside rising_edge(clk) and will be triggered by any signals in sensitivity list on any change. Not exactly a problem here, however, but synchronous signals should of course not be triggered by anything other than the clock.
* Reset clause is missing registers. "state_machine" in particular is not cleared. This means the FSM will keep on going despite a reset (though output will be zero)!

* The accumulator "add_res_s" must be sequential in some respect, by definition. However, it is written combinationally and not explicitly clocked. It is indirectly "clocked" by an update on "counter_s", but this results in a latch, as "counter_s" is not updated on every clock cycle (only in EXEC state). I synthesized the design in Quartus to confirm this. Latches are known to cause bugs in clocked designs.
   "mult_res_s" and "mult_res_cor_s" probably also technically result in latches but, as I explained in "Big View", they were instead completely optimized away.
   I would suggest moving the accumulator "add_res_s" inside the clocked part of the process, which should force the use of flip-flops. This may also make clearer its intended use as a register.
   
* No "big-view"/"black-box" explanation. 
   
   
=====================================================================================================