

# Solution to examination in Integrated Circuit Design MCC091

Thursday October 29, 2015 Version 1.0

Changes since version 0.91: the solution to problem 1.b was added, the solution to task 3 was updated to use  $V_{DD} = 1.2$  V rather than 1.0 V, and a calculation mistake in the solution to task 4a was corrected.

## 1) Layout and logic functions

- a)  $c2out = p2 + c2(c1 + p1)$
- b) Layout is shown below. Other solutions are also possible.



## 2) Amplifiers and transistor characteristics

- a)  $V_B$  is forced by the current  $I_B$  from the current source to the voltage  $V_{DD} - |V_{GS3}|$ , where  $V_{GS3}$  is the gate-source voltage that gives  $M_3$  the drain current  $3\ \mu\text{A}$  (note that  $V_{DS3} = V_{GS3}$  for  $M_3$ ). From the black lines added to diagram B, this gives  $V_{GS3} = -0.65$  V and thus  $V_B = 0.55$  V.  $V_{IN}$  has to be set to the voltage that gives  $M_1$  the same current of  $3\ \mu\text{A}$ . From diagram B we find that this requires a  $V_{GS1}$  slightly higher than 0.6 V, approximately  $V_{IN} = 0.605$  V.
- b) Transistor  $M_2$  (the active load) is biased on the "load line"  $V_{GS2} = -0.65$  V. The points where this line intersects the nMOS curves are marked by crosses in diagram B. The two red crosses show the steep part where both transistors are in saturation. From these two points we find that the gain  $|A_V|$  is approximately  $(0.79 - 0.24)/(0.625 - 0.6) = 22$ .
- c) The small-signal diagram is:



The expression for the gain is  $|A_V| = g_{m1} / (g_{d1} + g_{d2})$ .

- d) In the expression for the gain derived in task c), the bias current cancels. We have  $g_m \approx 2I_D/V_{GT}$ , where  $V_{GT}$  is short for  $V_{GS} - V_T$ , and  $g_d = I_D/V_A$  where  $V_A$  is the Early voltage. The gain is then  $|A_V| = V_A/V_{GT}$ . So under the assumption that  $V_A$  remains the same, only  $V_{GT}$  changes when the current is decreased. Lower current implies lower  $V_{GS1}$  and thus an **increased** gain. From diagram B we find (blue line) that for the drain current  $1.5\ \mu\text{A}$ ,  $V_{GS1}$  has to be  $0.525 \text{ V}$ .  $V_{GT}$  then decreases from  $0.305$  to  $0.252$ . The increase in gain in percent is  $100 * (1/0.252 - 1/0.305)/(1/0.305) = 100 * (0.305/0.252 - 1) = 21 \%$ . So we get an **increase of around 20%**.
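The arithmetic in b) and d) can be checked with a short sketch. The voltages below are the graph readings quoted in the solution, so they are approximate, and the variable names are only illustrative.

```python
# Readings from diagram B as quoted in the solution (approximate graph values).
v_in = (0.600, 0.625)   # input voltages at the two red crosses [V]
v_out = (0.79, 0.24)    # corresponding output voltages [V]

# b) Gain magnitude as the slope between the two points.
gain_b = abs((v_out[1] - v_out[0]) / (v_in[1] - v_in[0]))

# d) With |A_V| = V_A / V_GT and V_A assumed constant, the gain scales as 1 / V_GT.
v_gt_high_current = 0.305   # V_GT at 3 uA [V]
v_gt_low_current = 0.252    # V_GT at 1.5 uA [V], value used in the solution
increase_pct = 100 * (v_gt_high_current / v_gt_low_current - 1)

print(f"|A_V| from diagram B: {gain_b:.0f}")
print(f"gain increase: {increase_pct:.0f} %")
```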



## 3) Dynamic and static power

Note! This solution has been updated. The problem states  $V_{DD}=1.2 \text{ V}$ , but the previous solution used  $V_{DD}=1.0 \text{ V}$ , which was the intended  $V_{DD}$ .

- a) (This problem is also an example in Weste & Harris.) The capacitance for the logic part is  $50 \text{ million transistors} * 0.3\ \mu\text{m} * 1.8 \text{ fF}/\mu\text{m} = 27 \text{ nF}$  and for the memories  $950 \text{ million} * 0.1\ \mu\text{m} * 1.8 \text{ fF}/\mu\text{m} = 171 \text{ nF}$ . The dynamic power for the logic part is then  $0.1 * 1 \text{ [GHz]} * 1.2^2 \text{ [V}^2\text{]} * 27 \text{ [nF]} = 3.88 \text{ W}$ . The dynamic power for the memories is  $0.02 * 1 \text{ [GHz]} * 1.2^2 \text{ [V}^2\text{]} * 171 \text{ [nF]} = 4.92 \text{ W}$ . All in all the dynamic power for the chip is **8.8 W**.
- b) (This problem is also an example in Weste & Harris.) The static power is due to the subthreshold leakage and gate leakage. We assume that half of all transistors are on and half are off. Only off transistors contribute subthreshold leakage and only on transistors contribute gate leakage current. For the memory part, half of the 950 million transistors are off and half are on, so the total leakage current is  $475 \text{ million} * 0.1\ \mu\text{m} * 10 \text{ nA}/\mu\text{m} + 475 \text{ million} * 0.1\ \mu\text{m} * 5 \text{ nA}/\mu\text{m}$ , that is  $475 \text{ million} * 0.1\ \mu\text{m} * 15 \text{ nA}/\mu\text{m} = 712.5 \text{ mA}$  of leakage current. For the logic part we have 25 million transistors that are on and 25 million that are off. Of the off ones, 5% have the low  $V_T$  (100 nA/μm) and 95% have the high  $V_T$  (10 nA/μm). So the leakage current is  $(0.95*25*10+0.05*25*100+25*5)*0.3= 146.25 \text{ mA}$ . All in all the leakage current is  $858.75 \text{ mA}$  and the power (since  $P=U*I$ ) is **1.03 W**. (Note that this result is not the same as in the book, where  $V_{DD} = 1.0$  V is used.)
- c) The dynamic power for the logic part would increase by 20% since the capacitance increases by 20% and all the other factors stay the same. The leakage current would be  $((0.99*25*5)*10+0.01*25*100+30*5)*0.3= 270 \text{ mA}$ . So it would not save any static power either, because the added 5 million transistors have more leakage than what we save by having fewer low-$V_T$ transistors.
- d) The dynamic power of  $3.88 \text{ W}$  from a) corresponds to  $3.23 \text{ A}$  of current. When this current is drawn through the power-gate switch there should be no more than  $60 \text{ mV}$  of voltage drop across it. Using Ohm's law, we find the maximum resistance  $R = 0.06 \text{ [V]} / 3.23 \text{ [A]} = 0.0186\ \Omega$ . So the transistor has to be very wide!  $W = 2000\ [\Omega\,\mu\text{m}] / 0.0186\ [\Omega] = 107526\ \mu\text{m} = 108 \text{ mm}$ . So the transistor is around 11 cm wide! (In practice it can be a bit less wide since the transistor resistance at low  $V_{DS}$  is smaller than  $R$ .)

e) The capacitance of the switch is  $W * 1 \text{ fF}/\mu\text{m}$ ; that results in  $107526 \text{ fF}$  or  $107.5 \text{ pF}$ . The switching energy is then  $E_{sw} = 155 \text{ pJ}$  (since  $E = CV_{DD}^2$  and  $V_{DD}$  is  $1.2 \text{ V}$ ). The static power for the logic part from b) is  $146 \text{ mA} * 1.2 \text{ V} = 175 \text{ mW}$ . Power is energy per time. So how long does it take for the total energy due to leakage to equal  $E_{sw}$ ? We get  $E_{sw} = E_{\text{leak}} = P_{\text{leak}} * t$  so  $t = E_{sw}/P_{\text{leak}} = 155 \text{ [pJ]}/175 \text{ [mJ/s]}$  (since watts are joules per second). We get  $t = 0.89 \text{ ns}$ .
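As a cross-check of the numbers in a) and d), here is a minimal sketch of the dynamic-power and switch-width arithmetic. It keeps unrounded intermediate values, so the last digits differ slightly from the hand calculation (which rounds the power to 3.88 W first). The function and variable names are illustrative.

```python
V_DD = 1.2           # supply voltage [V]
F_CLK = 1e9          # clock frequency [Hz]
C_PER_UM = 1.8e-15   # capacitance per um of transistor width [F/um]


def dynamic_power(n_transistors, width_um, activity):
    """Return (switched capacitance [F], dynamic power [W]) for P = a * f * C * V_DD^2."""
    cap = n_transistors * width_um * C_PER_UM
    return cap, activity * F_CLK * cap * V_DD ** 2


# a) Logic and memory contributions.
c_logic, p_logic = dynamic_power(50e6, 0.3, 0.1)
c_mem, p_mem = dynamic_power(950e6, 0.1, 0.02)
p_total = p_logic + p_mem

# d) Power-gate switch, using the 0.06 V drop and 2000 ohm*um from the calculation.
i_logic = p_logic / V_DD     # current drawn by the logic [A]
r_max = 0.06 / i_logic       # maximum switch resistance [ohm]
width_um = 2000 / r_max      # required switch width [um]

print(f"C_logic = {c_logic * 1e9:.0f} nF, C_mem = {c_mem * 1e9:.0f} nF")
print(f"P_dyn = {p_logic:.2f} W + {p_mem:.2f} W = {p_total:.1f} W")
print(f"I = {i_logic:.2f} A, R_max = {r_max * 1e3:.1f} mohm, W = {width_um / 1e3:.0f} mm")
```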

### BONUS QUESTION

f) Obviously the case in c) is not good since both the dynamic power and the leakage are higher than in b). In the general case, however, it is a question of determining when it is worth spending the energy needed to turn the power off. One has to be rather good at predicting the down-time to justify spending that energy. So it may be better to spend more on the dynamic power if the static power can be reduced without having to turn the power off, since turning it off is very costly.

## 4) Wire delay, wire and inverter delay



a) There are two parts in this circuit, as shown in the figure above. For the second part we use a collapsed tree in this solution, but one can also use Elmore branches. Elmore for the first part:  $100\ \Omega * (72 + 100 + 100 + 36) \text{ fF} + 800\ \Omega * (100 + 36) \text{ fF} = 139.6 \text{ ps}$ . Elmore for the second part:  $200\ \Omega * (36+100+100+36) \text{ fF} + 200\ \Omega * (100 + 36) \text{ fF} = 3 * 200\ \Omega * 136 \text{ fF} = 81.6 \text{ ps}$ . All in all the delay becomes  $t_d = 0.7 * (139.6 + 81.6) \text{ ps} = 155 \text{ ps}$ .
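The Elmore sums above can be reproduced with a small sketch. The helper name `elmore_ps` is hypothetical; the resistor and capacitor values are the ones quoted in the solution.

```python
def elmore_ps(terms):
    """Sum R*C over (resistance [ohm], downstream capacitance [fF]) pairs, in ps."""
    # ohm * fF = fs, and 1000 fs = 1 ps
    return sum(r * c for r, c in terms) * 1e-3


part1 = elmore_ps([(100, 72 + 100 + 100 + 36), (800, 100 + 36)])
part2 = elmore_ps([(200, 36 + 100 + 100 + 36), (200, 100 + 36)])
t_d = 0.7 * (part1 + part2)

print(f"part 1: {part1:.1f} ps, part 2: {part2:.1f} ps, t_d = {t_d:.0f} ps")
```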

b) We know that for the 2-input NAND gate  $g_{\text{NAND}}$  is  $4/3$  and  $p_{\text{NAND}}$  is  $2$  (otherwise we could derive these numbers). The path effort from A to B is then  $F = G * H = (4/3) * 72 / 1.5 = 64$ . For minimum delay all stage efforts should be the same. In this case all stage efforts,  $f$ , should be  $4$  since  $4 * 4 * 4 = 64$ . To find the inverter sizes we can start from the output or the input of the path. From the input we have  $4/3 * h_{\text{NAND}} = 4 \Rightarrow h_{\text{NAND}} = 3$ , so the input capacitance of the first inverter in the buffer should be  $3 * 1.5 \text{ fF} =$  **4.5 fF**. The input capacitance of the second inverter should be  $4 * 4.5 \text{ fF} = 18 \text{ fF}$ . These capacitances correspond to drive strengths  $200 / 4 = 50\text{X}$  for the second inverter and  $50/4 = 12.5\text{X}$  for the first inverter.

c) The resulting normalized delay  $d$  is  $(4+4+4+P)\tau$  where  $P$  is the sum of the parasitic delays for the three gates. We have  $P = 2 + 1 + 1 = 4$ . Here we use our prior knowledge that the 2-input NAND gate has  $p=2$ , but we could also derive it, if we did not remember. Thus, we have  $d = 16\tau$ . But what is  $\tau$  in this process? We had better check that too. It is  $0.7 * R * C = 0.7 * 72 \text{ fF} * 0.1 \text{ k}\Omega = 5 \text{ ps}$ . So, the delay from A to B is  $16 * 5 \text{ ps} = 80 \text{ ps}$ .

**COMMENT:** Is the total delay from A to C now  $80 \text{ ps} + 155 \text{ ps} = 235 \text{ ps}$ , or is there some delay we have not accounted for, since we assumed infinite drive strength at point B in task a)? In our path delay calculation in b) we accounted for the electrical effort of the second buffer, so that delay is already included, and it is therefore correct to take the total delay as the sum of the two delays.
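The sizing in b) and the delay in c) can be sketched as follows. The sketch assumes, as implied by the drive strengths $200/4 = 50\text{X}$, that the 72 fF load corresponds to a 200X inverter. It keeps $\tau = 5.04$ ps unrounded, so the delay comes out slightly above the 80 ps obtained with $\tau \approx 5$ ps.

```python
G_NAND = 4 / 3          # logical effort of the 2-input NAND
P_NAND, P_INV = 2, 1    # parasitic delays in units of tau
C_IN_FF = 1.5           # input capacitance at A [fF]
C_LOAD_FF = 72.0        # load at B [fF], assumed to be a 200X inverter
N_STAGES = 3

path_effort = G_NAND * C_LOAD_FF / C_IN_FF       # F = G * H
stage_effort = path_effort ** (1 / N_STAGES)     # equal effort per stage

# Work backwards from the load: C_in = g * C_out / f, with g = 1 for inverters.
c_inv2 = C_LOAD_FF / stage_effort
c_inv1 = c_inv2 / stage_effort
x_per_ff = 200 / C_LOAD_FF                       # drive strength per fF

d_norm = N_STAGES * stage_effort + P_NAND + 2 * P_INV
tau_ps = 0.7 * 0.1 * 72                          # 0.7 * R [kohm] * C [fF] = ps
delay_ps = d_norm * tau_ps

print(f"F = {path_effort:.0f}, f = {stage_effort:.2f}")
print(f"inverter 1: {c_inv1:.1f} fF ({c_inv1 * x_per_ff:.1f}X)")
print(f"inverter 2: {c_inv2:.1f} fF ({c_inv2 * x_per_ff:.1f}X)")
print(f"d = {d_norm:.0f} tau, tau = {tau_ps:.2f} ps, delay = {delay_ps:.1f} ps")
```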

## 5) Critical timing paths, logical effort



**Table 1**

| INVERTER DATA                                | X2     | X4     | X7     | X13    | X27    | X53    | X106   |
|----------------------------------------------|--------|--------|--------|--------|--------|--------|--------|
| Input capacitance [fF]                       | 1      | 1.3    | 1.6    | 3.3    | 5.8    | 12     | 25.1   |
| KLOAD = $0.7R_{\text{eff}} [\text{k}\Omega]$ | 4.1251 | 2.3529 | 1.6004 | 0.7612 | 0.4298 | 0.2101 | 0.1021 |
| Intrinsic delay                              | 13.2   | 11.9   | 10.5   | 10     | 9.9    | 10.1   | 10.6   |

Note: Unit for intrinsic delay is ps.

**Table 2**

| 2-input NAND data                            | X2     | X7     | X21    | X57    |
|----------------------------------------------|--------|--------|--------|--------|
| Input capacitance [fF]                       | 1      | 2      | 5.8    | 16.3   |
| KLOAD = $0.7R_{\text{eff}} [\text{k}\Omega]$ | 6.9746 | 2.2207 | 0.6572 | 0.2312 |
| Intrinsic delay                              | 17.6   | 15.9   | 14.9   | 14.6   |

Note: Unit for intrinsic delay is ps.

a) **The FO4 delay is 20 ps.** We use the data from Table 1 for inverters of size X13-X106, since for these the parasitic delay is almost constant. The FO4 delay is: intrinsic delay +  $4 \cdot C_{\text{INPUT}} \cdot K_{\text{LOAD}}$ . To be clear, I make a table here (which is not required in the student solutions):

| Size | Input cap * KLOAD [ps] | Parasitic delay [ps] | FO4 delay [ps] | Pinv (intrinsic delay/tau) |
|------|------------------------|----------------------|----------------|----------------------------|
| X13  | 2.511                  | 10                   | 20.04          | 3.98                       |
| X27  | 2.493                  | 9.9                  | 19.87          | 3.97                       |
| X53  | 2.521                  | 10.1                 | 20.18          | 4.01                       |
| X106 | 2.560                  | 10.6                 | 20.84          | 4.14                       |

We see that the FO4 delay is 20 ps with  $\tau = 2.5 \text{ ps}$  and  $\text{Pinv} = 4$  (which is very high!).
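The table rows can be recomputed directly from Table 1 with a sketch like the one below. The dictionary layout is only an illustration, and values may differ from the table in the last digit because of rounding.

```python
# Table 1 data: size -> (input capacitance [fF], KLOAD [kohm], intrinsic delay [ps])
inverters = {
    "X13": (3.3, 0.7612, 10.0),
    "X27": (5.8, 0.4298, 9.9),
    "X53": (12.0, 0.2101, 10.1),
    "X106": (25.1, 0.1021, 10.6),
}

results = {}
for size, (c_in, k_load, t_intrinsic) in inverters.items():
    tau = c_in * k_load              # fF * kohm = ps; delay per unit of fanout
    fo4 = t_intrinsic + 4 * tau      # intrinsic delay plus a fanout-of-4 load
    p_inv = t_intrinsic / tau        # parasitic delay in units of tau
    results[size] = (tau, fo4, p_inv)
    print(f"{size:>4}: tau = {tau:.3f} ps, FO4 = {fo4:.2f} ps, pinv = {p_inv:.2f}")

tau_avg = sum(r[0] for r in results.values()) / len(results)
print(f"average tau = {tau_avg:.2f} ps")
```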

b) **The logical effort, gNAND, is 1.5:** We use data from Table 2, sizes X7 – X57. The logical effort,  $g_{\text{NAND}}$ , is  $C_{\text{INNAND}} \cdot K_{\text{LOADNAND}} / \tau$ . To be clear, I again make a table here (which is not required in the student solutions):

| Size | LOAD-dependent delay = Input cap * KLOAD [ps] | Parasitic delay [ps] | $g_{\text{NAND}} = \text{load-dependent delay}/\tau$ | $p_{\text{NAND}} = (\text{intrinsic delay}/\tau)$ |
|------|-----------------------------------------------|----------------------|------------------------------------------------------|---------------------------------------------------|
| X7   | 4.441                                         | 15.9                 | 1.77                                                 | 6.36                                              |
| X21  | 3.812                                         | 14.9                 | 1.52                                                 | 5.96                                              |
| X57  | 3.768                                         | 14.6                 | 1.51                                                 | 5.84                                              |

It looks like we should also discount the data from the X7 NAND. If we only use the data from X21 and X57 we get:  $g_{\text{NAND}} = 1.5$ .

c) **The parasitic delay, pNAND, is 6.** The parasitic delay is defined as the intrinsic delay/tau. pNAND is also listed in the table above. Again we discard the X7 data. The intrinsic delay is around 15 ps and tau is 2.5 ps so we get pNAND = 6 (also see table above). This value for p is very high, since what we have used theoretically is p = 2.

**COMMENT:** We can note, however, that the inverter also has a very high p. If we express pNAND as  $p * \text{pinv}$  we get  $1.5 * \text{pinv}$  which is reasonable for a good layout, where there are two pMOS transistors sharing one diffusion area.
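The extraction of $g_{\text{NAND}}$ and $p_{\text{NAND}}$ in b) and c) can be sketched in the same way from Table 2, using $\tau = 2.5$ ps from task a). The data layout is again only illustrative.

```python
TAU_PS = 2.5   # inverter tau from task a) [ps]

# Table 2 data: size -> (input capacitance [fF], KLOAD [kohm], intrinsic delay [ps])
nands = {
    "X7": (2.0, 2.2207, 15.9),
    "X21": (5.8, 0.6572, 14.9),
    "X57": (16.3, 0.2312, 14.6),
}

g_nand = {}
p_nand = {}
for size, (c_in, k_load, t_intrinsic) in nands.items():
    g_nand[size] = c_in * k_load / TAU_PS   # load-dependent delay per tau
    p_nand[size] = t_intrinsic / TAU_PS     # intrinsic delay per tau
    print(f"{size:>3}: g = {g_nand[size]:.2f}, p = {p_nand[size]:.2f}")

# Discard X7, as in the text, and average the two larger gates.
g_avg = (g_nand["X21"] + g_nand["X57"]) / 2
p_avg = (p_nand["X21"] + p_nand["X57"]) / 2
print(f"g_NAND = {g_avg:.1f}, p_NAND = {p_avg:.1f}")
```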

d) We first determine the electrical efforts: We have  $h_1 = 2/1 = 2$ ,  $h_2 = 5.8/2 = 2.9$ ,  $h_3 = 16.3/5.8 = 2.8$  and  $h_4 = 50/16.3 = 3.1$ . Thus, we have  $h_{\text{tot}} = h_1 + h_2 + h_3 + h_4 = 10.8$ .

All in all we have:  $d = g_{\text{NAND}} * h_{\text{tot}} + 4 * p_{\text{NAND}}$ .

With our extracted model,  $g_{\text{NAND}} = 1.5$  and  $p_{\text{NAND}} = 6$ , we get: **d = 40.2**.

With our theoretical numbers,  $g_{\text{NAND}} = 4/3$ ,  $p_{\text{NAND}} = 2 * \text{pinv}$  and  $\text{pinv} = 1$ , we get **d = 22.4**

**COMMENT:** With our theoretical numbers,  $g_{\text{NAND}} = 4/3$ ,  $p_{\text{NAND}} = 2 * \text{pinv}$ , but with  $\text{pinv} = 4$ , we get  $d = 14.4 + 4 * 8 = 46.4$ , which is much closer.
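The path delay in d) follows from $d = g_{\text{NAND}} \cdot h_{\text{tot}} + 4 \cdot p_{\text{NAND}}$, which the sketch below evaluates for a given model. The function name `path_delay` is hypothetical. The electrical efforts are kept unrounded, so the results agree with the hand calculation only after rounding to one decimal.

```python
# Input capacitances of the four NAND gates, followed by the 50 fF load [fF].
caps_ff = [1.0, 2.0, 5.8, 16.3, 50.0]

# Electrical effort of each stage: load capacitance over input capacitance.
h = [caps_ff[i + 1] / caps_ff[i] for i in range(len(caps_ff) - 1)]
h_tot = sum(h)


def path_delay(g_nand, p_nand):
    """Normalized path delay d = g * h_tot + N * p for N identical gate types."""
    return g_nand * h_tot + len(h) * p_nand


d_extracted = path_delay(1.5, 6)        # model extracted in b) and c)
d_theory = path_delay(4 / 3, 2 * 1)     # textbook values with pinv = 1

print(f"h_tot = {h_tot:.1f}")
print(f"extracted model: d = {d_extracted:.1f}")
print(f"theoretical model: d = {d_theory:.1f}")
```

Passing `p_nand = 2 * pinv` with a different `pinv` evaluates the same path for other parasitic-delay assumptions.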

## 6) Adders

### PART A

a) If the group propagate signal for the carry-skip block (P7:0, cell D13) is 1, the multiplexer should select the carry-in to the block (CIN, cell S13). If it is 0, it should select the carry-out generated from the block (G7:0, cell C13). The function is then:

$$C_{OUT} = P_{7:0} \cdot C_{IN} + \overline{P_{7:0}} \cdot G_{7:0}$$

It is also OK to give the Excel expression.
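As an illustration, the multiplexer function can be written out and checked exhaustively against the "select carry-in when propagate is 1" description. The function and argument names are hypothetical.

```python
from itertools import product


def carry_out(p_group: bool, g_group: bool, c_in: bool) -> bool:
    """C_OUT = P * C_IN + (not P) * G for the carry-skip multiplexer."""
    return (p_group and c_in) or ((not p_group) and g_group)


# Exhaustive check: the Boolean expression behaves as a 2:1 multiplexer
# that selects C_IN when P = 1 and G when P = 0.
mux_ok = all(
    carry_out(p, g, c) == (c if p else g)
    for p, g, c in product([False, True], repeat=3)
)
print("expression matches multiplexer:", mux_ok)
```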

b) The logic cells are one AO gate for the group generate and one AND gate for group propagate. Here is a figure from Weste and Harris:



### PART B

c) Ladner-Fischer diagram from Weste & Harris book.

