

Written examination for

# MCC092 Introduction to Integrated Circuit Design

**Monday August 21, 2017, at 8.30-13.30 at the M building**

---

**Staff on duty:** Lena Peterson, D&IT, phone ext: 1822, or mobile 0706-268907. Lena will visit around 9.30 and 12.00.

**Administration:** Send exams to Lena Peterson CSE dept. Send lists to CSE student administration office.

**Allowed technical aids for students:** This is a closed-book exam. Allowed aids: A Chalmers-allowed calculator (non graph-drawing) plus pencil, eraser, ruler, and dictionary (these are always allowed).

**Results:** The results from the examination will be sent to you via the Ladok system within three weeks. The grading review will take place Friday September 8 2017, 12:15-13:15 in room 4128.

**Solutions:** Solutions will be posted on the course web site in PingPong no later than Tuesday August 22. Any student who does not have access to the 2016 course web site can contact Lena Peterson (via e-mail to lenap@chalmers.se) to obtain the solutions.

**Instructions:**

- Write legibly.
- State any assumptions you make.
- Explain your reasoning and calculations (except when the problem says otherwise). Partial credits can be awarded but if we do not understand you that is not possible.
- Number all pages and write your code on each page.
- Put only one problem per page.

**Good luck!**

---


**Grades:**

The written examination contains six problems, each worth 10 points. You need 30 points to pass (grade “3”), at least 40 points for grade “4” and at least 50 points for grade “5”. Bonus points from the fall 2016 course instance will be added before the higher grades are assigned.

---


**Problem 1:** MOS technology fundamental operation and scaling for delay and power.

Dennard scaling is (or maybe one should say, was) the fundamental technology scaling behind Moore's law. In Table 1 the Dennard scaling for the primary scaled parameters is given in the top part of the table. Your task is to derive the ten missing entries for the device characteristics parameters in the lower part of the table, that is, the ones labelled *(a)* through *(j)*. A *sensitivity expression* is an expression that contains all the factors that depend on the factor for which you are trying to determine the sensitivity, in this case the scaling factor $S$, but no other factors, such as constants. For the remaining sensitivity expressions you may use the primary parameters from the top half of the table, as well as ones you have already derived. As an example of this approach, you see that the expression for $I_{DS}$ includes $\beta$, which was derived in the row above. The first row is given as an example, and a few other entries are already there.

(10 p)

Table 1: Influence of Dennard scaling with scaling factor  $S$  on MOS device characteristics.

| Parameter                           | Sensitivity expression         | Sensitivity to scaling factor $S$ |
|-------------------------------------|--------------------------------|-----------------------------------|
| Scaling parameters                  |                                |                                   |
| $L$ : transistor length             |                                | $1/S$                             |
| $W$ : transistor width              |                                | $1/S$                             |
| $t_{ox}$ : gate oxide thickness     |                                | $1/S$                             |
| $V_{DD}$ : power supply voltage     |                                | $1/S$                             |
| $V_T$ : threshold voltage           |                                | $1/S$                             |
| $N_A$ : substrate doping            |                                | $S$                               |
| Device characteristics              |                                |                                   |
| $\beta$ : transistor current factor | $\frac{W}{L} \frac{1}{t_{ox}}$ | $S$                               |
| $I_{DS}$ : transistor current       | $\beta(V_{DD} - V_T)^2$        | <i>(a)</i>                        |
| $R_{eff}$ : transistor resistance   | <i>(b)</i>                     | <i>(c)</i>                        |
| $C$ : transistor gate capacitance   | <i>(d)</i>                     | <i>(e)</i>                        |
| $\tau$ : gate delay                 | <i>(f)</i>                     | <i>(g)</i>                        |
| $f$ : clock frequency               | <i>(h)</i>                     | $S$                               |
| $P$ : switching power (per gate)    | <i>(i)</i>                     | <i>(j)</i>                        |

**Problem 2:** Optimal sizing of critical path. Logic design.

You are the designer for a new 16-bit processor. Your task is to design a circuit for the data path. It should take two 16-bit numbers as its input and produce a 1-bit output that is "1" if the numbers are equal and "0" if they are not. Figure 1(a) shows the EQ cell that has been previously designed. In Figure 1(b) you see the setup with two registers that are placed before and after your circuit. You can assume that in the process you are designing for, a pMOS transistor gives half the current of an nMOS transistor of the same width, and that $p_{inv} = 1$.

Figure 1: Equal circuit design problem.

- a) For the EQ gate (that is, the compound gate without the inverter shown in Figure 1(a)) find the logical effort, $g$, for the EQi input and the parasitic delay, $p$. (2 p)
- b) Derive the optimal sizing of the cell shown in Figure 1(a), that is, size the EQ gate relative to the inverter. Assume that the smallest transistor sizes in the cell correspond to those of an X1 inverter. With this sizing, what is the normalized delay for the critical path, from the output of register 1 to the input of register 2? Assume that the capacitance at the input of register 2 corresponds to that of an X4 inverter. (2 p)

- c) You have just been informed that there has been a change of plans: the EQ cell has to also generate an additional output, E, which should be "1" if the A and B cell inputs are equal and "0" otherwise. Refer to Figure 1(c) to see how the modified EQ cell is to be used in the processor datapath. Redesign the EQ cell so that the E output is also generated. Due to size constraints you cannot just add an additional separate gate for E, but have to try to share resources with the existing circuitry. Derive the logic function for E and draw the schematic of the new cell. Also draw a sketch of exactly how your modified cell is to be connected to the registers in position EQ0 and EQ15 as shown in Figure 1(c). (3 p)
- d) Size the critical path of your new EQ2 cell. With these sizes calculate the new critical-path normalized delay from the output of register 1 to the input of register 2. As in task b) assume that the input capacitances of register 2 correspond to the input capacitance of an X4 inverter and the smallest transistor sizes in the cell correspond to those of an X1 inverter. Compare to the normalized delay you found in task b). (3 p)

**Problem 3:** Layout



Figure 2: Two templates for the layout of the original EQ cell shown in Figure 1(a).

- a) Draw the layout for the EQ cell in Figure 1(a) in the template supplied in Figure 2(a), which uses single-line-of-diffusion layout only for the p-net. The layout should match the schematic in Figure 1(a) exactly. Label the inputs and other nodes clearly. In this task you are not allowed to use the metal-2 layer for routing in the cell. (5 p)
- b) Redo the layout in the template supplied in Figure 2(b), which uses single-line-of-diffusion layout also in the n-net. Note that here the order of the inputs in the p- and n-net will have to differ. In this task you are allowed to change the ordering of the transistors from the one given in the schematic if you want to. If you do so, please redraw the schematic with the ordering that corresponds to your layout. For each of the inputs A, A', B and B' there should be only one connection point within the cell, so you need to use poly or metal to form the connections. Label the inputs and other nodes clearly. In this task you are allowed to use the metal-2 layer in the cell if necessary. (5 p)

Both templates are repeated twice at the end of the exam for your convenience. Draw your layouts there and submit one of the two sheets with your solution.

**Problem 4:** Wire delay and energy in RAM bitline



(a) The general layout of the SRAM memory. The word lines (WL) run horizontally in metal-2 (purple) and the bit lines (BL) vertically in metal-1 (blue).

Figure 3: An SRAM memory bank with 256 256-bit words.

In Figure 3(a) you see the general layout of a static random-access memory (SRAM). One of the word lines (WL) selects the particular word that is to be accessed. The bit lines (BL and BL') carry the bit values (and inverses) out when reading and supply the bit values that are to be written when writing. The bit lines are routed in metal 1 (M1, blue) and the word lines are routed in metal 2 (M2, purple). The six-transistor SRAM cell is shown in Figure 3(b). Here one can see that in each memory cell the word line is connected to two nMOS transistors for accessing that particular memory cell, and each bit line is connected to the drain/source of one nMOS access transistor.

A bit line is 0.1 μm wide, which is the minimum width for an M1 wire in this particular process. A bit line has a capacitance of 0.1 fF/μm to ground and an inter-wire capacitance to one of its adjacent bit lines of 0.05 fF/μm, and the M1 layer has a resistance of 0.1 Ω/□. A minimum-size nMOS transistor has a gate capacitance $C_g = 0.1 \text{ fF}$ and its drain/source capacitances are half as large. $V_{DD}$ is 1 V. In the process $\tau$ is 5 ps and the nMOS transistors are twice as strong as the pMOS transistors.

a) Calculate the resistance of one bit line. (1 p)

b) Calculate the total capacitance for one bit line including the capacitance of the access transistors. (1 p)

c) What is the delay of only the bit line itself? (2 p)

d) Calculate the energy required for accessing the memory when reading the memory once, which includes discharging one bit line and recharging it to  $V_{DD}$ . (2 p)

e) Draw a worst-case circuit diagram for reading, when one bit line is discharged through the access transistor, M1, which is turned on, and the nMOS transistor of the inverter, M2. These transistors are labelled in Figure 3(b). For simplicity we, unrealistically, assume that all nMOS transistors in the memory cell have minimum width. Also, you should assume that the capacitive load at the far end of the BL, due to circuitry that detects the read value, corresponds to that of an X4 inverter in the process. (2 p)

f) Calculate the delay from the model you derived in task e). (2 p)

**Problem 5:** Designing an optimal driver for a large capacitive load.

You are working as a designer for a cell-library company. You have just been assigned the task to design a driver for an output pad and the connected off-chip capacitive load in a 45 nm CMOS process. The total load capacitance, which your driver has to drive, could in the worst case be as high as 10 pF.

Your manager just told you these requirements: The input of the driver is to be the standard inverter in the 45 nm process. The driver is not allowed to invert the signal. The driver should have as short a delay as possible. The manager also gave you the data for the standard inverter in this 45 nm process that you can find in Table 2, and told you that the operating conditions of  $V_{DD}=1.0$  V and  $25^{\circ}\text{C}$  are reasonable to assume.

Table 2: Design and nominal circuit parameters for the standard inverter (FO=3). 45 nm HP models @ 1.0 V,  $25^{\circ}\text{C}$ .<sup>1</sup>

| Parameter               | Description                        | Value                     |
|-------------------------|------------------------------------|---------------------------|
| $W_p$                   | p-FET width                        | $0.60\text{ }\mu\text{m}$ |
| $W_n$                   | n-FET width                        | $0.40\text{ }\mu\text{m}$ |
| $L_{ds}$                | Diffusion region length            | $0.12\text{ }\mu\text{m}$ |
| IDDQ                    | Average off-state leakage current  | $6.47\text{ nA}$          |
| $\tau_{pd}$ (FO=1)      | Average PU and PD delay            | $5.27\text{ ps}$          |
| $\tau_{pd}$ (FO=3)      | Average PU and PD delay            | $9.55\text{ ps}$          |
| $C_{in}$                | Input capacitance                  | $1.50\text{ fF}$          |
| $C_{out}$               | Output capacitance                 | $1.83\text{ fF}$          |
| $R_{sw}$ (FO=3)         | Average switching resistance       | $1500\text{ }\Omega$      |
| $C_{sw}$ (FO=3)         | Average switching capacitance      | $6.37\text{ fF}$          |
| $\tau_r, \tau_f$ (FO=3) | Signal rise, fall time             | $\approx 16\text{ ps}$    |
| $E_{sw}$ (FO=3)         | Average energy per switching event | $3.18\text{ fJ}$          |

You need to supply this information about your driver to your manager:

a) The number of inverters in your driver. (2 p)

b) The total gate width of your driver (that is for all the inverters). (2 p)

c) The total delay for your driver. (2 p)

d) The energy per switching event at the input (not including the energy due to the external load capacitance). (2 p)

e) An explanation of how you know that there is no better solution that fulfills the requirements. (2 p)

Bonus task:

f) What is the most important other thing that you would want to tell or ask your manager when you present your design to her? (2 p)

<sup>1</sup>Table is repeated from M. Bhushan, M.B. Ketchen, CMOS Test and Evaluation, DOI 10.1007/978-1-4939-1349-7

**Problem 6:** Faster adders



Figure 4: Two adders. In (a) is an 8-bit adder and in (b) a 4-bit adder.

You have designed an 8-bit adder circuit with eight SUM outputs, one carry output and one block-propagate output, as shown in Figure 4 (a). The propagation delays expressed in unit delays are shown in Table 3. You have now been asked to design a 32-bit adder with as short a propagation delay as possible. Available to you are AO, OA, AND and OR gates which each have a delay of 1 unit delay.

Table 3: Adder propagation delays

| Propagation delay                       | 8-bit adder<br>[unit delays] | 4-bit adder<br>[unit delays] |
|-----------------------------------------|------------------------------|------------------------------|
| From carry in to carry out              | 9                            | 5                            |
| From any data bit to carry out          | 9                            | 5                            |
| From any data bit to block propagate    | 4                            | 2                            |
| From carry in to highest SUM output     | 8                            | 4                            |
| From any data bit to highest SUM output | 8                            | 4                            |

- a) Draw a diagram of how you would construct a fast 32-bit adder from your 8-bit adder. In addition to the 32 SUM bits your 32-bit adder should also output the block-propagate and block-generate signals for the entire adder. (5 p)
- b) Derive the propagation delay of your adder as drawn in task a). Assume that all inputs to your adder arrive simultaneously. (3 p)
- c) What if you had also designed a similar 4-bit adder, as shown in Figure 4 (b), with propagation delays given in the last column of Table 3? Would it be better to use the 4-bit adder rather than the 8-bit adder in your 32-bit adder? Motivate by drawing a diagram and calculating the delay. (2 p)

MCC092 2017-08-21 Tear-off page for problem 3. Write your anonymous code here:










# Solution for written examination for

## MCC092 Introduction to Integrated Circuit Design, August 21, 2017

---

There are still a few things that are to be fixed in the solutions.

### Solution 1: Dennard scaling

The complete table is found as Table 4. Each correct answer gives one point.

Table 4: Table with solution for Dennard scaling.

| Parameter                              | Sensitivity expression                | Dennard scaling, factor $S$  |
|----------------------------------------|---------------------------------------|------------------------------|
| Scaling parameters                     |                                       |                              |
| $L$ : length                           |                                       | $1/S$                        |
| $W$ : width                            |                                       | $1/S$                        |
| $t_{\text{ox}}$ : gate oxide thickness |                                       | $1/S$                        |
| $V_{\text{DD}}$ : power supply voltage |                                       | $1/S$                        |
| $V_T$ : threshold voltage(s)           |                                       | $1/S$                        |
| $N_A$ : substrate doping               |                                       | $S$                          |
| Device characteristics                 |                                       |                              |
| $\beta$ : current factor               | $\frac{W}{L} \frac{1}{t_{\text{ox}}}$ | $S$                          |
| $I_{\text{DS}}$ : transistor current   | $\beta(V_{\text{DD}} - V_T)^2$        | $1/S$                        |
| $R_{\text{eff}}$ : resistance          | $\frac{V_{\text{DD}}}{I_{\text{DS}}}$ | 1                            |
| $C$ : gate capacitance                 | $\frac{WL}{t_{\text{ox}}}$            | $1/S$                        |
| $\tau$ : gate delay                    | $R_{\text{eff}} C$                    | $1/S$                        |
| $f$ : clock frequency                  | $\frac{1}{\tau}$                      | $S$                          |
| $P$ : switching power (per gate)       | $f C V_{\text{DD}}^2$                 | $1/S^2$                      |
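
The entries in Table 4 can be checked mechanically by tracking only the exponent of $S$ for each parameter. The sketch below is one way to do that; the dictionary and function names are made up for illustration, and it assumes that $(V_{DD} - V_T)$ scales like $V_{DD}$, which holds because both voltages scale as $1/S$.

```python
# An exponent k means that the parameter scales as S**k under Dennard scaling.
primary = {"L": -1, "W": -1, "t_ox": -1, "V_DD": -1, "V_T": -1, "N_A": 1}


def derive_exponents(p):
    """Return the S-exponents of the device characteristics in Table 4."""
    e = dict(p)
    e["beta"] = e["W"] - e["L"] - e["t_ox"]    # (W/L) * (1/t_ox)
    e["I_DS"] = e["beta"] + 2 * e["V_DD"]      # beta * (V_DD - V_T)^2
    e["R_eff"] = e["V_DD"] - e["I_DS"]         # V_DD / I_DS
    e["C"] = e["W"] + e["L"] - e["t_ox"]       # W * L / t_ox
    e["tau"] = e["R_eff"] + e["C"]             # R_eff * C
    e["f"] = -e["tau"]                         # 1 / tau
    e["P"] = e["f"] + e["C"] + 2 * e["V_DD"]   # f * C * V_DD^2
    return e


exponents = derive_exponents(primary)
for name in ("beta", "I_DS", "R_eff", "C", "tau", "f", "P"):
    print(f"{name}: S^{exponents[name]}")
```

An exponent of 0 corresponds to the entry "1" in the table, that is, a quantity that does not change with scaling.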

### Solution 2: Equal circuit

a) The solution for the EQi input is  $g = 5/3$  and  $p = 13/3$ . See figure below.



b) The optimal solution is given when the efforts are the same in each stage. That is, $f_{EQ} = f_{\text{inv}}$, or $g_{EQ} \times \frac{C_{\text{in,inv}}}{C_{\text{in,}EQ}} = g_{\text{inv}} \times \frac{C_{\text{in,}EQ}}{C_{\text{in,inv}}}$. So we see that the EQ gate transistors should be $\sqrt{5/3}$ times wider than the inverter transistors. The best choice is then to make the inverter the minimum available width (which we assume is an X1 inverter) and the EQ transistors $\sqrt{5/3}$ times as wide. See figure below.



The last inverter will have to drive an X4 load, so that is a different case. We could of course make all cells larger so that the delay in driving the X4 inverter would be shorter, but that would increase the power consumption quite a lot and decrease the delay only a little, so it is not worth it. The normalized delay is then

$$d_p = 31 \times \sqrt{\frac{5}{3}} + 4 + 16 \times \frac{13}{3} + 16 \times 1 = 129.3$$
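
The terms of this expression can be evaluated with a few lines of Python. This is only a sketch of the arithmetic: the variable names are chosen here for illustration, and the values of $g$ and $p$ are taken from task a).

```python
import math

g_eq = 5 / 3     # logical effort of the EQi input, from task a)
p_eq = 13 / 3    # parasitic delay of the EQ gate, from task a)
p_inv = 1.0      # parasitic delay of the inverter, given in the problem
n_cells = 16     # one EQ gate and one inverter per bit

# Equal stage effort in the ripple path: g_eq / x = g_inv * x,
# where x is the ratio of EQ input capacitance to inverter input capacitance.
stage_effort = math.sqrt(g_eq)

# 31 stages run at the common stage effort; the last X1 inverter
# drives the X4 register input and therefore has an effort of 4.
effort_delay = (2 * n_cells - 1) * stage_effort + 4
parasitic_delay = n_cells * p_eq + n_cells * p_inv

d_path = effort_delay + parasitic_delay
print(f"stage effort = {stage_effort:.3f}, normalized delay = {d_path:.1f}")
```

The parasitic terms contribute about two thirds of the total, which is why reducing the number of series transistors in the ripple path pays off.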

c) One possible solution is shown below. There are of course other options too. The E signal is first calculated using an XOR gate. This part is similar to the setup cell in the adder cell. Then the EQo' is formed once the EQi signal arrives. As before we need an inverter in between the stages. If we allowed different cells for odd and even bits we could eliminate the inverters in the critical path.



d) Again we should size the ripple path through the stages for minimum delay. In this case the NAND gate should be sized with transistors $\sqrt{3}/2$ times as wide as those of the inverter. See figure below.



The XOR gate drives both the NAND gate and the register input, which corresponds to an X4 input, so we need to take that into account when we scale it. Its load is then $4 + \sqrt{3}/2$ and it should be sized up so that its delay is also $\frac{2}{\sqrt{3}}$. The normalized delay is then

$$d_p = 32 \frac{2}{\sqrt{3}} + 4 + 16 \times 2 + 16 \times 1 + 4 \approx 93.0$$

So the delay is a bit lower for this solution due to the NAND gate in the critical path having two transistors in series rather than three and the parasitic delay therefore being lower. The cost is more complicated layout and two more transistors in each cell.

In practice, the setup part is only a small part of the delay and the final EQo signal will arrive last so it makes no sense to size up the XOR gate too much. One could also use an XNOR gate and generate E' and then add an inverter to generate E. So there are many options here.
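
The stage effort $2/\sqrt{3}$ used in task d) can be traced as follows. This is a sketch that assumes a standard two-input NAND gate sized for unit drive: with pMOS transistors half as strong as nMOS transistors, the unit inverter has widths 1 (nMOS) and 2 (pMOS), and the NAND gate has two series nMOS transistors of width 2 and two parallel pMOS transistors of width 2. The symbol $\hat{f}$ for the common stage effort is introduced here for illustration only.

$$g_{NAND} = \frac{2 + 2}{1 + 2} = \frac{4}{3}, \qquad \hat{f} = \sqrt{g_{NAND} \times g_{inv}} = \sqrt{\frac{4}{3}} = \frac{2}{\sqrt{3}} \approx 1.15$$

Under the same sizing assumption the NAND output node sees two pMOS drains of width 2 and one nMOS drain of width 2, so $p_{NAND} = 6/3 = 2$, which is the parasitic delay per NAND stage used in task d).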

### Solution 3: Layout

Possible solutions are shown below. There are of course many other possibilities.





### Solution 4: Bit line delay in SRAM

Preliminaries: From the figure we find that the length of a bit-line wire is $1.5 \mu\text{m} \times 256 = 384 \mu\text{m}$.

a) The resistance of one bit-line wire is:

$$R_w = \frac{384 \mu\text{m}}{0.1 \mu\text{m}} \times 0.1 \Omega/\square = 384 \Omega$$

b) The capacitance of one bit-line wire is:

$$C_w = 384 \mu\text{m} \times (2 \times 0.05 \text{ fF}/\mu\text{m} + 0.1 \text{ fF}/\mu\text{m}) + 256 \times 0.05 \text{ fF} = 89.6 \text{ fF}$$

c) The delay of the bit-line wire itself if we use the pi-model for the wire is:

$$t_{pdw} = 0.7 \times R_w \times \frac{C_w}{2} = 0.7 \times 384 \Omega \times 44.8 \text{ fF} \approx 12 \text{ ps}$$

d) The energy is  $C_w \times V_{DD}^2$  so it is approximately 90 fJ.
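
The numbers in tasks a) through d) can be reproduced with the short script below. It is a sketch of the arithmetic only: the variable names are chosen here, and the cell height of 1.5 μm is the value read from the figure in the preliminaries.

```python
n_cells = 256            # cells along one bit line
cell_height_um = 1.5     # read from the figure (see the preliminaries)
wire_width_um = 0.1      # minimum M1 width
r_sheet = 0.1            # M1 sheet resistance in ohm per square
c_ground = 0.1e-15       # F per um to ground
c_coupling = 0.05e-15    # F per um to each adjacent bit line
c_diffusion = 0.05e-15   # F, drain/source of one access transistor
v_dd = 1.0               # V

length_um = n_cells * cell_height_um
r_wire = length_um / wire_width_um * r_sheet

# Two neighbouring bit lines plus ground, plus one diffusion per cell.
c_wire = length_um * (c_ground + 2 * c_coupling) + n_cells * c_diffusion

# Pi-model: half of the wire capacitance sits behind the wire resistance.
t_wire = 0.7 * r_wire * c_wire / 2

# Discharging and then recharging the bit line draws C * V_DD^2 from the supply.
e_read = c_wire * v_dd ** 2

print(f"R_w = {r_wire:.0f} ohm, C_w = {c_wire * 1e15:.1f} fF")
print(f"t_pdw = {t_wire * 1e12:.1f} ps, E = {e_read * 1e15:.1f} fJ")
```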

e) The bit line is discharged through two minimum-size nMOS transistors, M1 and M2, shown in Figure 3 (b). The model for the wire can then be drawn like this:

A figure will be added here when I can scan it.

f) The RC part when the entire bit line is discharged through two minimum-size nMOS transistors is then:

$$RC_{wn} = 2R_{eff} \times (C_D + C_w + C_L) + R_w \times \left(\frac{C_w}{2} + C_L\right)$$

From the problem we know that a minimum-size nMOS transistor has a drain/source capacitance of 0.05 fF. We also know that we have $\tau = 0.7 \times R_{eff}C = 5 \text{ ps}$ in the process. And since we know from the problem that the minimum-size nMOS transistor has a gate capacitance of 0.1 fF, and thus an inverter 0.3 fF (assuming the pMOS transistor is twice as wide as the nMOS transistor), we can calculate the effective resistance of the inverter as:

$$R_{eff} = \frac{5 \text{ ps}}{0.7 \times 0.3 \text{ fF}} = 23.8 \text{ k}\Omega$$

We thus find

$$RC_{wn} = 2 \times 23.8 \text{ k}\Omega \times (0.1 + 89.6 + 1.2) \text{ fF} + 384 \Omega \times (44.8 + 1.2) \text{ fF} = 4327 \text{ ps} + 17.7 \text{ ps} = 4.34 \text{ ns}$$

and consequently we find

$$t_{pdw6T} = 0.7 \times 4.34 \text{ ns} = 3.04 \text{ ns}$$

So, we can conclude that the transistors in the RAM cell probably cannot be this small if we are to discharge the entire bit line through them. In practice, the transistors are not designed to be equal and furthermore the detection circuit is designed to detect a small difference in voltage between the two bitlines connected to the cell when reading the memory.
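
The two derived quantities that enter task f), the effective resistance of a minimum-size transistor and the X4 load capacitance, can be checked as follows. This is a sketch under the same assumption as in the solution, namely that the pMOS transistor of the inverter is twice as wide as the nMOS transistor; the variable names are chosen here for illustration.

```python
tau = 5e-12          # s, given in the problem
c_gate_n = 0.1e-15   # F, gate capacitance of a minimum-size nMOS transistor

# Assumption: the unit inverter has a pMOS transistor twice as wide as the
# nMOS transistor, so its input capacitance is three nMOS gate capacitances.
c_inverter = 3 * c_gate_n

# tau = 0.7 * R_eff * C_inverter, solved for R_eff.
r_eff = tau / (0.7 * c_inverter)

# The sensing circuitry is modelled as an X4 inverter.
c_load = 4 * c_inverter

print(f"R_eff = {r_eff / 1e3:.1f} kohm, C_L = {c_load * 1e15:.1f} fF")
```

With an effective resistance of this size in series twice, the transistor term dominates the wire term by more than two orders of magnitude.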

### Solution 5: Optimal driver for large capacitive load

Since the standard inverter has an input capacitance of 1.5 fF we find the electrical path effort as

$$H = \frac{10 \text{ pF}}{1.5 \text{ fF}} = 6666.7$$

We must find an even number of inverters in the driver, $N$, such that $f^N = H$, where $f$ is the tapering factor. We remember (or can derive) that with $p_{inv} = 0$, the optimum tapering factor is $e$. But in practice $p_{inv}$ is not 0 and one should use a somewhat larger tapering factor. In this particular process we find the normalized parasitic delay as:

$$p_{inv} = \frac{1.83 \text{ fF}}{1.5 \text{ fF}} = 1.22$$

- a) The best solution has six inverters. This number of inverters gives a tapering factor of 4.34.
- b) A six-inverter tapered buffer with the same tapering factor in each stage has a total width of

$$W_{tot} = W_{in} \times (1 + f + f^2 + f^3 + f^4 + f^5),$$

where  $W_{in}$  is the width of the first inverter. It can also be written as

$$W_{tot} = W_{in} \times (1 + f(1 + f(1 + f(1 + f(1 + f))))),$$

The standard inverter in the 45-nm process has a total transistor width of 1  $\mu\text{m}$ . Thus, the total transistor width of the tapered buffer is very close to 2000  $\mu\text{m}$ .
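
The tapering factor and the total width can be checked numerically. The sketch below uses the values from Table 2 and the six-stage choice from task a); the variable names are chosen here for illustration.

```python
c_load = 10e-12    # F, worst-case off-chip load
c_in = 1.5e-15     # F, input capacitance of the standard inverter
w_in_um = 1.0      # um, total width of the standard inverter (0.60 + 0.40)
n_stages = 6       # even number, so that the driver does not invert

path_effort = c_load / c_in
taper = path_effort ** (1 / n_stages)

# Each stage is `taper` times wider than the previous one.
w_total_um = w_in_um * sum(taper ** k for k in range(n_stages))

print(f"H = {path_effort:.1f}, f = {taper:.2f}, W_tot = {w_total_um:.0f} um")
```

The last inverter alone accounts for roughly three quarters of the total width, since the sum is dominated by the $f^5$ term.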

- c) The *normalized* delay we can easily calculate as

$$d_p = N \times (f + p_{inv}) = 6 \times (4.34 + p_{inv}) = 26.04 + 6p_{inv}.$$

However, to give the manager a delay in seconds we need to find  $\tau$  in the 45-nm process. In the table we have both FO1 and FO3 delays. FO1 is not the same as  $\tau$ , however, because it includes also  $p_{inv}$ . We could use the value for  $p_{inv}$  that we calculated above or, maybe better, we could make an equation system to find both  $\tau$  and  $p_{inv}$ . I choose the second approach since it is good to have another check on the parasitics:

$$\begin{aligned} \text{FO1} &= \tau(1 + p_{inv}) \\ \text{FO3} &= \tau(3 + p_{inv}) \end{aligned}$$

When we solve this equation system the result is  $\tau = 2.14 \text{ ps}$  and  $p_{inv} = 1.46$ . This value for  $p_{inv}$  is a bit higher than the one we found from the ratio of the capacitances, but it is not that far off. To be on the safe side we use the higher value when we report our findings to the manager. So our calculated propagation delay is then:

$$t_{pd} = (26.04 + 6 \times 1.46) \times 2.14 \text{ ps} = 74.5 \text{ ps}$$
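
The equation system and the resulting delay can be worked through as below. This is a sketch of the arithmetic with the FO1 and FO3 delays from Table 2; the variable names are chosen here for illustration.

```python
fo1_ps = 5.27   # ps, tau_pd at FO=1 from Table 2
fo3_ps = 9.55   # ps, tau_pd at FO=3 from Table 2

# FO1 = tau * (1 + p_inv) and FO3 = tau * (3 + p_inv).
# Subtracting the first equation from the second gives 2 * tau.
tau_ps = (fo3_ps - fo1_ps) / 2
p_inv = fo1_ps / tau_ps - 1

n_stages = 6
taper = 4.34    # tapering factor from task a)

t_pd_ps = n_stages * (taper + p_inv) * tau_ps
print(f"tau = {tau_ps:.2f} ps, p_inv = {p_inv:.2f}, t_pd = {t_pd_ps:.1f} ps")
```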

- d) For the energy calculation we need to find the total capacitance that is charged. There are some uncertainties when we scale up the transistor sizes, but we will go with the scaling factor we have used above, which says that the parasitic capacitance is 1.46 times the gate capacitance. So the energy for one switching event is then:

$$E_{sw} = 2000 \mu\text{m} \times (1 + 1.46) \times 1.5 \text{ fF}/\mu\text{m} \times (1 \text{ V})^2 = 7.4 \text{ pJ}$$

e) We do not know if the manager is an engineer, but the easiest way to convince her is probably to show her that the closest possible solutions (four inverters, and eight inverters) are quite a bit slower than the proposed solution. Another possibility is to explain how we can derive the optimal fanout but that is probably not a good idea unless she is mathematically inclined.
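
The comparison with the neighbouring even stage counts can be sketched as follows. The function name is made up for illustration, and the values $\tau = 2.14$ ps and $p_{inv} = 1.46$ are the ones derived in task c); the same tapering factor is assumed in every stage.

```python
def driver_delay_ps(n_stages, path_effort=10e-12 / 1.5e-15,
                    p_inv=1.46, tau_ps=2.14):
    """Delay of an n-stage tapered inverter chain with equal stage efforts."""
    taper = path_effort ** (1 / n_stages)
    return n_stages * (taper + p_inv) * tau_ps


delays = {n: driver_delay_ps(n) for n in (4, 6, 8)}
for n, delay in delays.items():
    print(f"N = {n}: f = {(10e-12 / 1.5e-15) ** (1 / n):.2f}, "
          f"t_pd = {delay:.1f} ps")
```

Only even values of $N$ are tried because the driver is not allowed to invert the signal.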

Add a table with the delays for four, six and eight inverters here.

f) A good question is to ask whether there are any simulation models for the process and if those include any corner models. But there are other good questions too, such as when she wants the design to be done.

### Solution 6:

a) Here is the solution:



A 21AO gate and an AND gate after each 8-bit adder is what is needed to create a carry-skip adder, and to generate block propagate and block generate for the resulting 32-bit adder.

b) As hinted in the solution under a), the critical path is from $c_{in}$ to $c_{out}$ of the first 8-bit adder, through the 3 PG cells and from $c_{in}$ to SUM31 of the last 8-bit adder. The delay to SUM31 is then: $9 + 3 + 8 = 20$ unit delays. We should also check that the delays to the P and G outputs are shorter. They are both $9 + 3 + 1 = 13$ unit delays.

c) A similar solution with 4-bit adders would have shorter delays to generate the first carry out and the last SUM output, but twice the number of PG cells. The delay to the last SUM bit would then be:  $5 + 7 + 4 = 16$  unit delays. So in this case it would pay off to use a shorter adder.
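
The two delay calculations in tasks b) and c) follow the same pattern, which can be sketched as a small function. The function name is made up for illustration; it assumes, as in the solution, one PG cell with a delay of 1 unit between consecutive adder blocks, and it uses the delays from Table 3.

```python
def delay_to_last_sum(n_blocks, t_carry, t_sum, t_pg=1):
    """Delay to the highest SUM bit of a 32-bit adder built from blocks.

    The path goes through the carry out of the first block, one PG cell
    per boundary between blocks, and carry in to SUM in the last block.
    """
    return t_carry + (n_blocks - 1) * t_pg + t_sum


delay_8bit = delay_to_last_sum(n_blocks=32 // 8, t_carry=9, t_sum=8)
delay_4bit = delay_to_last_sum(n_blocks=32 // 4, t_carry=5, t_sum=4)
print(f"8-bit blocks: {delay_8bit} unit delays")
print(f"4-bit blocks: {delay_4bit} unit delays")
```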