### Low Power Wide Fan-in Control Pulse Operated Domino Multiplexor with Static Switching

Vivek Mishra, Faculty of Electronics and Comm. Engineering, SRMU, Lucknow, 225003, India

Vivek Kumar Modanwal, Faculty of Electronics and Comm. Engineering, SRMU, Lucknow, 225003, India

#### ABSTRACT

In wide fan-in domino multiplexors, high switching activity at both the dynamic and output nodes causes significant power loss. This paper proposes a multiplexor with static switching at both the dynamic and output nodes. The technique uses a control pulse generator circuit that turns on the pull-up transistor conditionally and only for a short duration. Compared with existing techniques, it offers a faster response together with lower power consumption and a smaller area. Simulation is done using 0.18 µm CMOS technology. The power consumption of the proposed multiplexor is calculated and compared with that of existing multiplexors for different loading conditions, clock frequencies and temperatures. For a load capacitance of 100 fF, the proposed domino multiplexor circuit reduces power consumption by 81.08%, 17.57% and 25.50% compared with the standard footless domino, SP-Domino and SSPD multiplexors, respectively.

#### **Keywords**

Multiplexor, Domino logic, Dynamic circuits, Low power, Switching activity.

#### **1. INTRODUCTION**

Static CMOS circuits are poorly suited to wide fan-in gates because they require a large stack of pMOS transistors in the pull-up network [1]. Domino circuits are widely used in computer applications such as arithmetic logic, microprocessors [2], [3], multiplexors [4], registers [5], [6], comparators [7] and many other circuits. They offer better speed and require less area because they use a single nMOS evaluation network, which also reduces the output load capacitance. As domino circuits are operated by a clock pulse, they work in two phases, namely the precharge phase and the evaluation phase [8]. In each precharge phase the dynamic node is charged. Because of this redundant switching of the dynamic node, domino circuits suffer from very high dynamic power losses. Their noise margin is also reduced by charge sharing and charge leakage, so a keeper transistor is added to counter both effects. Domino circuits also impose design restrictions when combined with static gates, and incur dual-phase pipeline overheads.

In static CMOS logic circuits, power is mainly consumed when the output state toggles. In domino circuits, the greatest cause of power consumption is the dynamic power loss due to redundant switching at both the dynamic and output nodes. Short-circuit and other power losses are smaller than the dynamic power loss. To reduce redundant switching at the output node only, true single phase clock domino logic (TSPC) [9], [10], limited switch dynamic logic (LSDL) [11], [12] and the pseudo dynamic buffer (PDB) [13] have been proposed in the literature. To reduce redundant switching at both the dynamic and output nodes, single-phase SP-domino logic [14] and static switching pulse domino (SSPD) logic [15] have been proposed. Single-phase SP-domino multiplexors are inflexible because they use the same transistor as pull-up and keeper. SSPD multiplexors have a large pulse generator circuit. Although they improve delay and noise specifications, there is considerable power loss in the pulse generator circuitry, and the logic is less power-efficient than SP-domino multiplexor logic. The Conditional Pre-charge Dynamic Buffer [16] reduces redundant switching at both the dynamic and output nodes and employs different transistors as pull-up and keeper. However, as a tradeoff, the nMOS transistor in the pull-up path suffers from greater leakage power loss, which may have to be accepted as a compromise.

Power dissipation of the domino circuits is given as [15]:

$$P_{\text{Total}} = P_{\text{Dynamic}} + P_{\text{Leakage}} + P_{\text{Short Circuit}} \tag{1}$$

$P_{\text{Dynamic}}$ is the dynamic power consumed in charging and discharging capacitance, $P_{\text{Leakage}}$ is the total leakage power of the circuit, which increases as the technology is scaled down, and $P_{\text{Short Circuit}}$ is the power dissipated due to direct current flow from the power supply to ground.

$$P_{\text{Dynamic}} = \alpha \times C \times V_{DD}^2 \times F_{CLK} \tag{2}$$

where $\alpha$ is the switching activity at the output and dynamic nodes, which depends on the gate topology and inputs, $C$ is the capacitive load at the evaluation node, $V_{DD}$ is the supply voltage and $F_{CLK}$ is the clock frequency.

$$P_{\text{Leakage}} = I_{\text{Leakage}} \times V_{DD} \tag{3}$$

where  $I_{Leakage}$  is the combination of sub threshold and gate oxide leakage current.

$$P_{\text{Short Circuit}} = \alpha \times I_{SC} \times V_{DD} \times F_{CLK} \tag{4}$$

 $I_{SC}$  for domino logic gate is the contention current that flows between the evaluation network and pMOS keeper during evaluation mode.
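The power breakdown in Eqs. (1) to (3) can be sketched numerically as follows. This is a minimal illustration rather than the simulation flow used in this paper: the supply voltage, load and clock frequency follow the simulation setup reported later, while the activity factor, leakage current and short-circuit power are hypothetical placeholders. The short-circuit term is passed in directly as a power value instead of being evaluated from Eq. (4).

```python
def dynamic_power(alpha, c_load, vdd, f_clk):
    """Eq. (2): switching power in watts."""
    return alpha * c_load * vdd ** 2 * f_clk


def leakage_power(i_leakage, vdd):
    """Eq. (3): leakage power in watts."""
    return i_leakage * vdd


def total_power(alpha, c_load, vdd, f_clk, i_leakage, p_short_circuit):
    """Eq. (1), with the short-circuit term supplied directly in watts."""
    return (dynamic_power(alpha, c_load, vdd, f_clk)
            + leakage_power(i_leakage, vdd)
            + p_short_circuit)


# vdd, c_load and f_clk follow the simulation setup of this paper;
# alpha, i_leakage and p_short_circuit are hypothetical placeholders.
p_dyn = dynamic_power(alpha=0.25, c_load=100e-15, vdd=1.8, f_clk=200e6)
p_tot = total_power(alpha=0.25, c_load=100e-15, vdd=1.8, f_clk=200e6,
                    i_leakage=1e-9, p_short_circuit=2e-6)
print(f"P_dynamic = {p_dyn * 1e6:.2f} uW, P_total = {p_tot * 1e6:.2f} uW")
```

Because the dynamic term scales linearly with the activity factor, halving the switching activity halves that term, which is the motivation for removing redundant switching.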

In this paper, a switching-aware design technique is proposed which controls redundant switching at both the dynamic and output nodes of a domino multiplexor. The multiplexor has static characteristics and behaves like a static CMOS circuit. The gate of the pull-up pMOS transistor is driven by a control pulse generated by a control pulse generator circuit. The pull-up transistor conducts for a short duration at the start of the rising edge of the clock, and only when the dynamic node was low during the previous clock cycle.

The remainder of the paper is organized as follows. Section 2 describes the previously proposed techniques. Section 3 describes the proposed circuit. Simulation results are presented in Section 4 and the conclusion in Section 5.

#### 2. PREVIOUS WORKS

Standard domino logic is widely used because it is faster than static logic and occupies less area. Domino circuits are two-phase circuits, with an evaluation phase and a precharge phase. The main drawback of this logic is the redundant dynamic power dissipation at both the dynamic and output nodes. Eliminating the foot transistor is preferred because the foot, which is driven by the clock and lies in the evaluation critical path, carries a significant performance and power penalty [17]. However, in footless domino the gates precharge sequentially, so the precharge delay becomes critical. Removing redundant switching at the output node was the objective of the previously proposed true single phase clock domino logic (TSPC) [9], [10], limited switch dynamic logic (LSDL) [11], [12] and pseudo dynamic buffer (PDB) [13]. Single-phase SP-domino logic [14] and static switching pulse domino logic [15] were proposed to control redundant switching at both the dynamic and output nodes.

#### 2.1 Standard Footless Domino Multiplexor

The standard footless domino logic circuit is shown in Fig.1. Its operation in the two operating phases, with the input held high or low, is shown in Fig.2.

Fig.1 Standard footless domino multiplexor.

Fig.2 Voltage characteristics at various nodes of the standard footless multiplexor.

#### 2.2 Single-Phase SP-Domino Multiplexor

The single-phase SP-domino multiplexor is shown in Fig.3 and its voltage characteristics are shown in Fig.4 [14]. It works on the principle of clock delayed domino logic, in which the latest arriving input does not arrive before the rising edge of the delayed clock [8], [18]. It is a single-phase circuit, as both pull-up and pull-down of the dynamic node occur during the evaluation phase.



Fig.3 Wide fan-in multiplexor using single-phase SP-domino logic.

Transistor M1 works as both pull-up and keeper. The pulse generator produces signal P, which turns ON M1 unconditionally at the start of the evaluation cycle. If transistors M1, M8 and M9 turn ON simultaneously, a small contention current flows between them for the short duration of pulse P at the gate of M1. If transistor M9 turns OFF at the start of the evaluation phase, M1 charges the dynamic node to a high voltage. If the dynamic node is low at the end of pulse signal P, M7 remains OFF and P is pulled high by transistor M4. The charging operation of the dynamic node starts after the pulse signal returns to low voltage. The logical expression for pulse P is

$$P = \overline{CLK_i \cdot CLK_j + DYN} \tag{5}$$

where $CLK_i$ and $CLK_j$ are the delayed clock signal and its delayed inverse.



Fig.4 Voltage characteristics at various nodes of single phase SP-domino multiplexor.

The design of the SP-Domino multiplexor has several flaws. It is inflexible because the same transistor M1 functions as both pull-up and keeper. If the size of M1 increases, the keeper ratio increases. The keeper ratio is defined as the ratio of the current driving capability of transistor M1 to that of transistor M9. A high keeper ratio increases the contention current and the delay, and makes the rise and fall times of the output signal asymmetrical. The delay of the SP-Domino multiplexor is also high compared with the standard domino multiplexor.

#### 2.3 Static Switching Pulse Domino (SSPD) Multiplexor

SSPD multiplexors also have static input and output characteristics [15]. SP-Domino multiplexors use a single transistor as pull-up and keeper, whereas SSPD multiplexors employ separate transistors M1 and M2, which never turn ON simultaneously. SP-Domino lacks flexibility in sizing transistor M1 to obtain symmetrical rise and fall delays of the output, whereas SSPD allows independent tuning of the rise and fall delays.

An SSPD multiplexor is shown in Fig.5 and its voltage characteristics are shown in Fig.6. The SSPD technique employs a conditional pulse generator (CPG). The CPG generates pulse P so that M1 turns ON only when the dynamic node was discharged or held low in the last evaluation cycle, in which case keeper M2 turns OFF. If the dynamic node was not discharged, the CPG keeps M1 OFF. With M7 and M8 ON, the contention current is supplied by the keeper. The CPG internally generates two additional clock phases, CCLK<sub>d</sub> and CCLK<sub>i</sub>, whose behavior is related to the clock signal (CLK) and the dynamic node. The two clock phases are used by block G1 in the CPG to produce pulse P. The logical expression for pulse P is

$$P = \overline{\text{CLK} \cdot \text{CCLK}_i + \text{DYN} \cdot \text{CCLK}_d} \tag{6}$$

where  $CCLK_d$  and  $CCLK_i$  are the conditionally generated delayed and delayed inverse phases of the original clock CLK.
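The SSPD pulse expression can be explored as a truth table. The sketch below is only a static Boolean model of the expression above: it treats the four signals as independent inputs, which the real CPG does not (the two conditional clock phases depend on CLK and DYN), and the function name is hypothetical.

```python
from itertools import product


def sspd_pulse(clk, cclk_i, dyn, cclk_d):
    """Static Boolean model: P = NOT(CLK*CCLK_i + DYN*CCLK_d)."""
    return not ((clk and cclk_i) or (dyn and cclk_d))


# Enumerate all 16 input combinations and collect those that drive P low.
low_cases = [
    combo
    for combo in product([False, True], repeat=4)
    if not sspd_pulse(*combo)
]

for clk, cclk_i, dyn, cclk_d in low_cases:
    print(f"CLK={int(clk)} CCLK_i={int(cclk_i)} "
          f"DYN={int(dyn)} CCLK_d={int(cclk_d)} -> P=0")
```

P is low whenever either product term is true, so the pulse width is set by the overlap of CLK with CCLK_i or of DYN with CCLK_d.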

The drawback of this technique is that it requires a complex conditional pulse generator, and hence a large area: the CPG alone requires 21 transistors. Although it improves the power consumption of the overall circuit, there is considerable power loss in the CPG. The delay and noise performance of this circuit is better than that of single-phase SP-domino, but its power consumption is higher.



International Journal of Computer Applications (0975 – 8887) Volume 172 – No.8, August 2017



Fig.5 Wide fan-in dynamic multiplexor implemented with the SSPD technique. G1 and G2 are the two gates of the pulse generator.



Fig.6 Voltage characteristics at various nodes of static switching pulse domino multiplexor.

#### 3. PROPOSED WORK

To overcome the limitations of previous multiplexors, a new technique is proposed. Its circuit diagram is shown in Fig.7, and the voltage characteristics at various nodes are shown in Fig.10. The proposed domino multiplexor is a footless circuit and, like static gates, has static characteristics at both the dynamic and output nodes. The circuit uses two different transistors, M6 and M7, for the pull-up and keeper actions, which allows independent tuning of the rise and fall delays. The gate of pull-up transistor M6 is driven by a control pulse P, which is always high except for a short duration. The circuit that generates P is referred to here as the control pulse generator. It is operated by the clock pulse CLK, the delayed inverse of the clock pulse CLKi, and feedback from the output node. In essence, the pulse generator is a NAND gate with an active-high enable.





Fig.7 Circuit diagram of proposed wide fan-in multiplexor circuit.

A NAND gate with CLK and CLKi as inputs generates a low trigger pulse for a short duration, because CLK and CLKi are simultaneously high for a short period at the start of each rising clock edge owing to the clock skew. Transistor M5 is added at the foot of the NAND gate, and the output node is connected to its gate. As a result, the low trigger pulse that the NAND gate would otherwise generate at every rising clock edge is now generated only when the dynamic node was low during the previous clock cycle. Thus, if the dynamic node is charged to a high voltage level, the control pulse stays high until the dynamic node is discharged through the pull-down network. If the dynamic node is at a low voltage level, the control pulse is generated as a low trigger pulse at the start of each clock pulse, and is otherwise high, until the dynamic node is charged to a high voltage level.

$RS_0, RS_1, \ldots, RS_{N-1}$ are the row select inputs to the multiplexor's pull-down evaluation network, and $D_0, D_1, \ldots, D_{N-1}$ are the data inputs corresponding to the row select inputs. The operation of the circuit is explained by considering the data input logic. While the clock is low, the dynamic node holds its previous value regardless of the inputs of the circuit. If the dynamic node was at logic high in the previous clock cycle, no low-level control pulse is generated. The dynamic node then discharges to ground if the data input is high, or remains at logic high if the data input is low. If the dynamic node was at logic low in the previous clock cycle, two cases are possible.

Case 1: When the data input is high, both CLK and the delayed inverse clock CLKi are high for a short period of time Ts at the start of the rising edge of the clock.



Fig.8 Operation of the proposed circuit for the short period of time Ts at the start of the rising edge of the clock when $RS_0$ is logic high and data input $D_0$ is also logic high.





Fig.9 Operation of the proposed multiplexor circuit for the short period of time Ts at the start of the rising edge of the clock when $RS_0$ is logic high and data input $D_0$ is logic low.

M3 and M4 turn ON and provide a contention current path from the voltage source to ground, as shown in Fig.8. M8 and M9 are sized large enough to discharge the dynamic node to a low voltage. After the delay Ts, CLK is high and CLKi is low, so M6 turns OFF. No further contention current flows through the pull-up network; the dynamic node remains at logic low and the output node at logic high.

Case 2: When the data input is low, both CLK and the delayed inverse clock CLKi are high at the start of the rising edge of the clock. M6 turns ON and charges the dynamic node to logic high, as shown in Fig.9. After time Ts, for the rest of the clock cycle, control pulse P is high, so M6 turns OFF and M7 turns ON. The dynamic node remains at logic high and the output node at logic low.

The dynamic power consumption of the proposed multiplexor circuit is reduced significantly because of the reduced switching activity. The power consumed in the control pulse generator is very small compared with the other components of the average power. There is some short-circuit power consumption because contention current flows in the circuit, but the overall power consumption is still reduced significantly. Both the delay and the power consumption depend on the size of the pull-up transistor. Too small a pull-up transistor degrades the performance of the circuit, so it should be kept adequately large. However, the larger the pull-up transistor, the higher the power consumption. The pull-up transistor should therefore be sized to achieve the optimum response.
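The reduction in switching activity can be illustrated with a cycle-level behavioral sketch. This is an idealized logic model under stated assumptions, not a circuit simulation: it ignores delays, contention and leakage, assumes the standard footless domino gate precharges its dynamic node high in every cycle, and uses hypothetical function names. The input pattern (high for two clock cycles, low for two) mimics a 50 MHz input sampled by a 200 MHz clock.

```python
def proposed_dyn_next(dyn_prev, pull_down_on):
    """Dynamic node of the proposed circuit after one rising clock edge."""
    pulse_generated = not dyn_prev  # low pulse only if DYN was low before
    if pull_down_on:
        # DYN was high: it discharges. DYN was low (Case 1): the strong
        # pull-down wins over the brief pull-up, so it stays low.
        return False
    if pulse_generated:
        return True      # Case 2: M6 charges the dynamic node
    return dyn_prev      # DYN was high and stays high (keeper M7)


def proposed_trace(data, dyn_init=True):
    """Dynamic-node value at the end of each clock cycle."""
    trace, dyn = [], dyn_init
    for d in data:
        dyn = proposed_dyn_next(dyn, d)
        trace.append(dyn)
    return trace


def standard_domino_trace(data):
    """Assumed standard domino: precharge high, then evaluate, every cycle."""
    trace = []
    for d in data:
        trace.append(True)    # precharge phase
        trace.append(not d)   # evaluation phase
    return trace


def count_transitions(values):
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


data = [True, True, False, False] * 4  # selected data input over 16 cycles
n_standard = count_transitions(standard_domino_trace(data))
n_proposed = count_transitions(proposed_trace(data))
print(f"standard: {n_standard} transitions, proposed: {n_proposed}")
```

In this model the proposed dynamic node changes only when the selected data input changes, whereas the precharged node toggles twice in every cycle in which the data input is high.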



Fig.10 Voltage characteristics at various nodes of the proposed multiplexor circuit.

#### 4. SIMULATION RESULTS

The 8-bit proposed multiplexor and the 8-bit existing multiplexors (the standard footless domino multiplexor, the single-phase SP-Domino multiplexor and the static switching pulse domino multiplexor) are simulated using Cadence tools in a high-performance 180 nm technology. By default, the supply voltage is 1.8 V and the clock rate is 200 MHz with a 50% duty cycle (clock period of 5 ns). The input rate is set to 50 MHz with a 50% duty cycle (input pulse period of 20 ns). The rise and fall times of the clock pulse and input pulse are set to 10 ps. Transistor sizes are set by $W_{PMOS} = 27L_{min}$ and $W_p/W_n = 2$ for the whole circuit. The keeper ratio k is kept at 0.4.

Table 1 compares the power consumption of the 8-bit proposed multiplexor with that of the 8-bit existing multiplexors at a clock frequency of 200 MHz as the load capacitance is varied. The proposed multiplexor consumes the least power at every load, and its absolute saving over the standard footless domino multiplexor grows with load capacitance (from 420.52 µW at 100 fF to 492.52 µW at 500 fF). For a capacitance of 500 fF, the 8-bit proposed multiplexor reduces power consumption by 74.63%, 8.9% and 14.41% compared with the 8-bit standard footless domino, SP-Domino and SSPD multiplexors, respectively. The proposed circuit also has a better power-delay product than the existing circuits.

Table 1. Effect of variation of load capacitance on power consumption (µW) for clock frequency = 200 MHz and T = 27°C

| Load (fF) | Standard footless domino multiplexor | SP-Domino multiplexor | SSPD multiplexor | Proposed multiplexor |
|-----------|--------------------------------------|-----------------------|------------------|----------------------|
| 100       | 518.64                               | 119.03                | 131.70           | 98.12                |
| 200       | 555.80                               | 135.28                | 147.67           | 115.46               |
| 300       | 593.14                               | 151.45                | 163.68           | 132.84               |
| 400       | 630.24                               | 167.67                | 179.68           | 150.04               |
| 500       | 659.97                               | 183.87                | 195.65           | 167.45               |
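As a rough cross-check of Table 1 against Eq. (2), the slope of power versus load capacitance can be converted into an effective switching activity. The sketch below assumes that only the load-dependent part of the power follows $\alpha C V_{DD}^2 F_{CLK}$ and that everything else is independent of load; the helper names are hypothetical, and the fit is an ordinary least-squares line through the five reported points.

```python
LOADS_FF = [100, 200, 300, 400, 500]
POWER_UW = {
    "standard footless domino": [518.64, 555.80, 593.14, 630.24, 659.97],
    "SP-Domino": [119.03, 135.28, 151.45, 167.67, 183.87],
    "SSPD": [131.70, 147.67, 163.68, 179.68, 195.65],
    "proposed": [98.12, 115.46, 132.84, 150.04, 167.45],
}
VDD = 1.8        # supply voltage in volts (simulation setup)
F_CLK = 200e6    # clock frequency in hertz (simulation setup)


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def effective_alpha(power_uw):
    """Slope dP/dC divided by VDD^2 * F_CLK, per Eq. (2)."""
    slope_w_per_f = ls_slope(LOADS_FF, power_uw) * 1e-6 / 1e-15
    return slope_w_per_f / (VDD ** 2 * F_CLK)


alphas = {name: effective_alpha(p) for name, p in POWER_UW.items()}
for name, alpha in alphas.items():
    print(f"{name}: effective alpha = {alpha:.3f}")
```

Under these assumptions the three limited-switching designs come out near 0.25, which would be consistent with a 50 MHz input at a 200 MHz clock (one rising output transition every four clock cycles), while the standard footless domino multiplexor comes out at roughly twice that.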

Table 2 compares the power consumption of the 8-bit proposed multiplexor and the 8-bit existing multiplexors for different clock frequencies, with the load capacitance set to 100 fF. Power consumption increases with clock frequency for all designs, and the absolute saving over the SP-Domino and SSPD multiplexors is largest at the highest operating frequency. At an operating clock frequency of 500 MHz, the proposed multiplexor reduces power consumption by 64.51%, 13.41% and 21.37% compared with the standard footless domino, SP-Domino and SSPD multiplexors, respectively.

Table 2. Effect of variation of frequency on power consumption (µW) for load capacitance = 100 fF and T = 27°C

| Frequency (MHz) | Standard footless domino multiplexor | SP-Domino multiplexor | SSPD multiplexor | Proposed multiplexor |
|-----------------|--------------------------------------|-----------------------|------------------|----------------------|
| 100             | 493.16                               | 76.93                 | 86.47            | 60.29                |
| 200             | 518.64                               | 119.03                | 131.70           | 98.12                |
| 300             | 543.69                               | 160.29                | 176.92           | 135.43               |
| 400             | 568.76                               | 201.93                | 222.70           | 173.64               |
| 500             | 594.05                               | 243.46                | 268.08           | 210.80               |
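The percentage reductions quoted in the text can be reproduced from the tabulated values. The sketch below does this for the 500 MHz row of Table 2, assuming the reduction is defined relative to the power of the existing design; the function name is hypothetical.

```python
def percent_reduction(reference_uw, proposed_uw):
    """Power saving of the proposed design relative to a reference, in %."""
    return 100.0 * (reference_uw - proposed_uw) / reference_uw


# 500 MHz row of Table 2 (power in microwatts).
existing_uw = {
    "standard footless domino": 594.05,
    "SP-Domino": 243.46,
    "SSPD": 268.08,
}
proposed_uw = 210.80

reductions = {
    name: percent_reduction(power, proposed_uw)
    for name, power in existing_uw.items()
}
for name, value in reductions.items():
    print(f"vs {name}: {value:.2f}% lower power")
```

The same function applies unchanged to any row of Tables 1 to 3.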

Table 3 shows the relationship between power consumption and temperature for the 8-bit proposed multiplexor and the 8-bit existing multiplexors, with the clock frequency and load capacitance set to 200 MHz and 100 fF. The power consumption of the proposed multiplexor is nearly independent of temperature, rising only from 98.12 µW at 27°C to 100.41 µW at 110°C. At 110°C, the proposed 8-bit multiplexor reduces power consumption by 77.41%, 15.61% and 27.68% compared with the 8-bit standard footless domino, SP-Domino and SSPD multiplexors, respectively. Similarly, delay increases slightly with temperature. The proposed circuit suffers little delay penalty compared with the other existing techniques.

Table 3. Effect of variation of temperature on power consumption (µW) for clock frequency = 200 MHz and load capacitance = 100 fF

| Temperature (°C) | Standard footless domino multiplexor | SP-Domino multiplexor | SSPD multiplexor | Proposed multiplexor |
|------------------|--------------------------------------|-----------------------|------------------|----------------------|
| 27               | 518.64                               | 119.03                | 131.70           | 98.12                |
| 50               | 494.82                               | 120.27                | 135.98           | 98.71                |
| 70               | 476.48                               | 121.46                | 135.30           | 99.15                |
| 90               | 459.95                               | 117.84                | 137.05           | 99.74                |
| 110              | 444.49                               | 118.98                | 138.85           | 100.41               |

The delay of the proposed circuit is 183.37 ps at a clock frequency of 200 MHz, an input frequency of 50 MHz and a load capacitance of 100 fF. This is a 14.44% improvement over SP-Domino and nearly on par with SSPD and standard footless domino.
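As a worked illustration, the power-delay product of the proposed multiplexor at this operating point can be estimated by combining the reported delay with the power reported for 200 MHz and 100 fF. This assumes the delay was measured at the same 27°C as that power figure, which the text does not state explicitly:

$$\text{PDP} = P \times t_d = 98.12\ \mu\text{W} \times 183.37\ \text{ps} \approx 17.99\ \text{fJ}$$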

#### 5. CONCLUSION

The proposed multiplexor removes the redundant switching at both the dynamic and output nodes. It has separate transistors for the pull-up and keeper actions and is therefore flexible in design. It provides both inverting and non-inverting functions and can be mixed with static logic in a single-phase pipeline. It consumes less power than the standard footless domino, single-phase SP-domino and SSPD multiplexors. Its control pulse generator is less complex, and uses feedback from the output node, which reflects the state of the dynamic node, to generate the control pulse.

The simulation results are compared with existing circuits for different clock frequencies, loading conditions and temperatures. For a capacitance of 500 fF, the proposed circuit reduces power consumption by 74.63%, 8.9% and 14.41% compared with the standard footless domino, SP-Domino and SSPD techniques, respectively. At an operating frequency of 500 MHz, the 8-bit proposed multiplexor reduces power consumption by 64.51%, 13.41% and 21.37% compared with the standard footless domino, SP-Domino and SSPD multiplexors, respectively.

The proposed technique can be implemented on double gate or triple gate technologies for wider fan-in logic and better power saving.

#### 6. REFERENCES

- [1] S. Wairya, R. K. Nagaria and S. Tiwari, 'New design methodologies for high speed mixed-mode CMOS full adder circuits', International Journal of VLSI design & Communication Systems (VLSICS), AIRCC Publication, 2011, 2, (2), pp.78-98.
- [2] S. D. Naffziger, et al., 'The implementation of the Itanium 2 microprocessor', IEEE Journal of Solid-State Circuits, 2002, 37, pp.1448–1460.
- [3] K. J. Nowka and T. Galambos, 'Circuit design techniques for a gigahertz integer microprocessor', IEEE International Conference on Computer Design, 1998, pp.11-16.
- [4] Z. Liu and V. Kursun, 'Leakage biased PMOS sleep switch dynamic circuits', IEEE Transactions on Circuits and Systems, October 2006, 53, (10), pp.1093-1097.
- [5] W. Hwang, R. V. Joshi and W. H. Henkels, 'A 500-MHz, 32-word × 64-bit, eight-port self-resetting CMOS register file', IEEE Journal of Solid-State Circuits, 1999, 34, pp.56–67.
- [6] R. K. Krishnamurthy, A. Alvandpour, G. Balamurugan, N. Shanbhag, K. Soumyanath and S. Y. Borkar, 'A 130-nm 6-GHz 256×32 bit leakage-tolerant register file', IEEE Journal of Solid-State Circuits, 2002, 37, pp.624–632.
- [7] H. Mahmoodi and K. Roy, 'Diode-footed domino: A leakage-tolerant high fan-in dynamic circuits design style', IEEE Transactions on Circuits and Systems, March 2004, 51, (3), pp.495-503.
- [8] A. Amirabadi, A. A. Kusha, Y. Mortazavi and M. Nourani, 'Clock delayed domino logic with efficient variable threshold voltage keeper', IEEE Transactions on Very Large Scale Integration (VLSI) Systems, February 2007, 15, (2), pp.125-134.
- [9] F. Tang and A. Bermak, 'Low power TSPC-based domino logic circuit design with 2/3 clock load', Transactions on Energy Procedia, 2012, 14, pp.1168-1174.
- [10] Y. J. Ren, I. Karlsson and Svensson, 'A true single-phase clock dynamic CMOS circuit technique', IEEE Transactions on Solid-State Circuits, 1987, 22, pp.899-901.
- [11] S. Jayakumaran, C. N. Hung, J. N. Kevin, K. Robert and B. Brown, 'Controlled-load limited switch dynamic logic circuit', IEEE Conference on Computer Society, 2005, pp.1-6.
- [12] A. K. Pandey, R. A. Mishra and R. K. Nagaria, 'Low power dynamic buffer circuits', International Journal of VLSI design & Communication Systems (VLSICS), AIRCC Publication, 2012, 3, (5), pp.53-65.
- [13] T. Fang, B. Amine and G. Zhouye, 'Low power dynamic logic circuit design using a pseudo dynamic buffer', Integration, the VLSI Journal, 201, 45, pp.395-404.
- [14] J. A. Charbel and A. B. Magdy, 'Single-phase SP domino: A limited-switching dynamic circuit technique for low-power wide fan-in logic gates', IEEE Transactions on Circuits and Systems, 2008, 55, pp.141-145.
- [15] R. Singh, G. Moon, M. Kim, J. Park, W. Y. Shin and S. Kim, 'Static-switching pulse domino: A switching-aware design technique for wide fan-in dynamic multiplexers', Integration, the VLSI Journal, June 2012, 45, pp.253-262.
- [16] A. K. Pandey, V. Mishra, R. A. Mishra, R. K. Nagaria and V. K. R. Kandanvli, 'Conditional Precharge Dynamic Buffer Circuit', International Journal of Computer Applications, December 2012, 60, pp.45-52.
- [17] J. Wang, S. Shieh, C. Yeh and Y. Yeh, 'Pseudo-Footless CMOS Domino Logic Circuits for High-Performance VLSI Designs', IEEE International Symposium on Circuits and Systems, 2004, 2, pp.401-404.
- [18] G. Wei and C. Sechen, 'Clock-delayed domino for dynamic circuit design', IEEE Transactions on Very Large Scale Integration (VLSI) Systems, August 2000, 8, (4), pp.425-430.