# Performance Evaluation of Bypassing Array Multiplier with Optimized Design

N. Ravi, RGM Engg College (Autonomous), JNT University, Anantapur, Nandyal, A.P., India. Dr. T.S. Rao, Department of Physics, S.K. University, Anantapur, A.P., India. Dr. T.J. Prasad, Department of ECE, RGM Engg College, Nandyal, A.P., India.

### ABSTRACT

This paper proposes a new method to reduce the power and area of the array multiplier. The vector-merging final adder at the last stage of the multiplier is removed; instead, the carry generated at the final stage of each column is fed to an input of the top adder of the next column. These adders then perform the function that the vector-merging final adder would otherwise perform. The method is applied to the array multiplier and to the column bypassing multiplier (CBM). Simulations are carried out in H-Spice with different TSMC technology files (standard and PTM) at a supply voltage of 2.0 V. For 180nm CMOS technology, the proposed array multiplier consumes 13.91% less power than the conventional array multiplier, and the Proposed Column Bypassing Multiplier (PCBM) consumes 23.38% less power than the CBM. A 14-T full adder is used to design the multipliers. Eliminating the final adder saves 56 transistors and therefore reduces area.

#### General Terms Embedded, DSP

**Keywords:** Array multiplier, bypassing multiplier, power and EDP.

# 1. INTRODUCTION

Multiplication is an essential arithmetic operation in common DSP (Digital Signal Processor) and microprocessor applications. In recent years researchers have emphasized three design goals: power, speed and area. Running more applications on a single processor increases the number of transistors on a chip and therefore its power consumption. Hence, among the three goals, power is one of the most important to concentrate on. To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, and thus power-efficient multipliers are very important for the design of low-power DSP systems [1], [2].

In this paper the emphasis is on power of the multiplier. A new method is proposed for array multipliers that can reduce power and area of the multiplier. The array multiplier follows the Carry Save method. The advantage of the array multiplier is its regular structure, which leads to a dense layout, ideal for fabrication.

This paper is organized as follows. Section 1 introduces the paper. The mathematical representation, the architecture of the multiplier and its explanation are given in Section 2. Power analysis and bypassing techniques, together with the proposed method, are discussed in Sections 3 and 4. The simulation results of the proposed design and performance comparisons with conventional multipliers are shown in Section 5. Finally, the conclusion is given in Section 6.

#### 2. PARALLEL MULTIPLIER

A serial multiplier consumes less power, but its delay is larger because of carry rippling. A parallel multiplier has a smaller delay, but its more complex circuitry consumes more power.

Consider the multiplication of two unsigned n-bit numbers, where $X = x_{n-1}, x_{n-2}, \dots, x_0$ is the multiplicand and $Y = y_{n-1}, y_{n-2}, \dots, y_0$ is the multiplier. The product of these two numbers can be written as [3], [4], [5]:

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i y_j 2^{(i+j)} \qquad (1)$$

where

$$X = \sum_{i=0}^{n-1} x_i 2^{i} \quad \text{(multiplicand)}, \qquad Y = \sum_{j=0}^{n-1} y_j 2^{j} \quad \text{(multiplier)}$$

|         |       |          |          |          | $a_3$    | $a_2$    | $a_1$    | $a_0$    | Multiplicand |
|---------|-------|----------|----------|----------|----------|----------|----------|----------|--------------|
|         |       |          |          |          | $b_3$    | $b_2$    | $b_1$    | $b_0$    | Multiplier   |
| PP      |       |          |          |          | $b_3a_0$ | $b_2a_0$ | $b_1a_0$ | $b_0a_0$ |              |
| PP      |       |          |          | $b_3a_1$ | $b_2a_1$ | $b_1a_1$ | $b_0a_1$ |          |              |
| PP      |       |          | $b_3a_2$ | $b_2a_2$ | $b_1a_2$ | $b_0a_2$ |          |          |              |
| PP      |       | $b_3a_3$ | $b_2a_3$ | $b_1a_3$ | $b_0a_3$ |          |          |          |              |
| Product | P7    | P6       | P5       | P4       | P3       | P2       | P1       | P0       |              |

#### Fig 1. Multiplication Architecture
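Equation (1) can be checked numerically with a short sketch. The function name `multiply_by_partial_products` and the default width `n=4` are illustrative choices, not part of the original design; the code only evaluates the double sum bit by bit.

```python
def multiply_by_partial_products(x, y, n=4):
    """Evaluate Eq. (1): the sum of x_i * y_j * 2^(i+j) over all bit pairs."""
    x_bits = [(x >> i) & 1 for i in range(n)]  # x_bits[0] is the LSB x_0
    y_bits = [(y >> j) & 1 for j in range(n)]  # y_bits[0] is the LSB y_0
    product = 0
    for i in range(n):
        for j in range(n):
            partial_product = x_bits[i] & y_bits[j]  # a 1-bit multiply is an AND
            product += partial_product << (i + j)    # weight 2^(i+j)
    return product


if __name__ == "__main__":
    # Compare against Python's built-in multiplication for one 4-bit pair.
    print(multiply_by_partial_products(13, 11), 13 * 11)
```

Each `partial_product` corresponds to one entry of the partial-product array in Fig 1, and the shift by `i + j` places it in the column that feeds the matching product bit.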

#### 2.1 Array Multiplier

In the carry-save addition method, the first row can be designed with either half adders or full adders. Each partial product is formed by multiplying (ANDing) one bit from X with one bit from Y. If the first row of adders is implemented with full adders, the third input, Cin, is tied to '0'. The carry of each full adder is forwarded diagonally to the adder in the next row. The resulting multiplier is called a carry-save multiplier because the carry bits are not added immediately but are saved for the next stage. The basic idea is to implement the design with full adders only; hence, wherever a full adder has only two data inputs, its third input is tied to zero. In the final stage, the carries and sums are merged in a carry-propagate adder stage (e.g. ripple-carry or carry-lookahead). This is the conventional array multiplier with CSA, similar to "Figure (2)" [6].
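The carry-save scheme described above can be sketched as a bit-level behavioral model. This is a minimal illustration under stated assumptions: the names `full_adder` and `carry_save_array_multiply` are hypothetical, every row uses n full adders with unused inputs tied to zero, and the final stage is modeled as a ripple-carry adder. The exact cell placement in Figure 2 may differ.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def carry_save_array_multiply(x, y, n=4):
    """Behavioral model of an n x n carry-save array multiplier."""
    x_bits = [(x >> i) & 1 for i in range(n)]
    y_bits = [(y >> j) & 1 for j in range(n)]

    # Row 0 holds only partial products; no carries exist yet.
    sums = [x_bits[i] & y_bits[0] for i in range(n)]
    carries = [0] * n
    product_bits = [sums[0]]

    # Rows 1..n-1: carries are saved and passed to the next row.
    for j in range(1, n):
        new_sums, new_carries = [], []
        for i in range(n):
            partial_product = x_bits[i] & y_bits[j]
            sum_in = sums[i + 1] if i + 1 < n else 0  # unused input tied to 0
            s, c = full_adder(partial_product, sum_in, carries[i])
            new_sums.append(s)
            new_carries.append(c)
        sums, carries = new_sums, new_carries
        product_bits.append(sums[0])

    # Final vector-merging stage, modeled as a ripple-carry adder.
    carry = 0
    for i in range(n):
        sum_in = sums[i + 1] if i + 1 < n else 0
        s, carry = full_adder(sum_in, carries[i], carry)
        product_bits.append(s)

    return sum(bit << k for k, bit in enumerate(product_bits))
```

The last loop is the only place where a carry propagates horizontally; in the rows above it, every carry moves down to the next row.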

# 3. POWER CONSUMPTION IN CMOS CIRCUITS

Power is a key parameter of digital circuits, both for chip fabrication and for portable devices. CMOS technology is widely used in digital circuits because of its low power consumption. Power consumption in CMOS circuits can be divided into dynamic and static components, as shown in Eq. (2).

$$P_s = \alpha f_{clk} C_L V_{DD}^2 + I_{sc} V_{DD} + I_{leakage} V_{DD} \qquad (2)$$

where $\alpha$ is the switching activity, $f_{clk}$ is the clock frequency, $C_L$ is the output capacitance, $V_{DD}$ is the supply voltage, $I_{sc}$ is the short-circuit current, and $I_{leakage}$ is the leakage current. In micrometer-scale technologies dynamic power is the dominant component, while in submicron technologies leakage becomes the dominant contributor to total power. This paper concentrates on reducing dynamic power.
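The three terms of Eq. (2) can be evaluated separately to see how the switching activity affects only the dynamic term. The following is a small sketch; the function name `power_components` and all numeric values are hypothetical and chosen only for illustration, not taken from the simulations in this paper.

```python
def power_components(alpha, f_clk, c_load, v_dd, i_sc, i_leakage):
    """Return the three terms of Eq. (2) in watts."""
    return {
        "dynamic": alpha * f_clk * c_load * v_dd ** 2,
        "short_circuit": i_sc * v_dd,
        "leakage": i_leakage * v_dd,
    }


# Hypothetical operating point (assumed values, not measured data).
base = power_components(alpha=0.5, f_clk=100e6, c_load=1e-12,
                        v_dd=2.0, i_sc=1e-6, i_leakage=1e-9)

# Halving the switching activity, e.g. by bypassing idle adders,
# halves the dynamic term and leaves the other two terms unchanged.
fewer_toggles = power_components(alpha=0.25, f_clk=100e6, c_load=1e-12,
                                 v_dd=2.0, i_sc=1e-6, i_leakage=1e-9)

if __name__ == "__main__":
    print(sum(base.values()), sum(fewer_toggles.values()))
```

Because the dynamic term scales linearly with $\alpha$, techniques that suppress unnecessary transitions reduce it directly without changing the supply voltage or clock frequency.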

### 4. BYPASSING TECHNIQUE

Dynamic power consumption can be reduced by the bypassing method when the input data of the multiplier contain many zeros. To isolate the bypassed adders, transmission gates can be used as ideal switches with small power consumption, a propagation delay similar to that of an inverter, and small area [7]. To study the proposed design we consider the column bypassing multiplier, in which columns of adders are bypassed. In this multiplier, the operations in a column can be disabled if the corresponding bit of the multiplicand is 0. An advantage of this multiplier is that it eliminates the extra correcting circuit [8].

The column bypassing multiplier (CBM) needs only two tri-state gates and one multiplexer in an adder cell. When $y_j$ is 0, the corresponding diagonal cells operate unnecessarily [9], [10]. In all these cells the partial products $x_i \times y_j$ and the carry inputs are zero for i = 0, 1, ..., n-1, so this chain does not contribute to the formation of the product. Consequently, the sum output of the cell above can bypass this diagonal through transmission gates. To achieve this, the full adder cell shown in "Figure 2(a)" is replaced with the cell in "Figure 2(b)", called the Full Adder Bypassing (FAB) cell [11]. The transmission gates in the FAB cell lock the inputs of the full adder to prevent any transitions when $y_j = 0$, and a multiplexer propagates the sum input to the sum output. When $y_j = 1$, the sum output of the full adder is passed.
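The behavior of the FAB cell can be summarized in a small functional sketch. The names `full_adder` and `fab_cell` are illustrative, and the model captures logic values only, not the transmission-gate circuit. It assumes that the carry output of a bypassed cell is taken as 0, which matches the statement that the partial products and carry inputs of the bypassed cells are zero.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def fab_cell(partial_product, sum_in, carry_in, y_bit):
    """Functional model of a Full Adder Bypassing (FAB) cell.

    When y_bit is 0 the adder inputs are held and the multiplexer
    forwards sum_in; the carry output is modeled as 0 (assumption).
    When y_bit is 1 the cell behaves as an ordinary full adder.
    """
    if y_bit == 0:
        return sum_in, 0
    return full_adder(partial_product, sum_in, carry_in)


if __name__ == "__main__":
    # With y_bit = 0 the partial product and carry input are both 0, so
    # bypassing gives the same result the full adder would have produced.
    for sum_in in (0, 1):
        print(fab_cell(0, sum_in, 0, 0), full_adder(0, sum_in, 0))
```

The bypass is logically transparent: the saving comes from the adder inputs not toggling, which the functional model cannot show but which lowers the switching activity in the dynamic power term.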



Fig 2. Array structured multiplier





b) The FAB Cell



Fig 3: Proposed Column Bypassing Multiplier (PCBM)

Two tri-state buffers built from transmission gates are placed at two inputs of the 14-T full adder to disable its operation when $S_{\rm in}$ is 0. A TG-CMOS multiplexer is placed at the sum output of the full adder. Depending on the value of $S_{\rm in}$, the sum output is selected from either the bypassed value or the sum output of the full adder.

In the proposed method, all partial-product rows of the multiplier are implemented as in the conventional multiplier; only the final addition of the carry bits differs. The final adder that merges the carries and sums in the conventional multiplier is removed. Instead, the carries at the final stage are fed to the inputs of the next column of the multiplier, as shown in "Figure (3)", so no carry is discarded. For example, the carry of the fourth column of the 4x4 multiplier is applied to an input of the fifth column instead of zero. The full adder at the top of the fifth column has only two data inputs, so in the conventional multiplier its third input is tied to zero; in the proposed multiplier, the carry of the fourth column drives this input. A full adder simply adds the inputs it is given, so this adder can perform the same function as a full adder of the Ripple Carry Adder (RCA) in the final addition stage. That is why the carry of the fourth column is fed to the first adder of the fifth column, where it merges with the two existing inputs.

The carry of the fifth column is then forwarded to the input of the first adder of the sixth column, and so on. The carry of the seventh column is not discarded; it forms the Most Significant Bit (MSB) of the product. Eliminating the four full adders of the final addition stage reduces both power and area in the proposed design.
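The idea of merging carries inside the columns, rather than in a separate final adder, can be illustrated with a behavioral sketch. This is not the exact wiring of Figure 3: the function name `multiply_without_final_adder` is hypothetical, and the order in which bits are combined within a column is an assumption. The sketch only shows that full adders with otherwise unused inputs can absorb every carry, so the product is still correct without a vector-merging stage.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def multiply_without_final_adder(x, y, n=4):
    """Sum each column with full adders and pass every carry to the next column.

    Returns (product, number_of_full_adders_used).
    """
    # columns[k] collects all bits of weight 2^k (partial products and carries).
    columns = [[] for _ in range(2 * n + 1)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    product = 0
    adders_used = 0
    for k in range(2 * n):
        bits = columns[k]
        while len(bits) > 1:
            a = bits.pop()
            b = bits.pop()
            c = bits.pop() if bits else 0  # unused third input tied to 0
            s, carry_out = full_adder(a, b, c)
            adders_used += 1
            bits.append(s)
            columns[k + 1].append(carry_out)  # carry joins the next column
        if bits:
            product |= bits[0] << k
    return product, adders_used


if __name__ == "__main__":
    print(multiply_without_final_adder(15, 15))
```

In this sketch the carry leaving the seventh column (weight $2^6$) becomes product bit P7 directly, mirroring the statement that the last carry is the MSB. For n = 4 the model uses 12 full-adder cells.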

## 5. RESULTS AND DISCUSSIONS

The power, delay and energy delay product (EDP) of the full adders are compared in "Table 1". The results were obtained with the Tanner EDA tool and H-Spice. Among the four types of full adder, the 16-T full adder has the best energy delay product. Nevertheless, the 14-T full adder, whose power consumption is slightly higher than that of the 16-T adder, is used to design the multiplier because of its lower transistor count.

The CSA multiplier and the CSA multiplier without RCA are compared in power, delay and energy delay product in "Table 2". With 56 fewer transistors, the CSA without RCA consumes less power, has a shorter delay and occupies less area. The proposed method is also applied to the column bypassing multiplier, and the results are discussed below.

#### **5.1 Total Power**

The bypassing method should reduce the dynamic component of the total power of the multiplier. The total power of the conventional and proposed 4x4, 8x8 and 16x16 bypassing multipliers is given in "Table 3". The 0.18um technology file is the standard one; the others are Predictive Technology Model (PTM) files downloaded from the Berkeley website. For a 4x4 Braun multiplier, the proposed multiplier consumes 13.91% less power than the conventional one for TSMC 0.18um. For the same technology, the 4x4 PCBM consumes 23.38% less power than the CBM. Removing the RCA saves 56 transistors, which reduces power consumption. For higher-order multipliers the PCBM also consumes less power. All data are obtained at a supply voltage of 2.0 V and a temperature of $25^{\circ}$C. In "Table 3", the letter "C" denotes the conventional column bypassing multiplier and "P" the proposed column bypassing multiplier.
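The quoted percentages can be recomputed from the tabulated values. The helper `percent_reduction` is an illustrative name; the inputs are the 0.18um rows of Table 2 and the 4x4 0.18um power entries of Table 3. With these rounded table entries the array-multiplier figures agree with the text to within rounding. The 23.38% figure for the PCBM is reproduced when the difference is divided by the PCBM value; dividing by the CBM value gives about 18.95%.

```python
def percent_reduction(conventional, proposed):
    """Saving of the proposed design relative to the conventional one, in %."""
    return 100.0 * (conventional - proposed) / conventional


# 0.18um rows of Table 2: (conventional, proposed).
table2_180nm = {
    "power_w": (2.4628e-04, 2.1200e-04),
    "delay_s": (1.6490e-09, 1.0867e-09),
    "edp_js": (6.6968e-22, 2.6841e-22),
}

# 4x4, 0.18um total power from Table 3 (rounded to three digits there).
cbm_w, pcbm_w = 3.06e-04, 2.48e-04

if __name__ == "__main__":
    for name, (conv, prop) in table2_180nm.items():
        print(f"{name}: {percent_reduction(conv, prop):.2f}% lower")
    print(f"PCBM vs CBM, relative to CBM:  {percent_reduction(cbm_w, pcbm_w):.2f}%")
    print(f"PCBM vs CBM, relative to PCBM: {100.0 * (cbm_w - pcbm_w) / pcbm_w:.2f}%")
```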

## **5.2 Propagation Delay**

The propagation delays of the two multipliers are measured for all inputs and outputs, and the longest propagation delay is taken as the worst-case delay. The proposed multiplier is faster than the conventional one. For 180nm, the proposed array multiplier shows a 34.09% improvement in delay over the conventional one, and the PCBM shows a 92.85% improvement. The improvement continues for the 8x8 and 16x16 PCBM.

## 5.3 Energy Delay Product

Energy is the product of power and delay, also called the Power Delay Product (PDP); the Energy Delay Product (EDP) is energy multiplied by delay. The proposed array multiplier improves the EDP by 59.91% for 180nm, 9.35% for 90nm and 29.21% for 65nm, as shown in "Table 2". The PCBM also improves the energy delay product for all technologies.
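The two metrics can be computed directly from power and delay. The sketch below applies the definitions to the full-adder values of Table 1, assuming power in watts and delay in seconds; the names `pdp`, `edp` and `FULL_ADDERS` are illustrative.

```python
def pdp(power, delay):
    """Power delay product (energy), in joules."""
    return power * delay


def edp(power, delay):
    """Energy delay product, in joule-seconds."""
    return pdp(power, delay) * delay


# (power, delay) pairs taken from Table 1.
FULL_ADDERS = {
    "TG-CMOS": (5.07e-05, 9.36e-10),
    "TFA": (3.05e-05, 2.51e-09),
    "14-T": (2.33e-05, 8.97e-10),
    "16-T": (1.36e-05, 5.07e-10),
}

# Adder with the lowest energy delay product.
best = min(FULL_ADDERS, key=lambda name: edp(*FULL_ADDERS[name]))

if __name__ == "__main__":
    for name, (power, delay) in FULL_ADDERS.items():
        print(f"{name}: PDP = {pdp(power, delay):.5e} J, "
              f"EDP = {edp(power, delay):.5e} Js")
    print("lowest EDP:", best)
```

Because delay enters the EDP squared, a design that is both faster and lower in power improves its EDP by more than either metric alone would suggest.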

The proposed method saves 4 full adders in the 4x4 multiplier, 8 full adders in the 8x8 multiplier and 16 full adders in the 16x16 multiplier. The proposed multipliers therefore occupy less area on chip.

| Adder Type | Power (W) | Delay (s) | PDP (J)     | EDP (Js)    |
|------------|-----------|-----------|-------------|-------------|
| TG-CMOS    | 5.07E-05  | 9.36E-10  | 4.74552E-14 | 4.44181E-23 |
| TFA        | 3.05E-05  | 2.51E-09  | 7.6555E-14  | 1.92153E-22 |
| 14-T       | 2.33E-05  | 8.97E-10  | 2.09001E-14 | 1.87474E-23 |
| 16-T       | 1.36E-05  | 5.07E-10  | 6.8952E-15  | 3.49587E-24 |

Table 1. Power, Delay and EDP of full adders

Table 2. Power, Delay and EDP of Array and Proposed multipliers

| Technology | Array Multiplier Type | Total Power (Watts) | Total Power Saving (%) | Prop. Delay (Sec) | Prop. Delay Improvement (%) | Energy Delay Product (EDP) (Js) | EDP Improvement (%) | No. of Transistors |
|------------|-----------------------|---------------------|------------------------|-------------------|-----------------------------|---------------------------------|---------------------|--------------------|
| 0.18um     | Conventional          | 2.4628E-04          |                        | 1.6490E-09        |                             | 6.6968E-22                      |                     | 376                |
| 0.18um     | Proposed              | 2.1200E-04          | 13.91                  | 1.0867E-09        | 34.09                       | 2.6841E-22                      | 59.91               | 320                |
| 90nm       | Conventional          | 3.8089E-04          |                        | 8.3947E-10        |                             | 2.5002E-22                      |                     | 376                |
| 90nm       | Proposed              | 3.2864E-04          | 13.71                  | 8.3000E-10        | 1.12                        | 2.2664E-22                      | 9.35                | 320                |
| 65nm       | Conventional          | 2.0514E-04          |                        | 1.1040E-09        |                             | 2.8451E-22                      |                     | 376                |
| 65nm       | Proposed              | 1.6699E-04          | 18.59                  | 1.0982E-09        | 0.52                        | 2.0139E-22                      | 29.21               | 320                |

International Journal of Computer Applications (0975 – 8887) Volume 28– No.5, August 2011

| Technology | Total Power (Watts) |          |          | Propagation Delay (Sec) |          |          | Energy Delay Product (Js) |          |          |
|------------|---------------------|----------|----------|-------------------------|----------|----------|---------------------------|----------|----------|
|            | 4x4                 | 8x8      | 16x16    | 4x4                     | 8x8      | 16x16    | 4x4                       | 8x8      | 16x16    |
| 0.18um (C) | 3.06E-04            | 1.13E-03 | 4.30E-03 | 2.16E-09                | 2.36E-09 | 4.98E-09 | 1.42E-21                  | 6.28E-21 | 1.06E-19 |
| 0.18um (P) | 2.48E-04            | 9.10E-04 | 5.93E-04 | 1.12E-09                | 1.67E-09 | 1.78E-09 | 3.10E-22                  | 2.53E-21 | 1.87E-21 |
| 90nm (C)   | 6.99E-04            | 2.48E-03 | 8.69E-03 | 2.70E-09                | 3.28E-09 | 4.42E-09 | 5.09E-21                  | 2.67E-20 | 1.69E-19 |
| 90nm (P)   | 6.13E-04            | 2.25E-03 | 1.36E-03 | 8.68E-10                | 1.19E-09 | 1.42E-09 | 4.61E-22                  | 3.18E-21 | 2.74E-21 |
| 65nm (C)   | 2.57E-04            | 1.65E-03 | 4.61E-03 | 3.36E-09                | 3.36E-09 | 4.26E-09 | 2.90E-21                  | 1.86E-20 | 8.36E-20 |
| 65nm (P)   | 2.22E-04            | 7.66E-04 | 4.67E-04 | 2.41E-09                | 2.63E-09 | 3.28E-09 | 1.29E-21                  | 5.29E-21 | 5.02E-21 |

#### Table 3. Power, delay and EDP of Bypass and PCBM

#### 6. CONCLUSION

A new design for array and bypassing multipliers is presented. The proposed method consumes less power and occupies less chip area because the final adder used for vector merging of the final carries of the multiplier is eliminated. Total power, propagation delay and energy delay product of 4x4, 8x8 and 16x16 bypassing multipliers are reported. The bypassing method saves power whenever there is a zero in the input of the multiplier; for this reason the proposed method is also applied to the column bypassing multiplier to reduce its dynamic power further. The proposed method achieves power savings and lower area for both the array and bypassing multipliers.

#### 7. REFERENCES

- [1] Anantha P. Chandrakasan, Samuel Sheng, and Robert W. Brodersen, "Low Power CMOS Digital Design", IEEE Journal of Solid State Circuits, Vol 27, No.4, April 1992.
- [2] Anantha P. Chandrakasan, R. Brodersen, "Low Power Digital CMOS Design", Kluwer Academic Publishers, 1996.
- [3] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, Digital Integrated Circuits- A design Perspective, Second edition, PHI-2004.
- [4] Neil H.E.Weste, David Harris and Ayan Banerjee, "CMOS VLSI Design-A Circuits and System Perspective", Pearson Education, Third edition, 2009.
- [5] V. H. Hamacher, Z. G. Vranesic and S. G. Zaky, *Computer Organization*, McGraw-Hill, 1990.

- [6] Zhijun Huang and Milos D. Ercegovac, "Two-Dimensional Signal Gating for Low-Power Array Multiplier Design", IEEE Conference Proceedings, 2002.
- [7] M.-C. Wen, S.-J. Wang and Y.-N. Lin, Low-power parallel multiplier with column bypassing, ELECTRONICS LETTERS, 12th May 2005 Vol. 41 No. 10.
- [8] M. Mottaghi-Dastjerdi, A. Afzali-Kusha, and M. Pedram, "BZ-FAD: A Low-Power Low-Area Multiplier based on Shift-and-Add Architecture", *IEEE Trans. on VLSI* Systems, 2008
- [9] Ko-Chi Kuo, Chi- Wen Chou, Low Power and High Speed multiplier design with row bypassing and parallel architecture, Microelectronics Journal(Science Direct), Vol- 41, 2010, pp.639-650.
- [10] Dimitris Bekiaris, George Economakos and Kiamal Pekmestzi, A Mixed Style Multiplier Architecture for Low Dynamic and Leakage Power Dissipation, IEEE Conference, 2010, pp.258-261.
- [11] Jin-Fa Lin, Ming-Hwa Sheu, Yin-Tsung Hwang, "Low-Power and Low-Complexity Full Adder Design for Wireless Base Band Application", IEEE Conference Proceedings, June 2006, pp. 2337-2341.
- [12] A. Fayed and M. Bayoumi, "A low-power 10-transistor full adder cell for embedded architectures," in *Proc. IEEE Symp. Circuits Syst.*, Sydney, Australia, May 2001, pp. 226–229.
- [13] Jin-Fa Lin, Yin-Tsung Hwang, Member, IEEE, Ming-Hwa Sheu, Member, IEEE, and Cheng-Che Ho, A Novel High-Speed and Energy Efficient 10-Transistor Full Adder Design, IEEE Tran on Circuits and Systems—I: Regular Papers, Vol. 54, no. 5, May 2007, pp. 1050-1059.