# Area-Delay Efficient Flipping 2-D DWT Structure using PEB Booth Multiplier

Vikas Tiwari, Department of ECE, Jaypee University of Engineering and Technology, India

# ABSTRACT

In this paper, an area-delay efficient structure for the two-dimensional discrete wavelet transform (2-D DWT) is proposed. The proposed structure has a small cycle period and offers higher throughput than existing structures owing to its efficient arithmetic unit (AU). The flipping scheme and an efficient probabilistic estimation bias (PEB) Booth multiplier together yield a low area-delay product (ADP) and low energy per image (EPI) when computing the filter outputs. Compared with the existing flipping-based structure, the proposed AU-based flipping structure involves about 4.5 times less ADP for block size 16. The flipping scheme yields ADP-efficient structures at large block sizes because of the efficient arithmetic computation unit.

## **Keywords**

Discrete wavelet transforms, VLSI architecture, Flipping scheme, Digital filter.

# **1. INTRODUCTION**

Technological growth in the semiconductor fabrication industry has created demand for faster, area-efficient, and low-power VLSI circuits for complex image processing applications [1]–[4]. The DWT is one of the most popular transforms used for image transformation. In this paper, a high-speed, low-power DWT architecture is designed and implemented as an ASIC in 90 nm technology. A 1-D DWT architecture based on the flipping scheme uses multipliers, adders, and registers, all of which consume power. This paper addresses the reduction of the area-delay product (ADP) of the DWT filter by using a modified algorithm for the PEB multiplier. The proposed architecture systematically combines hardware optimization techniques to develop a flexible DWT architecture that has high performance and is suitable for portable, high-speed, low-power applications.

The multiplier is the basic arithmetic component that performs the multiplication operations in lifting- and flipping-based 2-D DWT filtering structures. A large number of multiplications are needed to compute the DWT filter output, and the multiplication process requires a large amount of on-chip area in a VLSI DWT structure [5]–[7]. An area-efficient multiplier unit is therefore required; with this in mind, we derive a fixed-width radix-4 Booth multiplier for the data computation of the 2-D DWT structure. For real-time applications, the DWT needs to be implemented with an efficient multiplier in very large scale integration (VLSI) systems. During the last two decades, several computation schemes, algorithm mappings, and architectural design methods have been suggested to derive area-delay efficient VLSI architectures for the DWT [8]. A DWT structure mostly involves arithmetic components, such as multipliers, and memory resources [9]. We therefore look for the most efficient multiplier for the DWT arithmetic computation in order to derive an area-efficient 2-D DWT structure, and we suggest that our PEB multiplier is well suited to building area-delay efficient, high-throughput 1-D DWT structures. The rest of the paper is organized as follows. Section 2 describes the flipping 1-D DWT computation. Section 3 discusses the PEB-based Booth multiplier. Section 4 presents the proposed arithmetic unit using the PEB multiplier and the flipping-scheme-based DWT structure for the 2-input, 2-output configuration. ASIC synthesis results are given in Section 5, followed by the acknowledgments in Section 6.

B. K. Mohanty, Department of ECE, Jaypee University of Engineering and Technology, India

## 2. FLIPPING 1-D DWT COMPUTATION

Conventional lifting architectures require fewer arithmetic operations than convolution-based architectures but suffer from a longer critical path. This section describes a VLSI architecture, the 1-D DWT flipping structure, that reduces the critical path.

According to the flipping scheme [11], the low-pass and high-pass coefficients of the DWT can be computed using equations (1a)–(1f).

$$r_1(n) = \alpha^{-1}x(2n-1) + x(2n) + x(2n-2) \tag{1a}$$

$$r_2(n) = (\alpha\beta)^{-1}x(2n-2) + r_1(n) + r_1(n-1) \tag{1b}$$

$$r_3(n) = (\beta \gamma)^{-1} r_1(n-1) + r_2(n) + r_2(n-1) \tag{1c}$$

$$r_4(n) = (\gamma \delta)^{-1} r_2(n-1) + r_3(n) - r_3(n-1) \tag{1d}$$

$$v_h(n) = (\alpha \beta \gamma / K)\, r_3(n) \tag{1e}$$

$$v_l(n) = \alpha \beta \gamma \delta K\, r_4(n) \tag{1f}$$

For simplicity and for comparison, only the case with a throughput of 2 inputs and 2 outputs per clock cycle is considered, as shown in Fig 1 (a).



#### Fig 1 (a) The structure of flipping 1-D DWT computation (b) Arithmetic unit block

The structure of the flipping computation scheme of equations (1a)–(1f) is shown in Fig 1. It consists of 6 multiplications, 8 additions, and 4 delay operations to compute a pair of low-pass and high-pass subband outputs from a pair of inputs [x(2n), x(2n-1)].
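The recurrences in equations (1a) to (1f) can be traced in software. The following is a minimal Python sketch, not the authors' hardware description: the function name `flipping_dwt_1d`, the zero initial state of the delayed values, and the treatment of out-of-range samples as zero are assumptions made for illustration. The numeric constants are the commonly quoted 9/7 lifting coefficients; the paper does not state which filter or normalization it uses.

```python
def flipping_dwt_1d(x, alpha, beta, gamma, delta, K):
    """Evaluate Eqs. (1a)-(1f): two input samples in, one (v_l, v_h) pair out per step."""

    def sample(i):
        # Assumption: samples outside the signal are treated as zero.
        return x[i] if 0 <= i < len(x) else 0.0

    # Delayed values r1(n-1), r2(n-1), r3(n-1); assumed to start at zero.
    r1_prev = r2_prev = r3_prev = 0.0
    v_l, v_h = [], []
    for n in range(len(x) // 2):
        r1 = sample(2 * n - 1) / alpha + sample(2 * n) + sample(2 * n - 2)   # (1a)
        r2 = sample(2 * n - 2) / (alpha * beta) + r1 + r1_prev               # (1b)
        r3 = r1_prev / (beta * gamma) + r2 + r2_prev                         # (1c)
        r4 = r2_prev / (gamma * delta) + r3 - r3_prev                        # (1d), sign as printed
        v_h.append(alpha * beta * gamma / K * r3)                            # (1e)
        v_l.append(alpha * beta * gamma * delta * K * r4)                    # (1f)
        r1_prev, r2_prev, r3_prev = r1, r2, r3
    return v_l, v_h


# Assumed 9/7 lifting constants (not given in the paper).
ALPHA, BETA, GAMMA, DELTA, K97 = (
    -1.586134342, -0.052980118, 0.882911075, 0.443506852, 1.230174105)

low, high = flipping_dwt_1d([10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0],
                            ALPHA, BETA, GAMMA, DELTA, K97)
print(len(low), len(high))
```

Each loop iteration uses six multiplications by constants and eight additions, and keeps four delayed quantities (x(2n-2) and the three previous r values), which matches the operation count given above.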

# 3. PEB BASED BOOTH MULTIPLIER

In signal processing applications such as digital filters and wavelet filters, multiplication is mainly of the type where signal data is multiplied by coefficient data. The signal data is a variable, so every multiplication involves a different value, whereas the coefficient data is a constant, so every multiplication involves the same value. The Booth multiplier is widely used in ASIC-oriented products because of its higher computing speed and smaller area [12]. Booth encoding reduces the number of partial products needed during the computation. Moreover, in a fixed-width Booth multiplier, further area saving is achieved by truncating the n least significant columns of the partial products and preserving the n most significant columns.
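How Booth encoding reduces the number of partial products can be seen in a small software model. The sketch below assumes the standard radix-4 (modified Booth) recoding of an n-bit two's-complement multiplier, which produces n/2 digits from the set {-2, -1, 0, 1, 2}. The function names are illustrative, and the code models the arithmetic only, not the circuit of [12].

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement integer y into n/2 radix-4 Booth digits."""
    if n % 2 != 0:
        raise ValueError("n must be even for radix-4 recoding")
    if not -(1 << (n - 1)) <= y < (1 << (n - 1)):
        raise ValueError("y does not fit in n bits")
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1        # Python's >> is arithmetic, so these are
        b1 = (y >> (i + 1)) & 1  # the two's-complement bits of y
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x, y, n):
    """Sum one partial product per Booth digit: n/2 rows instead of n."""
    return sum((d * x) << (2 * i)
               for i, d in enumerate(booth_radix4_digits(y, n)))


print(booth_radix4_digits(-3, 4))              # [1, -1], since 1 - 1*4 = -3
print(len(booth_radix4_digits(1234, 12)))      # 6 partial products for 12 bits
print(booth_multiply(57, -1234, 12) == 57 * -1234)
```

Each digit selects 0, ±x, or ±2x, all of which are obtained by shifting and negation, so a 12-bit multiplier needs six partial-product rows rather than twelve.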

The standard product (SP) of the multiplier is expressed as

$$SP = MP + TP \tag{2}$$

where MP comprises the n most significant columns (MSCs) and TP comprises the n least significant columns (LSCs) of the 2n-column partial product array (PPA).

The quantized product (QP) of fixed-width multiplier is given by

$$QP = MP + 2^n \cdot \sigma \tag{3}$$

where $\sigma$ represents the estimated bias given by

$$\sigma = \text{Round}(2^{-1} \times TP_{major} + TP_{minor}) \tag{4}$$

$TP_{major}$ and $TP_{minor}$ represent, respectively, the n-th column and the remaining (n-1) LSCs of TP.

The modified PEB formula [12] is given as:

$$E[TP_{minor}] = \frac{3n}{32} = A' + \text{Round}\left(\frac{B'}{2}\right) \tag{8}$$

$$\sigma = A' + \text{Round}\left[(TP_{major} + B') \times 2^{-1}\right] \tag{5}$$

where A' and B' are, respectively, the integer and fractional parts of [(3n/32) + 0.5], with B' rounded to 0 or 1 (B' = 1 when the fractional part is greater than or equal to 0.5, otherwise B' = 0). The bias circuit based on the modified PEB formula generates one less carry bit than the existing PEB bias circuit and sends the same compensation value to the MP of the PPA. The modified PEB formula therefore helps to reduce the logic complexity of the bias circuit.
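The constants A' and B' and the resulting bias can be worked through numerically. The sketch below is only an illustration under stated assumptions: the word length is taken as n (as in equation (8)), Round is taken to mean round-half-up, and $TP_{major}$ is taken to be a non-negative integer column sum. None of these conventions is spelled out in the text, and the function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def round_half_up(value):
    # Assumed meaning of "Round": halves are rounded upward.
    return floor(value + Fraction(1, 2))


def peb_constants(n):
    """Integer part A' and thresholded fractional part B' of 3n/32 + 0.5."""
    t = Fraction(3 * n, 32) + Fraction(1, 2)
    a = floor(t)
    b = 1 if (t - a) >= Fraction(1, 2) else 0
    return a, b


def peb_bias(tp_major, n):
    """sigma = A' + Round((TP_major + B') / 2), following Eq. (5)."""
    a, b = peb_constants(n)
    return a + round_half_up(Fraction(tp_major + b, 2))


# For the 12-bit word length used in this paper:
# 3*12/32 + 0.5 = 1.625, so A' = 1 and B' = 1.
print(peb_constants(12))
print([peb_bias(tp, 12) for tp in range(4)])
```

Because A' and B' depend only on n, they are constants of the design, and only the $TP_{major}$ term has to be evaluated at run time.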

# 4. PROPOSED ARITHMETIC COMPUTATION UNIT FOR DWT COMPUTATION USING PEB BOOTH MULTIPLIER

The flipping-scheme-based DWT computation has a shorter critical path than the lifting-scheme-based DWT computation, so we have adopted the flipping scheme. The flipping-scheme-based DWT structure involves several multiplications and additions. An area-delay efficient multiplier design can therefore improve the performance of the DWT computation.

The proposed arithmetic unit (AU) is used to construct the flipping cell of the 1-D and 2-D DWT structures. The AU of Fig. 1 (a) and (b) is further represented by the block diagram of Fig. 2, which comprises one efficient multiplier unit and two ripple carry adders (RCAs). The efficient 12-bit multiplier unit is based on the radix-4 modified Booth encoding (MBE) scheme and the probabilistic biasing technique.

# 5. SYNTHESIS RESULTS

To validate the proposed design, we have coded it in a hardware description language (HDL) for fixed-point implementation and synthesized the 2-D flipping DWT structure for the two-input, two-output configuration. The synthesis results obtained from the Synopsys Design Compiler tool are listed in Table 1, which compares the flipping DWT structure built with the proposed AU using the efficient PEB multiplier against the flipping DWT structure built with an AU using an existing multiplier design. We have estimated the area-delay product (ADP) of both AU-based two-input, two-output flipping DWT structures at a supply voltage of 0.7 V. We have coded the proposed structures and the structure of [9] in VHDL for block sizes 2 and 16. We have assumed 1-level decomposition of an input image of size $(512 \times 512)$ and synthesized all the designs without the frame buffer, since the frame buffer is usually external to the chip because of its large size compared with the core.

 
Table 1. Synthesis results of existing and proposed flipping-based DWT structures

| Design           | Block size (N) | MCP (ns) | ADP (µm<sup>2</sup> s) | Power (mW) | EPI (µJ) |
|------------------|----------------|----------|------------------------|------------|----------|
| Structure of [9] | 2              | 18.02    | 1653.21                | 16.12      | 38.07    |
| Proposed         | 16             | 18.5     | 363.17                 | 80.5       | 24.39    |

In this synthesis, we have considered 8-bit pixel values and a 12-bit word length for all intermediate and output signals. All the designs are synthesized in Synopsys Design Compiler using the SAED 90 nm CMOS library [10]. We have estimated the area-delay product (ADP = Area $\times$ MCP $\times$ Image-size / block-size) and the energy per image (EPI = Power $\times$ MCP $\times$ Image-size / block-size) of all the designs listed in Table 1. Compared with the structure of [9] at block size 2, the proposed flipping-based structure at block size 16 involves 78% less ADP and 36% less EPI.
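The EPI formula and the quoted savings can be checked against Table 1 with a few lines of arithmetic. The sketch below assumes that image-size means the pixel count of the $512 \times 512$ input, that power is in mW, and that MCP is in ns; the helper names are illustrative.

```python
IMAGE_PIXELS = 512 * 512  # assumed meaning of "Image-size"


def epi_uj(power_mw, mcp_ns, block_size, image_pixels=IMAGE_PIXELS):
    """EPI = Power x MCP x Image-size / block-size, returned in microjoules."""
    # mW * ns = 1e-12 J = 1e-6 uJ
    return power_mw * mcp_ns * image_pixels / block_size * 1e-6


def reduction_percent(baseline, proposed):
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Values taken from Table 1.
epi_ref = epi_uj(power_mw=16.12, mcp_ns=18.02, block_size=2)
epi_new = epi_uj(power_mw=80.5, mcp_ns=18.5, block_size=16)

adp_saving = reduction_percent(1653.21, 363.17)
epi_saving = reduction_percent(38.07, 24.39)
adp_ratio = 1653.21 / 363.17

print(f"EPI of [9]: {epi_ref:.2f} uJ, proposed: {epi_new:.2f} uJ")
print(f"ADP saving: {adp_saving:.1f} %, EPI saving: {epi_saving:.1f} %")
print(f"ADP ratio: {adp_ratio:.2f}")
```

Under these assumptions the recomputed EPI values agree with Table 1 to within rounding, and the ADP ratio of about 4.55 corresponds to the 78% saving quoted above.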



Fig 2 The Arithmetic unit of flipping DWT structure

The AU structure computes the low-pass and high-pass coefficients of equations (1a)–(1f).

## 6. ACKNOWLEDGMENTS

Our sincere thanks go to the VLSI research lab at JUET, Guna, India, which provided the resources for this research work.

## 7. REFERENCES

- [1] Taubman D. and A. Zakhor, "Multirate 3-d subband coding of video," IEEE Trans. Image Processing, vol. 3, pp. 572–588, Sept. (1994).
- [2] Kronland-Martinet R., Morlet J., Grossmann A. "Analysis of sound patterns through wavelet transforms" Int. Journal of Pattern Recognition and Artificial Intelligence, vol. 1, pp. 273-302, (1987).

- [3] Stoksik M. A., R.G. Lane and D.T. Nguyen "Accurate synthesis of fractional Brownian motion using wavelets" Electronics Letters, IET Volume:30, Issue: 5 (1994).
- [4] Senhadji, L., Carrault, G. and Bellanguer, J. J., "Interictal EEG spike detection: A new framework based on the wavelet transforms," in Proc. IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, pp. 548–551 (1994).
- [5] Dai, Q., Chen, X. and Lin, C., "A novel VLSI architecture for multidimensional discrete wavelet transform," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 8, pp. 1105–1110, (2004).
- [6] Vishwanath M., "The recursive pyramid algorithm for the discrete wavelet transform," IEEE Trans. Signal Processing, vol. 42, no. 3, pp.673-677, Mar. (1994).
- [7] Wu P. C. and L.-G. Chen, "An efficient architecture for 2-D discrete wavelet transform," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 4, pp. 536–545, (Apr. 2001).

- [8] Cheng C-C., C.-T. Huang, C.-Y. Cheng, C.-Jr. Lian, and L.-G. Chen, "On-chip memory optimization scheme for VLSI implementation of line-based 2-D discrete wavelet transform," IEEE Trans. Circuit Syst.Video Technol., vol. 17, no. 7, pp. 814–822, Jul. (2007).
- [9] Zhang, W., Jiang, Z., Gao, Z. and Liu, Y., "An efficient VLSI architectures for lifting-based discrete wavelet transform," IEEE Transactions on Circuits and Systems– II, Express Briefs, vol. 59, no. 3, pp. 158-162, (2012).
- [10] SAED (Synopsys Armenia Educational Department) Library 90nm, www.synopsys.com.
- [11] Huang C-T., P.-C. Tseng, and L.-G. Chen, "Analysis and VLSI Architecture for 1-D and 2-D Discrete Wavelet Transform," IEEE Transactions on Signal Processing, vol. 53, no. 4, pp. 1575-1586, (April 2005).
- [12] Mohanty B. K. and Tiwari V. "Modified PEB formulation for hardware efficient fixed-width Booth multiplier" Springer, CSSP vol 33 issue 12,(Dec. 2014)