# Review of Efficient Discrete Wavelet Filter based CSD Technique

Naveen Raikwar Sagar Institute of Research and Technology, Bhopal

## ABSTRACT

This paper presents a two-dimensional discrete wavelet transform hardware design based on a canonic signed digit (CSD) architecture. We propose a CSD arithmetic based design for a low-complexity, efficient implementation of the discrete wavelet packet transform. The CSD technique is applied to reduce the number of full adders required by 2's complement based designs. The architecture is suitable for high-speed online applications. With this architecture, the speed of the wavelet packet transform is expected to increase by a factor of two, while the occupied circuit area is less than doubled. The hardware utilization efficiency of the circuit is expected to be close to 100%.

## **Keywords**

Discrete wavelet packet transform (DWPT), One, Two, Three Level, Canonic signed digit (CSD) scheme.

## **1. INTRODUCTION**

Wavelets, based on time-scale representations, provide an alternative to time-frequency representations in signal processing. The translation (shifting) and dilation (scaling) operations are unique to wavelets: a wavelet basis is generated by dilation and translation of a single function [1], [2]. Wavelet analysis is good at localizing a signal in both the time and frequency planes [4]. The DWPT has also been widely used in many applications because of its flexible time-frequency (TF) decomposition, especially in image and video coding, audio coding and speech enhancement, speech recognition, hearing aids and digital communication [2], [3], [4].

In this paper, shift-and-add operations are used for the bit-level multiplication of two numbers. The complexity of a shift-add type signed multiplier depends on the number of ones in the 2's complement representation of the coefficient: each one adds the multiplicand to the shifted partial sum, whereas each zero only shifts the partial sum. Shifting is assumed to require no hardware because it can be done by hardwiring. The number of ones in the 2's complement number therefore determines the number of full adders (FA) required to implement the multiplier. The canonic signed digit (CSD) representation is a popular way of representing a number with the fewest non-zero digits. No two consecutive digits of a CSD number are non-zero. The CSD representation of a number contains the minimum possible number of non-zero digits, hence the name canonic, and it is unique. CSD numbers cover the range (-4/3, 4/3), of which values in the range (-1, 1) are of greatest interest. On average, a CSD number has about 33% fewer non-zero digits than the corresponding 2's complement number.
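The CSD properties described above can be checked with a short sketch. The function names `to_csd` and `nonzero_count` and the example value 123 are illustrative assumptions, not part of the original design; the conversion shown is one common way to obtain CSD digits and is not necessarily the procedure used by the authors.

```python
def to_csd(value: int) -> list[int]:
    """Return the CSD digits of an integer, least significant digit first.

    Every digit is -1, 0 or +1, and no two neighbouring digits are non-zero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 -> digit +1, value mod 4 == 3 -> digit -1,
            # so the remainder is a multiple of 4 and the next digit is 0.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def nonzero_count(digits: list[int]) -> int:
    """Number of non-zero digits, i.e. additions/subtractions needed."""
    return sum(1 for d in digits if d != 0)


coefficient = 123  # hypothetical example value (123/256 is roughly 0.48)
csd = to_csd(coefficient)
print(csd)  # CSD digits, least significant first
print(bin(coefficient).count("1"), nonzero_count(csd))  # ones vs non-zero CSD digits
```

For the value 123 the binary form has six ones, while the CSD form has three non-zero digits, which illustrates why a shift-add multiplier built from CSD digits needs fewer adders.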

Navneet Kaur Sagar Institute of Research and Technology, Bhopal

Comparing the 2-dimensional DWT with the 1-dimensional DWT, the difference is that in the 1-dimensional DWT the range of operation is halved with each change in decomposition level, while in the 2-dimensional DWT the range of operation is always the whole frame. Because the operation range is halved as the decomposition level increases, the above structure can perform the 1-dimensional DWT easily.

In this paper, we introduce a new and improved architecture for the discrete wavelet transform using multiplier based (MB) and canonic signed digit based (CSDB) designs. The algorithm used for the tree structure of the wavelet packet transform is analyzed in Section 2. Previous work is reviewed in Section 3. The low-complexity design for the DWPT is given in Section 4, and the CSD based design in Section 5. The proposed CSDB architecture for the DWPT is described in Section 6, and conclusions are given in Section 7.

## 2. DIRECT WAVELET PACKET TRANSFORM

The model used in [5] for implementing the tree structure of the 2-dimensional Direct Wavelet Packet Transform (DWPT) is based on a filtering process. Figure 1 depicts a complete 3-level direct WPT. In this figure, G and H are the high-pass and low-pass filters, respectively.

The computation period is the number of input cycles needed to produce one set of output samples. In general, the computation period is $M = 2^j$ for a j-level DWPT, so the period of the 3-level computation is 8.

As an example of the subband coding algorithm, suppose that the original signal X[n] has N sample points, spanning a frequency band of zero to $\pi$ rad/s. In the first decomposition level, the signal is passed through high-pass and low-pass filters, followed by subsampling by 2. The output of the high-pass filter has N/2 sample points (half the time resolution), but it spans only the frequencies $\pi/2$ to $\pi$ rad/s (double the frequency resolution).

The output of the low-pass filter also has N/2 sample points, but it spans the other half of the frequency band, from 0 to $\pi/2$ rad/s. For further decomposition, the low-pass and high-pass filter outputs are passed through the same low-pass and high-pass filters again. The output of the second low-pass filter followed by subsampling has N/4 samples spanning a frequency band of 0 to $\pi/4$ rad/s, and the output of the second high-pass filter followed by subsampling has N/4 samples spanning a frequency band of $\pi/4$ to $\pi/2$ rad/s. The second high-pass filtered signal constitutes the second level of DWPT coefficients. This signal has half the time resolution, but twice the frequency resolution, of the first-level signal. This process continues until two samples are left. For this specific example there would be 3 levels of decomposition, each having half the number of samples of the previous level.

The DWPT of the original signal is then obtained by concatenating all of the coefficients, starting from the last level of decomposition (two samples remaining, in this case). The transform then has the same number of coefficients as the original signal.
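The bookkeeping described in this section (samples per subband, bandwidth per subband, and the computation period) can be tabulated with a small sketch. The function names and the signal length of 16 are illustrative assumptions; the period formula $M = 2^j$ is taken to be the intended one because it gives 8 for three levels. Bands are listed in order of frequency, not in the order the tree produces them.

```python
from fractions import Fraction


def computation_period(levels: int) -> int:
    """Number of input cycles per output set for a j-level DWPT: M = 2**j."""
    return 2 ** levels


def band_table(n_samples: int, levels: int):
    """For each level, list (samples, low_edge, high_edge) per subband.

    Band edges are in units of pi rad/s; bands are ordered by frequency.
    """
    table = []
    for level in range(1, levels + 1):
        count = 2 ** level           # number of subbands at this level
        width = Fraction(1, count)   # bandwidth of each subband
        table.append(
            [(n_samples // count, i * width, (i + 1) * width) for i in range(count)]
        )
    return table


print("period for 3 levels:", computation_period(3))
for level, bands in enumerate(band_table(16, 3), start=1):
    print(level, bands[0][0], [(str(lo), str(hi)) for _, lo, hi in bands])
```

Each level halves the number of samples per subband and halves the width of each band, so the total number of coefficients stays equal to the input length.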



#### Figure 1: 3-level DWPT, where G and H are the high-pass and low-pass filters, respectively.

## 3. LITERATURE REVIEW

**Wu et al.** [10] proposed a high-speed and memory-efficient 2-dimensional DWT using a scalable polyphase structure with distributed arithmetic (DA) for the JPEG2000 standard, implemented in 2011 on a Xilinx Spartan3A DSP. The design uses a polyphase filter bank architecture with DA to speed up wavelet computation. The Xilinx Spartan3A DSP implementation of the 2D DWT (discrete wavelet transform) shows that the distributed arithmetic formulation significantly reduces the consumption of logic resources and gives a considerable performance gain.

**Trenas [11]** proposed a design with reduced area and power compared with the regular square-root carry select adder (SQRT CSLA), with only a slight increase in the delay. The work evaluates the performance of the proposed designs in terms of delay, area, power, and their products, by hand with logical effort and through custom design and layout in a 0.18-µm CMOS process technology. The analysis shows that the proposed CSLA structure is better than the regular SQRT CSLA.

**Trenas et al. [12]** proposed a high-speed and memory-efficient 2-D DWT on a Xilinx Spartan3A DSP using a scalable polyphase structure with DA for the JPEG2000 standard in 2011. They describe an efficient Xilinx Spartan3A DSP implementation of the 2D DWT (discrete wavelet transform) using a polyphase filter bank architecture with distributed arithmetic (DA) to speed up wavelet computation. Results show that the distributed arithmetic formulation gives a considerable performance gain while significantly reducing the consumption of logic resources. The architecture supports any image size and any level of decomposition. With minor changes, the core can be implemented on any FPGA device.

**Mohsen et al. [13]** proposed a memory-efficient, high-speed, convolution-based generic structure for the multilevel 2-dimensional DWT in 2011. Using a memory-centric design strategy, they derived a convolution-based generic architecture for the computation of the 3-level 2-D DWT based on Daubechies as well as biorthogonal wavelet filters. The proposed structure does not involve a frame buffer, and it involves line buffers of size 3(K - 2)M/4, which is independent of the throughput rate. Compared with the best of the existing lifting-based folded structures [12], the proposed structure for the 9/7 filter and an image size of 512×512 has much lower area complexity and 2.62 times less computation time. Compared with the recently proposed parallel structure [11], the proposed architecture involves 2.6 times less area-delay product (ADP) and consumes 1.48 times less EPI. The proposed architecture can therefore be used for area-delay-efficient and energy-efficient implementation of the multilevel 2-dimensional discrete wavelet transform, using Daubechies as well as biorthogonal filters, in high-performance image processing applications.

Table 1: Comparison of previous convolution-based architectures for an N×N image with the proposed Discrete Wavelet Packet Transform architectures.

| Author          | Multiplier | Adder/Sub | Memory size | DAT             |
|-----------------|------------|-----------|-------------|-----------------|
| Wu [10]         | 8          | 6         | 2N          | $T_m + 3T_a$    |
| Trenas [11]     | 8          | 6         | N           | $T_m + 3T_a$    |
| Trenas [12]     | 8          | 6         | 4(N-1)      | $T_m + 3T_a$    |
| Mohsen [13]     | 6          | 6         | -           | $4(T_m + 3T_a)$ |
| Proposed (MB)   | 24         | 18        | -           | $T_m + 3T_a$    |
| Proposed (CSDB) | -          | 51        | -           | $T_m + 3T_a$    |

## 4. LOW-COMPLEXITY DESIGNS FOR DWPT

DWPT computation is essentially a two-channel FIR filter computation. Low-pass and high-pass downsampled filter computations are performed on the input to calculate the DWPT coefficients. The low-pass downsampled filter computes the average between two samples, and the high-pass filter computes the difference between two samples. The DWPT algorithm for 1-level decomposition is given as

$$Y_h[k] = \sum_{n} h[n]\, x[2k-n]$$

$$Y_g[k] = \sum_{n} g[n]\, x[2k-n]$$

where x[n] is the input, $Y_h[k]$ and $Y_g[k]$ are, respectively, the low-pass and high-pass DWPT coefficients, and h[n] and g[n] are, respectively, the low-pass and high-pass filter coefficients. We assume Daubechies four-tap (Daub-4) low-pass filter coefficients for the proposed design. However, similar designs can be derived for other wavelet filters as well. The Daub-4 low-pass filter coefficients are taken from [7]. The corresponding high-pass filter coefficients are calculated using the following relation:

$$g(n) = (-1)^{n}h(N-n)$$
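The relation between the two filters and the 1-level decomposition can be illustrated with a minimal floating-point sketch. The function names and the zero-padding at the signal boundary are assumptions made for illustration; the coefficient values are the Daub-4 low-pass values listed in this paper, and the sign exponent is taken as $n$ with $N$ equal to the filter order 3, which reproduces the listed high-pass values.

```python
# Daub-4 low-pass coefficients h(0)..h(3) as listed in this paper.
H = [0.4829629131, 0.8365163037, 0.2241438680, -0.129409522]


def high_pass_from_low_pass(h):
    """g(n) = (-1)**n * h(N - n), where N = len(h) - 1 is the filter order."""
    order = len(h) - 1
    return [(-1) ** n * h[order - n] for n in range(len(h))]


def analyze(x, taps):
    """y[k] = sum_n taps[n] * x[2k - n]; samples outside x are taken as zero."""
    out = []
    for k in range(len(x) // 2):
        acc = 0.0
        for n, c in enumerate(taps):
            idx = 2 * k - n
            if 0 <= idx < len(x):
                acc += c * x[idx]
        out.append(acc)
    return out


G = high_pass_from_low_pass(H)
print(G)

x = [1.0] * 8  # hypothetical constant test signal
print(analyze(x, H))  # low-pass branch
print(analyze(x, G))  # high-pass branch
```

For a constant input, the high-pass outputs away from the boundary are close to zero because the high-pass coefficients sum to approximately zero, while the low-pass outputs are scaled by the sum of the low-pass coefficients.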

Table 2: Low-pass and high-pass Daub-4 filter coefficients. A binary digit 1 shown in bold face represents the CSD digit -1.

| Coefficient | Value        | 8-bit 2's complement | CSD                |
|-------------|--------------|----------------------|--------------------|
| h(0)        | 0.4829629131 | 0.01111011           | 0.10000101         |
| h(1)        | 0.8365163037 | 0.11010110           | 0.00101010         |
| h(2)        | 0.2241438680 | 0.00111001           | 0.0100<b>1</b>001  |
| h(3)        | -0.129409522 | 1.11011111           | 1.00100001         |
| g(0)        | -0.129409522 | 1.11011111           | 1.00100001         |
| g(1)        | -0.224143868 | 1.11000111           | 1.01001001         |
| g(2)        | 0.836516303  | 0.11010110           | 0.00101010         |
| g(3)        | -0.482962913 | 1.10000101           | 1.10000101         |

Here h(n) and g(n) are, respectively, the low-pass and high-pass filter coefficients, and N is the filter order. The 8-bit 2's complement and CSD representations of the low-pass and high-pass filter coefficients are given in Table 2. The 1-level decomposition equations can be rewritten for the four-tap FIR filter as:

$$Y_h[k] = [h(0) + h(1)Z^{-1} + h(2)Z^{-2} + h(3)Z^{-3}]\,X(n)$$

$$Y_g[k] = [g(0) + g(1)Z^{-1} + g(2)Z^{-2} + g(3)Z^{-3}]\,X(n)$$

where the $Z^{-1}$ operator represents a one-sample delay in the Z-domain.
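The four-tap structure and the 2's complement column of the Daub-4 coefficient table can be tied together with a small fixed-point sketch. It assumes 8 fractional bits plus a sign bit and truncation toward zero, which happens to reproduce the listed bit patterns for h(0) to h(3); the actual quantization rule used by the authors is not stated. The names `quantize`, `bit_string` and `fir_stream` are hypothetical.

```python
from collections import deque

# Daub-4 low-pass coefficients as listed in the coefficient table.
H = [0.4829629131, 0.8365163037, 0.2241438680, -0.129409522]
FRAC_BITS = 8  # assumed number of fractional bits


def quantize(coefficient: float) -> int:
    """Scale by 2**FRAC_BITS and truncate toward zero (assumed rounding rule)."""
    return int(coefficient * (1 << FRAC_BITS))


def bit_string(q: int) -> str:
    """Format a quantized value as sign bit, point, then FRAC_BITS fraction bits."""
    bits = format(q & ((1 << (FRAC_BITS + 1)) - 1), f"0{FRAC_BITS + 1}b")
    return bits[0] + "." + bits[1:]


def fir_stream(samples, taps):
    """Direct-form FIR: each appendleft models one Z^-1 shift of the delay line."""
    delay = deque([0] * len(taps), maxlen=len(taps))
    for sample in samples:
        delay.appendleft(sample)  # the oldest sample drops off the right end
        yield sum(c * d for c, d in zip(taps, delay))


HQ = [quantize(c) for c in H]
print(HQ, [bit_string(q) for q in HQ])
print(list(fir_stream([1, 0, 0, 0], HQ)))  # impulse response equals the taps
```

Feeding a unit impulse through the delay line returns the quantized taps in order, which is a quick way to confirm that the delay line matches the transfer function above.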

## 5. CSD BASED DESIGN FOR DWPT

Each multiplier unit is replaced with shifters and adders/subtractors for the CSD implementation of the DWPT. The constant multiplication factors of [5] are replaced with shift and add/subtract operations, and the filters are rewritten as

Low-pass filter:

$$\begin{split} Y_h[k] &= [(x(n) >> 1) - (x(n) >> 6) - (x(n) >> 8)] \\ &+ [x(n-1) - (x(n-1) >> 3) - (x(n-1) >> 5) - (x(n-1) >> 7)] \\ &+ [(x(n-2) >> 2) - (x(n-2) >> 5) + (x(n-2) >> 8)] \\ &+ [-(x(n-3) >> 3) - (x(n-3) >> 7)] \quad (6) \end{split}$$

High-pass filter:

$$\begin{split} Y_g[k] &= [-(x(n) >> 3) - (x(n) >> 7)] \\ &+ [-(x(n-1) >> 2) + (x(n-1) >> 5) - (x(n-1) >> 7)] \\ &+ [x(n-2) - (x(n-2) >> 3) - (x(n-2) >> 5) - (x(n-2) >> 7)] \\ &+ [-(x(n-3) >> 1) + (x(n-3) >> 6)] \quad (7) \end{split}$$
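The replacement of a constant multiplier by shifts and adds can be illustrated for a single coefficient. The sketch below uses the 8-bit value of h(0), 123/256, whose CSD form is $2^{-1} - 2^{-6} - 2^{-8}$. To keep the arithmetic exact it shifts left on integers and leaves the common scaling by $2^{-8}$ to the end, which is an assumption made for illustration; hardware that shifts each term right before adding may differ in the least significant bits. The function names are hypothetical.

```python
def mul_h0_csd(x: int) -> int:
    """123 * x from CSD digits: 2**7 - 2**2 - 2**0 (two subtractions)."""
    return (x << 7) - (x << 2) - x


def mul_h0_binary(x: int) -> int:
    """123 * x from plain binary: 64 + 32 + 16 + 8 + 2 + 1 (five additions)."""
    return (x << 6) + (x << 5) + (x << 4) + (x << 3) + (x << 1) + x


def scale(product: int, frac_bits: int = 8) -> int:
    """Apply the common 2**-frac_bits scaling once, after all additions."""
    return product >> frac_bits


sample = 200  # hypothetical input sample
print(mul_h0_csd(sample), mul_h0_binary(sample), 123 * sample)
print(scale(mul_h0_csd(sample)))
```

Both forms compute the same product, but the CSD form needs two adder/subtractor stages where the binary form needs five, which is the saving that the CSD based design exploits.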

## 6. PROPOSED ARCHITECTURE

In the proposed architecture, the original signal X[n], which has N sample points, is passed through a 1×2 demultiplexer. When the select line is 0 we get the even samples, and when the select line is 1 we get the odd samples. These samples are then passed through the CSD based low-pass filter, and the same process is applied with the high-pass filter.



#### Figure 2: 3-level CSD design based DWPT. CG and CH denote the CSD design based high-pass and low-pass filters, respectively.

We now get N/2 samples at the first decomposition level, at the outputs of the CSD based high-pass filter ($Y_H$) and low-pass filter ($Y_L$). At the second decomposition level, the outputs of the CSD based low-pass and high-pass filters are passed through a register unit, and the output of the register unit is passed through a mux. When the select line is 0 we get the CSD based low-pass filter output, and when the select line is 1 we get the CSD based high-pass filter output. Passing the mux output through the CSD based low-pass filter gives the second-level low-pass outputs, and the same process with the CSD based high-pass filter gives the second-level high-pass outputs. At the third decomposition level, the time period is doubled and the frequency is halved. The outputs of the CSD based low-pass and high-pass filters are again passed through a register unit, whose output is passed through a mux. The select line values 00, 01, 10 and 11 select one of the four second-level outputs in turn. Finally, passing the mux output through the CSD based low-pass and high-pass filters gives Y<sub>LLL</sub>, Y<sub>LHL</sub>, Y<sub>HLL</sub>, Y<sub>HHL</sub> and Y<sub>LLH</sub>, Y<sub>LHH</sub>, Y<sub>HLH</sub>, Y<sub>HHH</sub>.
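The data flow described above can be modelled in software as a rough behavioural sketch. It is not a description of the hardware: the register units and muxes are replaced by a dictionary of subbands, floating-point Daub-4 coefficients stand in for the CSD based filters, and feeding each filter from the even and odd streams is an assumed reading of the demultiplexer stage. All function names are hypothetical. Each subband name records the filters applied, with the last letter being the last filter.

```python
H = [0.4829629131, 0.8365163037, 0.2241438680, -0.129409522]  # low-pass
G = [(-1) ** n * H[3 - n] for n in range(4)]                   # high-pass


def demux(x):
    """1-to-2 demultiplexer: select 0 -> even samples, select 1 -> odd samples."""
    return x[0::2], x[1::2]


def half_rate_filter(x, taps):
    """Four-tap filter with downsampling by 2, fed from the even/odd streams."""
    even, odd = demux(x)

    def at(seq, i):
        return seq[i] if 0 <= i < len(seq) else 0.0  # zero outside the signal

    return [
        taps[0] * at(even, k) + taps[1] * at(odd, k - 1)
        + taps[2] * at(even, k - 1) + taps[3] * at(odd, k - 2)
        for k in range(len(x) // 2)
    ]


def packet_tree(x, levels=3):
    """Apply both filters to every subband at each level."""
    bands = {"": list(x)}
    for _ in range(levels):
        bands = {
            name + tag: half_rate_filter(signal, taps)
            for name, signal in bands.items()
            for tag, taps in (("L", H), ("H", G))
        }
    return bands


subbands = packet_tree([float(i % 5) for i in range(16)])  # hypothetical input
for name, values in subbands.items():
    print("Y_" + name, len(values))
```

With 16 input samples, the three levels produce eight subbands of two samples each, matching the eight outputs named in the text.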

## 7. CONCLUSION

This paper has presented multiplier based (MB) and canonic signed digit based (CSDB) architectures for the discrete wavelet packet transform (DWPT). The canonic signed digit number system is used to represent the wavelet filter coefficients with the minimum number of non-zero digits. Consequently, the number of full adders in the design is expected to be reduced by nearly 50% compared with the 2's complement design. The canonic signed digit based technique will then be applied further to reduce power and area. In this architecture, the input sampling speed is increased by using low-pass and high-pass filters. Here, the low-pass filter is the average between two samples and the high-pass filter is the difference between two samples. No on-chip memory or memory access is needed for the computation, so a significant reduction in die area and power dissipation of the circuit can be achieved.

## 8. REFERENCES

- [1] Linning Ye and Zujun Hou, "Memory Efficient Multilevel Discrete Wavelet Transform Schemes for JPEG2000", IEEE Transactions on Circuits and Systems for Video Technology, Volume 25, Number 11, November 2015.
- [2] R. Praisline Jasmi and B. Perumal, "Comparison of Image Compression Techniques using Huffman Coding, DWT and Fractal Algorithm", 2015 International Conference on Computer Communication and Informatics (ICCCI-2015), January 08-10, 2015, Coimbatore, India.
- [3] S. Udhaya and P. Rangarajan, "An Efficient Multiplier Design for Discrete Wavelet Transform (DWT) in Image Fusion", MEJSR 2015.
- [4] Rashmita Sahoo, Sangita Roy, Sheli Sinha Chaudhuri, "Haar Wavelet Transform Image Compression using Run Length Encoding", International Conference on Communication and Signal Processing, April 3-5, 2014, India.
- [5] M. Sravanthi and T. Prasad, "Memory Efficient High Speed Lifting Based VLSI Architecture for Multi-Level 2D-DWT", IRF 2014.
- [6] S Manjui and Mr V Sornagopae, "An Efficient SQRT Architecture of Carry Select Adder Design by Common Boolean Logic", 978-1-4673-5301-4/13/\$31.00 ©2013 IEEE.
- [7] B. Ramkumar and Harish M Kittur, "Low-Power and Area-Efficient Carry Select Adder", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 20, No. 2, February 2012.
- [8] Srikanth S. and M. Jagadeeswari, "High Speed VLSI Architecture for Multilevel Lifting 2-Dimensional DWT Using MIMO", IJSCE 2012.
- [9] Gaurav Tewari, Santu Sardar, K. A. Babu, "High-Speed & Memory Efficient 2-D DWT on Xilinx Spartan3A DSP using scalable Polyphase Structure with DA for JPEG2000 Standard", IEEE 2011.
- [10] X. Wu, Y. Li, and H. Chen, "Programmable wavelet packet transform process", IEEE Electronics Letters, Volume 35, No. 6, pp. 449-450, 1999.
- [11] M. A. Trenas, J. Lopez, M. Sanchez, F. Arguello, and E. L. Zapata, "Architecture for wavelet packet transform with best tree searching", in Proc. IEEE Int. Conference on Application-Specific Systems, Architectures and Processors, 2000, pp. 289-298.
- [12] M. A. Trenas, J. Lopez and E. L. Zapata, "Architecture for wavelet packet transform", J. VLSI Signal Processing, Vol. 32, pp. 255-273, 2002.
- [13] M. A. Farahani and M. Eshghi, "Architecture of a Wavelet Packet transform by Using Parallel Filters", IEEE Transaction on Signal Process, 36, 961-1005, 2006.