VLSI Implementation of Image Denoising Algorithm using Dual Tree Complex Wavelet Transform

S. K. Umar Faruk  
PhD Scholar  
JNTU Anantapuramu

K. V. Ramanaiah, PhD  
Assoc Prof & Head,  
Dept of ECE,  
Y. S. R. College of Y.V.U  
Prodduturu, A.P. India

K. Soundararajan, PhD  
Prof & Dean,  
Dept of ECE,  
TKR Engineering College  
Hyderabad, T.S

ABSTRACT

Digital images are often contaminated by noise, which severely degrades their visual and information quality. Images can be corrupted at any stage of their acquisition and transmission. Image denoising is an essential process intended to eliminate noise from corrupted images. Wavelets have proved to be an excellent solution to denoising problems because of their remarkable capability for simultaneous time-frequency analysis. Wavelet denoising methods are based on shrinking the wavelet coefficients. Although the Discrete Wavelet Transform (DWT) is an efficient tool, it suffers from specific limitations that have restricted its use in many applications. Kingsbury introduced a redundant complex wavelet transform to avoid the limitations of the standard DWT, and extensive research in this domain has since produced various denoising algorithms. However, such denoising schemes have typically been realized only in software. This work focuses on the hardware realization of a real-time wavelet denoising procedure. The proposed denoising method consists of three modular circuits: a DTCWT, a thresholding, and an inverse DTCWT module. Two-stage 2D-DTCWT based image denoising has been performed using the soft thresholding method, and the hardware-software co-simulation design has been synthesized in Xilinx ISE 14.5 and implemented on a Virtex-5 FPGA kit, operating at a frequency of 207.711 MHz.

General Terms

Image Degradation, Image Noise, Image Denoising, Thresholding and Hardware Implementation

Keywords

DTCWT, Denoising, Soft-thresholding, PSNR, and FPGA

1. INTRODUCTION

Image denoising has remained an essential problem in the field of image processing, and numerous noise reduction techniques exist. Among them, wavelet-based denoising has been considered one of the most efficient [1]. For decades, wavelet-based techniques have been used extensively in digital signal processing [2][3]. In short, wavelet transforms represent a signal in terms of components that are localized in both time and frequency. The theory of wavelet transforms states that a signal can be decomposed into different scales with different time and frequency resolutions using a multiresolution analysis algorithm [3][4]. The Discrete Wavelet Transform (DWT), which applies high-pass and low-pass filtering simultaneously, has proven effective for multiresolution signal analysis in both the spectral and spatial domains and provides simultaneous time-frequency localization. Despite this, it suffers from three major drawbacks: poor directionality, shift variance and loss of phase information. The conventional DWT is shift invariant only when implemented in its undecimated form, which is computationally inefficient, particularly in multiple dimensions. Its directional selectivity is poor because its separable filters cannot discriminate between edges on opposing diagonals. To overcome these limitations, two distinct real DWTs designed to operate in quadrature are combined to form the Dual Tree Complex Wavelet Transform (DTCWT). Compared with the conventional DWT, the DTCWT offers approximate shift invariance, good directional selectivity in two dimensions and perfect reconstruction [5].
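To make the shift-variance drawback concrete, the following minimal Python sketch uses a one-level Haar DWT (chosen only for brevity; it is not the filter bank used in this work) and compares the detail-band energy of a step edge with that of the same edge delayed by one sample. With the decimated transform the detail energy drops from about 0.5 to zero even though the input has only been shifted.

```python
import math

def haar_detail(x):
    """One-level, decimated Haar DWT detail (high-pass) coefficients of an even-length signal."""
    return [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2) for k in range(len(x) // 2)]

def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)

step = [0, 0, 0, 1, 1, 1, 1, 1]
shifted = [0] + step[:-1]   # the same step edge delayed by one sample

print(energy(haar_detail(step)))     # about 0.5: the edge straddles a sample pair
print(energy(haar_detail(shifted)))  # 0.0: the edge now falls between pairs
```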

In general, data acquired by image sensors are degraded by noise. Imperfect instruments, problems with the data acquisition process and interfering natural phenomena can all degrade the data of interest. Moreover, noise can be introduced by compression and transmission errors. Denoising is therefore often an essential step before the image data are analyzed. In image processing, noise removal is commonly performed with filtering-based methods. However, filtering applied indiscriminately to an image can have severe side effects: even if the entire image is not blurred, some of its significant features (e.g. edges) are. Donoho and Johnstone [6] proposed a simple solution to this problem: apply the DWT and then threshold the resulting coefficients. The method exploits the energy compaction ability of the wavelet transform to separate the image from the added noise [7].

In general, most image denoising algorithms in the literature are implemented in software. However, software implementation has several disadvantages; for example, complex operations have to be realized as long sequences of simple operations that cannot be executed in parallel. It is therefore quite difficult to meet real-time requirements in software. Hence, it is desirable to implement image processing operations in hardware using VLSI techniques that support real-time requirements. Hardware realization has emerged as a feasible way to enhance the performance of image processing algorithms. This paper focuses on the hardware implementation of the Dual Tree Complex Wavelet Transform and an image denoising algorithm on FPGA using Xilinx System Generator. The standard image denoising model using DTCWT is shown in Fig (1): the noisy image is first decomposed into DTCWT subbands, and the noisy coefficients are suppressed by the denoising process. After noise suppression, the denoised DTCWT subbands are reconstructed into the spatial domain.

**Fig (1): The Standard Image Denoising Model Using DTCWT**

### 2. DUAL TREE COMPLEX WAVELET TRANSFORM

The DTCWT computes the complex transform of a signal using two separate discrete wavelet decompositions. If the filters used in one are specifically designed to differ from those in the other by a half-sample delay, then one DWT can produce the real coefficients and the other the imaginary coefficients. This redundancy of two provides extra information for analysis; it also provides approximate shift invariance while still allowing perfect reconstruction of the image. The dual-tree complex wavelet transform consists of two parallel wavelet filter bank trees containing carefully designed filters of different delays that minimize the aliasing effects due to downsampling [8]. The dual-tree CWT of a signal x(n) is implemented using two critically sampled DWTs in parallel on the same data, as shown in Fig. 2. The transform is twice expansive because, for an N-point signal, it gives 2N DWT coefficients. The analysis and synthesis filter banks [9] used in the proposed DTCWT framework are length-10 filters based on the Farras wavelet implementation. A different set of analysis and synthesis filter banks is used for the first stage and for higher stages.
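The data flow of one analysis stage, and the factor-of-two redundancy, can be sketched in Python as below. This is illustrative only: the length-10 Farras filters are not reproduced here, so tree a uses Haar filters as a placeholder and tree b uses a hypothetical one-sample-delayed copy of them. The function name `analysis_stage` and the circular (periodic) boundary handling are also assumptions.

```python
import math

def analysis_stage(x, h0, h1):
    """One DWT analysis stage: circular convolution with low-pass h0 and
    high-pass h1, each followed by downsampling by 2."""
    n = len(x)

    def filter_and_downsample(h):
        y = [sum(h[k] * x[(i - k) % n] for k in range(len(h))) for i in range(n)]
        return y[::2]  # keep every second sample

    return filter_and_downsample(h0), filter_and_downsample(h1)

s = 1 / math.sqrt(2)
# Tree a: Haar filters, a placeholder for the length-10 Farras filters.
h0a, h1a = [s, s], [s, -s]
# Tree b: the same filters delayed by one sample (hypothetical, for illustration).
h0b, h1b = [0.0, s, s], [0.0, s, -s]

x = [float(v) for v in range(8)]          # N = 8 samples
lo_a, hi_a = analysis_stage(x, h0a, h1a)  # real part
lo_b, hi_b = analysis_stage(x, h0b, h1b)  # imaginary part
complex_detail = [complex(a, b) for a, b in zip(hi_a, hi_b)]

total = len(lo_a) + len(hi_a) + len(lo_b) + len(hi_b)
print(total)  # 16, i.e. 2N coefficients for N = 8
```

Each tree alone is critically sampled (N coefficients for N samples); running both in parallel is what makes the transform twice expansive.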

**Fig (2): Dual Tree Complex Wavelet Transform trees; (a) Decomposition tree and, (b) Reconstruction tree**

#### 2.1 Wavelet Thresholding

Thresholding of the wavelet coefficients is performed by the soft thresholding function shown in fig (3). The first function in fig (3) is not appropriate for image denoising because of its linear nature. Thresholding techniques can broadly be classified into two categories: Global Threshold (GT) and Level Dependent Threshold (LDT).

The soft thresholding function is represented as

\[
\text{Soft}(w) = \begin{cases}
\operatorname{sgn}(w)\,\bigl(|w| - \lambda\bigr), & |w| > \lambda \\
0, & |w| \leq \lambda
\end{cases} \tag{1}
\]
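Equation (1) maps directly onto a magnitude comparison, a subtraction and a sign restore, which is why it suits a small hardware block. A minimal Python sketch (the function name `soft_threshold` is illustrative):

```python
def soft_threshold(w, lam):
    """Soft thresholding, Eq. (1): zero if |w| <= lam, otherwise shrink |w| by lam
    and keep the sign of w."""
    if abs(w) <= lam:
        return 0.0
    sign = 1 if w > 0 else -1
    return sign * (abs(w) - lam)

print([soft_threshold(v, 2.0) for v in [-5.0, -1.0, 0.5, 3.0]])  # [-3.0, 0.0, 0.0, 1.0]
```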

In the proposed method, we use the universal threshold, which depends on the size of the signal and on the noise level σ:

\[
\lambda = \sigma \sqrt{2 \log(k)} \tag{2}
\]

where k is the size of the signal, σ is the noise standard deviation and \(\lambda\) is the threshold value. Designing the threshold therefore requires an estimate of the noise standard deviation. Donoho and Johnstone recommended a robust technique for estimating \(\sigma\) in the wavelet domain based on the wavelet coefficients at the finest level. The estimate of the noise level \(\sigma\) based on the median absolute deviation [12] is given by

\[
\hat{\sigma}_{\text{mad}} = \frac{\operatorname{median}\left( |w(i,j)| \right)}{0.6745} \tag{3}
\]

Here \(w(i,j)\) represents the finest level detail coefficients.
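
Equations (2) and (3) can be combined into a short Python sketch. The function names are illustrative, the logarithm in (2) is assumed to be natural, and the worked example assumes σ = 20 (the noise level of Fig 10) and k = 256 × 256; the value of k actually used in the hardware is not stated here.

```python
import math
import statistics

def estimate_sigma(finest_detail):
    """Robust noise estimate, Eq. (3): median(|w|) / 0.6745 over finest-level detail coefficients."""
    return statistics.median(abs(w) for w in finest_detail) / 0.6745

def universal_threshold(sigma, k):
    """Universal threshold, Eq. (2): sigma * sqrt(2 ln k)."""
    return sigma * math.sqrt(2 * math.log(k))

# Worked example under the stated assumptions: sigma = 20, k = 256 * 256.
lam = universal_threshold(20.0, 256 * 256)
print(round(lam, 2))  # 94.19
```
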
3. FPGA IMPLEMENTATION OF IMAGE DENOISING

This section presents the hardware implementation of a two-stage DTCWT using a 10-tap filter bank, together with shrinkage of the noisy DTCWT coefficients, where a soft thresholding approach denoises the coefficients in the transform domain. The block diagram of the proposed design for a one-level DTCWT is shown in fig (4). A dedicated hardware realization greatly reduces the limitations of a software design, and with extensive support for reconfigurable computing, Field Programmable Gate Array (FPGA) technology has become a feasible target for implementing image processing algorithms in hardware. After the proposed algorithm was designed, it was modeled using the Xilinx blockset library. The input noisy image is applied to the Xilinx System Generator model as a vector in Xilinx fixed-point format. The System Generator model is then simulated in the MATLAB/Simulink environment with appropriate simulation parameters. Once acceptable denoising performance is obtained, the System Generator token is configured for the Virtex-5 FPGA board. Xilinx System Generator supports hardware co-simulation [10], making it possible to incorporate a design running in an FPGA directly into a Simulink simulation. The model is realized for JTAG hardware co-simulation once I/O clock scheduling is done. On compilation, the netlist and the Xilinx ISE-accessible programming files are generated in Verilog HDL. The developed image denoising model is verified for behavioural syntax, then synthesized and implemented on the FPGA. Xilinx System Generator can also configure the user constraint file, test vectors and test bench for testing the architecture. Bitstream compilation creates an FPGA bit file suitable for programming the Virtex-5 target device, on which the design is implemented.
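The conversion of pixel values to a fixed-point vector can be illustrated with a small Python sketch. The 18-bit signed word with 8 fractional bits, round-to-nearest and saturation are hypothetical choices for illustration; the fixed-point formats used in the System Generator model are not stated in this section.

```python
def to_fixed(x, word_bits=18, frac_bits=8):
    """Quantize x to a signed two's-complement integer with frac_bits fractional
    bits: round to nearest (Python's round, ties to even), then saturate."""
    q = round(x * (1 << frac_bits))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))

def from_fixed(q, frac_bits=8):
    """Interpret a fixed-point integer as a real value."""
    return q / (1 << frac_bits)

pixel = 137.3
q = to_fixed(pixel)
print(q, from_fixed(q))  # 35149 137.30078125
```

The quantization error (here about 0.0008) is bounded by half an LSB, so the fractional width controls how closely the hardware output can match the floating-point Simulink reference.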

3.1 Implementation Process

In this work, a hardware-software co-simulation algorithm has been designed for denoising images and implemented on FPGA. Registered noisy images are considered for this work. The 256×256 noisy image is applied to a 2D-to-1D block that converts the two-dimensional image data into a one-dimensional stream using Simulink blocksets, and this stream is given as input to the System Generator model for the FPGA implementation process. The proposed denoising model implements forward 2-level DTCWT hardware to decompose the noisy image into the transform domain and obtain the subband coefficients. Except for the approximation subband coefficients, all six detail subbands (three from the first level and three from the second level) of the real tree and the corresponding six subbands of the imaginary tree are denoised with the hardware shown in fig (6). The denoised subband coefficients are reconstructed back into the spatial domain with the corresponding reverse 2-level DTCWT hardware shown in fig (7).
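A minimal Python sketch of the 2D-to-1D conversion and its inverse follows, assuming row-major (raster) order. MATLAB stores arrays in column-major order, so the Simulink conversion may emit pixels column by column instead; the ordering and function names here are assumptions.

```python
def image_to_stream(img):
    """Row-major (raster) scan of a 2D image (list of rows) into a 1D sample stream."""
    return [p for row in img for p in row]

def stream_to_image(stream, rows, cols):
    """Inverse of image_to_stream: regroup the stream into rows of length cols."""
    return [stream[r * cols:(r + 1) * cols] for r in range(rows)]

img = [[1, 2, 3], [4, 5, 6]]
s = image_to_stream(img)
print(s)                         # [1, 2, 3, 4, 5, 6]
print(stream_to_image(s, 2, 3))  # [[1, 2, 3], [4, 5, 6]]
```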

Using the separable property, a 2D-DWT can be realized with two 1D-DWTs, one operating on the rows and the other on the columns, as shown in fig (5). The analysis and synthesis filter banks [11] used in the proposed DTCWT framework are length-10 filters based on the Farras wavelet implementation, with a separate set of analysis and synthesis filter banks for the first stage and for higher stages. For compactness, a condensed view of the 2-level DTCWT denoising hardware is presented in fig (6).
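The row-then-column decomposition can be sketched in Python for a single level. Haar filters again stand in for the length-10 Farras filters, and the subband labels (LL, LH, HL, HH, named as row filter then column filter) follow one common convention; both are assumptions for illustration.

```python
import math

S = 1 / math.sqrt(2)

def haar_1d(x):
    """One-level Haar DWT of an even-length list: returns (low-pass, high-pass)."""
    lo = [(x[2 * k] + x[2 * k + 1]) * S for k in range(len(x) // 2)]
    hi = [(x[2 * k] - x[2 * k + 1]) * S for k in range(len(x) // 2)]
    return lo, hi

def transpose(m):
    return [list(r) for r in zip(*m)]

def dwt2_separable(img):
    """Apply the 1D DWT to every row, then to every column of each result."""
    row_lo, row_hi = zip(*(haar_1d(r) for r in img))

    def filter_columns(block):
        lo, hi = zip(*(haar_1d(c) for c in transpose(list(block))))
        return transpose(list(lo)), transpose(list(hi))

    ll, lh = filter_columns(row_lo)
    hl, hh = filter_columns(row_hi)
    return ll, lh, hl, hh

ll, lh, hl, hh = dwt2_separable([[1.0] * 4 for _ in range(4)])
print(ll)  # approximately [[2.0, 2.0], [2.0, 2.0]]; the three detail bands are zero
```

For a flat image all energy lands in the approximation band, which is the energy-compaction property that thresholding relies on.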
The median process is implemented by sliding a window of odd size, 3x3, over the image. A 3x3 window is chosen because it is considered effective for the most commonly used image sizes. An optimized element of the 3x3 median estimator is built from three-input comparison blocks, each of which returns its inputs in sorted order, as shown in figure 6. The first block sorts the three new values received from the line buffers in each clock cycle. Its results are fed to further comparison blocks, which at each stage discard the values farthest from the middle position.
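The comparator structure described above matches a standard 3x3 median network: sort each incoming column of three pixels, take the largest of the column minima, the median of the column medians and the smallest of the column maxima, then sort those three values and keep the middle one. The Python sketch below is one plausible reading of that structure, not the authors' exact netlist; it assumes the window arrives as three columns, and the names are illustrative.

```python
def sort3(a, b, c):
    """Three-input compare-and-swap network: returns (min, mid, max)."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return a, b, c

def median3x3(window):
    """Median of a 3x3 window given as three columns of three pixels each."""
    cols = [sort3(*col) for col in window]     # stage 1: sort each column
    low = max(c[0] for c in cols)              # largest of the column minima
    mid = sort3(*(c[1] for c in cols))[1]      # median of the column medians
    high = min(c[2] for c in cols)             # smallest of the column maxima
    return sort3(low, mid, high)[1]            # final three-input sort

print(median3x3([[9, 1, 5], [3, 7, 2], [8, 6, 4]]))  # 5
```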

4. EXPERIMENTAL RESULTS

Fig 6. System Generator Model of Denoising Hardware

The threshold estimation stage computes the threshold corresponding to the estimated noise strength using (2). The subband coefficients are denoised in the soft thresholding stage (1) with the estimated threshold value. The denoised subband coefficients are reconstructed using the two-stage inverse DTCWT hardware shown in fig (7). The inverse dual-tree CWT is as simple as the forward transform. To invert the transform, the real part and the imaginary part are each inverted using the inverse of the corresponding real DWT, yielding two real signals. These two real signals are then averaged to obtain the final output.
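The averaging step of the inverse can be illustrated with a one-level Python sketch. As a stand-in for the Farras filters, both trees use Haar filters, and tree b analyzes a circularly delayed copy of the signal (a hypothetical simplification of the half-sample offset between trees). After each tree is inverted and tree b's delay is undone, averaging the two real signals recovers the input.

```python
import math

S = 1 / math.sqrt(2)

def haar_analysis(x):
    """One-level Haar DWT: (low-pass, high-pass) of an even-length list."""
    lo = [(x[2 * k] + x[2 * k + 1]) * S for k in range(len(x) // 2)]
    hi = [(x[2 * k] - x[2 * k + 1]) * S for k in range(len(x) // 2)]
    return lo, hi

def haar_synthesis(lo, hi):
    """Inverse of haar_analysis (perfect reconstruction)."""
    x = []
    for a, d in zip(lo, hi):
        x += [(a + d) * S, (a - d) * S]
    return x

def roll(x, n):
    """Circular shift right by n samples (negative n shifts left)."""
    n %= len(x)
    return x[-n:] + x[:-n]

def dual_tree_inverse(tree_a, tree_b):
    xa = haar_synthesis(*tree_a)                   # invert the real-part tree
    xb = roll(haar_synthesis(*tree_b), -1)         # invert imaginary-part tree, undo its delay
    return [(a + b) / 2 for a, b in zip(xa, xb)]   # average the two real signals

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
tree_a = haar_analysis(x)
tree_b = haar_analysis(roll(x, 1))
y = dual_tree_inverse(tree_a, tree_b)
print([round(v, 6) for v in y])  # matches x
```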

Fig 7. System Generator model of the two-level reverse DTCWT hardware

The experiments are carried out on the grayscale Lena test image with a resolution of 256 x 256. Gaussian noise of different levels is added to the original test image. The System Generator and FPGA outputs are shown in Figs. 11(a) and (b). The PSNR and MSE values (software and hardware) of the denoised image are tabulated in Table (1). The functional simulation of the denoising hardware is performed using the ISE simulator, and the results are illustrated in figure (8).
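For reference, MSE and PSNR are conventionally computed as below for 8-bit images (peak value 255). This Python sketch is a generic definition; the exact peak value and averaging used to produce Table (1) are not stated and are assumed here.

```python
import math

def mse(ref, test):
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    diffs = [(a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    e = mse(ref, test)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)

clean = [[10, 20], [30, 40]]
noisy = [[11, 19], [31, 39]]
print(mse(clean, noisy), round(psnr(clean, noisy), 2))  # 1.0 48.13
```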

Figure 9 shows the synthesized top-level RTL view of the proposed 2D DTCWT denoising hardware. Table 2 summarizes the synthesis report of the proposed implementation.

Fig 9. Top level RTL view of the proposed 2D DTCWT denoising hardware

Fig 10. (a) Clean Lena Image (b) Noisy image at σ = 20

Fig 11. (a) System generator simulation output and (b) FPGA output of the proposed method
5. CONCLUSION
In MATLAB Simulink R2012b, the soft thresholding operator is used to denoise the image with 10-tap wavelet filters at different decomposition levels. The results reveal that level 2 gives better performance for all the noise variance values considered. Hence, level 2 has been chosen for FPGA implementation. Two-level 2D-DTCWT based image denoising has been performed using the soft thresholding operator, and the hardware-software co-simulation design has been synthesized in Xilinx ISE 14.5 and implemented on a Virtex-5 kit for low-power, high-speed operation. The results show that the design operates at a maximum frequency of 207.771 MHz.

In conclusion, this investigation has successfully developed a DTCWT-based hardware-software co-simulation algorithm using length-10 filters for image denoising and implemented it on FPGA.

6. REFERENCES