# Innovative Low Power Transposition Memory using Double Edge Triggered Flip-flop

P. Vamshi Bhargava, PG Scholar, Department of Electronics and Communication Engineering, Sreenidhi Institute of Science & Technology, Hyderabad, India

G.V. Maha Lakshmi, Professor, Electronics and Communication Department, Sreenidhi Institute of Science & Technology, Hyderabad, India

### ABSTRACT

Transposition memory (TRAM) is one of the most important matrix processing blocks. This paper presents the design of a transposition memory implemented in 1 V, 45 nm CMOS technology in the Cadence® Virtuoso® Design Environment. A new double edge triggered flip-flop based on a clock-gated pulse suppression technique is developed; by suppressing redundant clock pulses, it reduces the power dissipated in the clocking system. This clock-gated pulse suppressed double edge triggered flip-flop (CGPSDFF) is used in the D flip-flop based architecture of a high speed TRAM. The CGPSDFF-based TRAM dissipates 20% less power than a conventional TRAM.

#### **Keywords**

Low power, TRAM, clock system, clock-gated pulse suppression technique, CGPSDFF

## **1. INTRODUCTION**

Image, speech, data and video compression require high speed matrix calculations. Transposition memory (TRAM) is the major building block of the matrix processing unit and is widely used in signal processing algorithms such as the DCT and DWT. Consider the 2D-DCT, where the output of the first-stage DCT has to be transposed before the second DCT is applied. This implies that the data must be rearranged in the memory itself, without using the processing elements, so the transposition memory architecture becomes extremely important. As a result, several architectures have been proposed for transposition memory.

There are two typical architectures for TRAM: the RAM based architecture [1] and the D flip-flop based architecture [2]. Despite its lower flexibility, the DFF-based architecture can meet the performance demands of very high speed or real-time applications, where it is widely used. The flip-flop and the multiplexer are the basic building blocks of the D flip-flop based architecture, so improving the performance of these components has a direct impact on the overall performance. Jinn-Shyan Wang et al. [3] proposed a TRAM architecture based on a pulse-clocked D flip-flop. Zhang Wenle et al. [4] proposed a static double edge-triggered flip-flop (SDETFF). The SDETFF consists of an XNOR-based pulse generator and a front-end sampling circuit; its pulse generator and latch structure are shown in Fig. 1. The inverted version of the clock signal CLK, CLKb, is generated by an inverter, which generally reduces the number of transistors required to generate CLKb. Whenever the clock pulse arrives at the inputs of transistors N4 and N3, the input data D and its inverted version DB are passed directly to T and TB respectively, so the leakage power consumption of the SDETFF is quite low.



Fig. 1: Static Double Edge-Triggered Flip-flop

P. Zhao et al. [8] proposed a conditional discharge flip-flop (CDFF) based on a conditional discharge technique. The CDFF reduces internal switching activity: the extra switching is eliminated by controlling the discharge path when the input is stable high. The double edge-triggered pulse generator [5] used in the CDFF reduces the power on the clock tree and in the clocked transistors of the pulse generator. The SDETFF has fewer transistors than the CDFF. However, redundant clock pulses exist in both the CDFF and the SDETFF, which increases the dynamic power consumption through unnecessary charging and discharging of the pulse loads at the internal nodes.

Most of the power in VLSI architectures is consumed by the clock, so reducing the power of the clocking system has a strong impact on the total power. In a D flip-flop, whenever the input data does not change, redundant clock pulses exist at the internal nodes. Clock gating [6] [7] is one of the most prominent low power techniques for eliminating redundant pulses in the clock signal. The conditional discharge technique eliminates the unnecessary transitions at the internal node, but the redundant pulses still exist. Clock gating applied to a single edge triggered flip-flop eliminates both edges of the clock signal if they are redundant. In a double-edge triggered flip-flop, both the rising and the falling edge of the clock signal are useful, so clock gating alone is not sufficient: one edge of the clock signal may be useful while the other is redundant.

In this work, the design of a transposition memory using a new double-edge triggered flip-flop based on a clock-gated pulse suppression technique is presented. Because pulse triggered flip-flops have negative setup time, a soft edge, a simple structure and better performance [9]-[15] than traditional master-slave flip-flops, a new low power double edge triggered pulse suppression technique is proposed. The proposed double-edge triggered flip-flop is then applied to a DFF-based TRAM architecture.

The paper is organized as follows. Section 2 gives an overview of the DFF-based transposition memory and its architecture. Section 3 describes the building blocks of the TRAM, and Section 4 presents the experimental results. Finally, Section 5 concludes this work.

## **2. TRANSPOSE MEMORY**

### 2.1 Overview

In many signal processing systems such as the DCT, transposing the matrix coefficients plays an important role in increasing the signal processing speed. The D flip-flop based TRAM architecture [2] performs the transposition on-the-fly and eliminates the need for double buffering, which would have been necessary had a static dual-addressed RAM been used.
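The role of the transposition step in a 2D-DCT can be illustrated with a small software sketch. The example below is only an illustration and is not part of the proposed hardware: it assumes an unnormalized DCT-II, a 4 x 4 block, and hypothetical function names, and it checks that two passes of a row-wise 1-D DCT separated by a transpose give the same result as the direct 2-D formula.

```python
import math


def dct_1d(x):
    """Unnormalized 1-D DCT-II of a sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
            for k in range(n)]


def transpose(m):
    """Software stand-in for what the transposition memory does in hardware."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    """2-D DCT as: row DCT, transpose, row DCT, transpose."""
    stage1 = [dct_1d(row) for row in block]        # first 1-D pass on the rows
    stage2 = [dct_1d(row) for row in transpose(stage1)]  # second pass on the columns
    return transpose(stage2)                       # restore the original orientation


def dct_2d_direct(block):
    """Direct evaluation of the separable 2-D DCT-II, used as a reference."""
    n = len(block)

    def c(i, k):
        return math.cos(math.pi * (2 * i + 1) * k / (2 * n))

    return [[sum(block[i][j] * c(i, u) * c(j, v)
                 for i in range(n) for j in range(n))
             for v in range(n)]
            for u in range(n)]


# Arbitrary 4 x 4 test block (illustrative data only).
block = [[(3 * i + 5 * j) % 7 for j in range(4)] for i in range(4)]
a = dct_2d_row_column(block)
b = dct_2d_direct(block)
max_error = max(abs(a[u][v] - b[u][v]) for u in range(4) for v in range(4))
print("row-column and direct results agree:", max_error < 1e-9)
```

The final transpose only restores the orientation of the result; in a hardware pipeline it may be unnecessary if the following stage accepts the data column by column.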



Fig. 2: Block diagram of DFF-based TRAM

The block diagram of the transposition memory is shown in Fig. 2. The cell array consists of D flip-flops and is configured to receive inputs from both the horizontal and the vertical direction, and to shift the data in both directions. The control unit drives the selection lines of the multiplexers, which control the direction of the inputs to the cell array.

### 2.2 Architecture

The architecture of the DFF-based TRAM consists of a 2-D array of shift registers, as shown in Fig. 3. This $8 \times 8$ matrix-type TRAM realizes a pipelined architecture with a latency of 8 clock cycles, so the transposition of the input data is complete after the eighth clock cycle. The TRAM shifts the input data from top to bottom and from left to right. The most important aspects of this design are the ability to shift the data in either direction and to switch the direction of data flow during circuit operation.
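The shifting behaviour described above can be sketched with a behavioural software model. This is a minimal sketch under stated assumptions, not the circuit of Fig. 3: the class and function names are hypothetical, each cell is modelled as a register with a 2:1 input select, and the input and output lanes are assumed to be wired in reversed order so that the vectors leave the array in natural order.

```python
class ShiftArrayTram:
    """Behavioural model of an n x n array of D flip-flops with 2:1 input muxes."""

    def __init__(self, n):
        self.n = n
        self.cells = [[0] * n for _ in range(n)]
        self.horizontal = True  # mux select: True = take data from the left neighbour

    def clock(self, vec_in):
        """One clock event: read the vector on the output edge, then shift."""
        n = self.n
        if self.horizontal:
            out = [row[n - 1] for row in self.cells]        # right column leaves
            self.cells = [[vec_in[r]] + self.cells[r][:n - 1] for r in range(n)]
        else:
            out = list(self.cells[n - 1])                   # bottom row leaves
            self.cells = [list(vec_in)] + self.cells[:n - 1]
        return out


def stream_transpose(matrices, n):
    """Stream matrices through the array, switching direction every n clocks."""
    tram = ShiftArrayTram(n)
    zero = [[0] * n for _ in range(n)]
    results = []
    # One extra all-zero block flushes the last matrix out of the array.
    for block in matrices + [zero]:
        drained = []
        for row in block:
            out = tram.clock(row[::-1])   # assumed lane reversal on the way in
            drained.append(out[::-1])     # assumed lane reversal on the way out
        results.append(drained)
        tram.horizontal = not tram.horizontal
    return results[1:]  # results[0] is only the initial all-zero content


A = [[r * 8 + c for c in range(8)] for r in range(8)]
B = [[(r + 1) * (c + 2) for c in range(8)] for r in range(8)]
out = stream_transpose([A, B], 8)
print(out[0] == [list(col) for col in zip(*A)])
print(out[1] == [list(col) for col in zip(*B)])
```

In this model the second matrix is loaded while the first one is drained, so each transposed matrix appears n clock events after it was written, without a second buffer.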



Fig. 3: The DFF-based 8 x 8 TRAM

## **3. BUILDING COMPONENTS**

The basic components of the DFF-based TRAM are the DFF and the multiplexer. Improving the performance of these two components has a direct impact on the overall performance of the TRAM.

### 3.1 MUX

The CMOS-style multiplexer shown in Fig. 4 is selected for the design of the new double edge triggered flip-flop based TRAM. All the designs are based on a 1 V, 45 nm CMOS technology. The traditional CMOS-style multiplexer has been shown to have higher performance and lower power than the transmission-gate style multiplexer [16]. In addition, its layout is straightforward and efficient, which makes it suitable for this high-speed, low-power TRAM design.



Fig. 4: CMOS-style two input multiplexer

### 3.2 Proposed Flip-flop

An explicit-pulsed semi-dynamic flip-flop (ep-DCO) does not offer any performance advantage over an implicit-pulsed semi-dynamic flip-flop, and it consumes a large amount of energy because of the explicit pulse generator [17]. However, the pulse generator power can be significantly reduced by sharing a single pulse generator among a group of flip-flops. The power consumption of pulse triggered flip-flops can also be reduced by suppressing the redundant pulses in the clock signal. In this work, a clock-gated pulse suppression technique is presented for that purpose. The technique can be applied to both single edge and double edge-triggered flip-flops. It eliminates the unnecessary charging and discharging caused by clock pulses at the internal nodes, which leads to a power reduction. The proposed double edge-triggered flip-flop consists of a pulse generator and a latch, as shown in Fig. 5(a) and Fig. 5(b) respectively.

The proposed pulse generator operates as follows. It mainly uses three transistors to eliminate the redundant pulses in the clock signal, and pulses are generated depending on the input D and the output Q of the flip-flop. When the input data does not change, the paths formed by transistors N5 and N7 or by N6 and N7 are turned OFF, and node Z stays high irrespective of the state of the clock signal, so the pulse generator output CP is zero. When a transition occurs, the path formed by N5 and N7 or by N6 and N7 turns ON and node Z discharges for a short period, so the required pulse is generated at the output port CP.

The proposed latch operates as follows. When the clock pulse CP arrives and the data D changes from 0 to 1, transistor N8 turns on and node X discharges, which pulls Q high. If the data changes from 1 to 0, transistor N8 turns off and node X remains unchanged (pre-charged), so node Q is discharged through the pass transistor N11 [18]. Taken together, the latch and the pulse generator produce a clock pulse only when the input D and the output Q differ, i.e. only when a data transition occurs at the input. Thus, the redundant pulses are suppressed and the unnecessary transitions of the pulse load are eliminated.
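The pulse suppression rule described above (a pulse CP only when D and Q differ) can be sketched as a purely behavioural model. The sketch below does not model the transistors N5 to N11 or the nodes X and Z; the function name, the initial state Q = 0 and the input pattern are assumptions chosen for illustration.

```python
def run_double_edge_ff(d_samples, suppress_redundant):
    """Behavioural model of a double edge-triggered flip-flop.

    d_samples[i] is the value of D seen at clock edge i (rising and falling
    edges alternate and both are active). Returns (q_trace, pulse_count).
    """
    q = 0                 # assumed initial state of the output Q
    pulse_count = 0
    q_trace = []
    for d in d_samples:
        # With suppression, a pulse CP is produced only when D differs from Q;
        # without it, every clock edge produces a pulse.
        cp = (d != q) if suppress_redundant else True
        if cp:
            pulse_count += 1
            q = d         # the latch captures D while CP is active
        q_trace.append(q)
    return q_trace, pulse_count


d_samples = [0, 0, 1, 1, 1, 0, 0, 1]
q_plain, pulses_plain = run_double_edge_ff(d_samples, suppress_redundant=False)
q_gated, pulses_gated = run_double_edge_ff(d_samples, suppress_redundant=True)
print(pulses_plain, pulses_gated)  # 8 pulses without suppression, 3 with it
print(q_plain == q_gated)          # the output sequence Q is unchanged
```

For this pattern the output Q is identical in both cases, while the number of internal pulses drops from one per clock edge to one per data transition, which is the source of the power saving claimed for the technique.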





Fig. 5: (a) Proposed clock-gated pulse suppressed DETFF and (b) Pulse Generator.

## **4. EXPERIMENTAL RESULTS**

### 4.1 Proposed Flip-flop

The proposed clock-gated pulse suppressed double edge-triggered flip-flop and the conventional flip-flops are simulated and compared. Fig. 6 shows the setup model for the simulation of the proposed flip-flop, and the applied simulation patterns are shown in Fig. 7. The setup model also includes the power consumption of the data buffers and the loading effect of the flip-flop and the previous stage. The flip-flop drives a load capacitance of 20 fF, and an extra capacitance of 3 fF is placed after the clock buffer.





Fig. 6: Simulation setup model

Fig. 7: Simulation waveform of the proposed clock-gated pulse suppressed double edge-triggered flip-flop

Simulation waveforms obtained in the Cadence® Virtuoso® Design Environment in 45 nm CMOS technology with a 1 V power supply are shown in Fig. 7. As seen from the figure, pulse signals are generated only when the input data changes and are suppressed when it does not. The performance metrics of the various flip-flops are compared in Table 1. Although the SDETFF has a lower D-to-Q delay than the proposed flip-flop, the average power and power-delay product of the proposed flip-flop are lower. In the CGPSDFF, the power consumed by the clock signal is reduced by suppressing the redundant pulses present in the clock signal.

Table 1: Performance metrics comparison of various flip-flops

| Type of flip-flop         | CDFF [8] | SDETFF [4] | CGPSDFF (proposed) |
|---------------------------|----------|------------|--------------------|
| Transistors               | 28       | 18         | 24                 |
| Maximum D-to-Q delay (ns) | 0.88     | 0.62       | 0.76               |
| Average power (µW)        | 2.43     | 1.90       | 1.54               |
| Power-delay product (fJ)  | 2.14     | 1.18       | 1.16               |
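The relation between the rows of Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the power-delay product is the maximum D-to-Q delay multiplied by the average power (ns × µW = fJ); the dictionary layout and helper names are hypothetical, and the numbers are copied from the table.

```python
# Values copied from Table 1; the data structure itself is illustrative.
flip_flops = {
    "CDFF":    {"delay_ns": 0.88, "power_uw": 2.43, "pdp_fj": 2.14},
    "SDETFF":  {"delay_ns": 0.62, "power_uw": 1.90, "pdp_fj": 1.18},
    "CGPSDFF": {"delay_ns": 0.76, "power_uw": 1.54, "pdp_fj": 1.16},
}


def power_delay_product_fj(delay_ns, power_uw):
    """Power-delay product in fJ, since 1 ns x 1 uW = 1 fJ."""
    return delay_ns * power_uw


def saving_percent(reference, new):
    """Relative reduction of `new` with respect to `reference`, in percent."""
    return 100.0 * (reference - new) / reference


for name, m in flip_flops.items():
    pdp = power_delay_product_fj(m["delay_ns"], m["power_uw"])
    print(f"{name}: computed PDP = {pdp:.3f} fJ, tabulated = {m['pdp_fj']:.2f} fJ")

ff_power_saving = saving_percent(flip_flops["SDETFF"]["power_uw"],
                                 flip_flops["CGPSDFF"]["power_uw"])
print(f"Average power saving of CGPSDFF over SDETFF: {ff_power_saving:.1f} %")
```

Under this assumption the recomputed products agree with the tabulated ones to within about 0.01 fJ. The saving printed at the end refers to a single flip-flop and is separate from the TRAM-level figure reported in Section 4.2.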

### 4.2 TRAM

The DFF-based TRAMs are designed using the CDFF, the SDETFF and the proposed clock-gated pulse suppressed double edge-triggered flip-flop (CGPSDFF), respectively. The simulation result of the transposition memory designed using the CGPSDFF is shown in Fig. 8.



Fig. 8: TRAM simulation waveforms

The proposed transposition memory is designed and simulated in the Cadence® Virtuoso® Design Environment with a 1 V supply voltage at a clock frequency of 50 MHz. The simulation results show that the power dissipation differs between the TRAM designs. Fig. 9 shows the power consumption of the transposition memories using the CDFF, the SDETFF and the proposed CGPSDFF respectively. The power dissipation of the transposition memory designed using the clock-gated pulse suppressed double edge-triggered flip-flop is 20% less than that of the SDETFF-based design.



Fig. 9: Power consumption of various transposition memories

## **5. CONCLUSION AND FUTURE SCOPE**

A DFF-based transposition memory (TRAM) has been designed in 45 nm CMOS technology. The power consumption of the new TRAM is lower than that of the other TRAMs considered. The power reduction comes from the use of a new double edge-triggered flip-flop based on the clock-gated pulse suppression technique (CGPSDFF). The redundant pulses in the clock signal are suppressed, which eliminates the unnecessary charging and discharging of the pulse loads. As a result, the power consumption of the TRAM in this work is 20% lower than that of the TRAM based on the SDETFF.

The transposition memory (TRAM) of this work can be used to implement higher order Discrete Cosine Transforms (DCT) and various matrix processing blocks. Further improvements in the performance of the new double edge-triggered flip-flop used in this paper would improve the performance of the transposition memory.

## **6. REFERENCES**

- [1] … 0.8μ 100-MHz 2-D DCT core processor," *IEEE Trans. Consumer Electronics*, vol. 40, pp. 703-710, Aug. 1994.
- [2] T. Xanthopoulos and A. Chandrakasan, "A low-power IDCT Macro-cell for MPEG-2 MP@ML exploiting data distribution properties for minimal activity," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 693-703, May 1999.
- [3] P.H. Yang, J.S. Wang, and Y.M. Wang, "A 1GHz Low-Power Transposition Memory Using New Pulse-Clocked D Flip-Flops," *Proc. IEEE Int'l. Symp. Circuits and Systems*, vol. 5, pp. 665-668, May 2000.
- [4] W. L. Zhang, W. L. Goh, K. S. Yeo and G. H. Lim, "A novel static dual edge-trigger flip-flop for high-frequency low-power application," *Proc. IEEE Int. Symp. Integrated Circuits*, pp. 208-211, 2007.
- [5] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, "Comparative delay and energy of single edge-triggered and dual edge triggered pulsed flip-flops for high-performance microprocessors," in *Proc. ISLPED*, pp. 207–212, 2001.
- [6] S. S. Salankar and J. Shinde, "Clock gating - A power optimizing technique for VLSI circuits," *Annual IEEE India Conf.*, pp. 1-4, 2011.

- [7] W. H. Robinson, X. Wang, "A low-power double edge-triggered flip-flop with transmission gates and clock-gating," *Proc. MWSCAS*, pp. 205-208, 2010.
- [8] P. Zhao, T. Darwish, and M. Bayoumi, "High-performance and low power conditional discharge flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 5, pp. 477–484, May 2004.
- [9] G. Gerosa, "A 2.2 W, 80 MHz superscalar RISC microprocessor," *IEEE Journal Solid-State Circuits*, vol. 29, no. 12, pp. 1440-1454, Dec. 1994.
- [10] U. Ko and P. Balsara, "High-performance energy-efficient D-flip-flop circuits," *IEEE Trans. Very Large Scale Integr. (VLSI) Systems*, vol. 8, no. 1, pp. 94-98, Feb. 2000.
- [11] J. Yuan and C. Svensson, "High-speed CMOS circuit technique," *IEEE J. Solid-State Circuits*, vol. 24, no. 1, pp. 62-70, Feb. 1989.
- [12] B. Nikolic, V. G. Oklobzija, V. Stojanovic, W. Jia, J. K. Chiu, M. M. Leung, "Improved sense-amplifier-based flip-flop: Design and measurements," *IEEE Journal Solid-State Circuits*, vol. 35, no. 6, pp. 876-883, Jun. 2000.
- [13] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Sullivan, T. Grutkowski, "The implementation of the Itanium-2 microprocessor," *IEEE Journal Solid-State Circuits*, vol. 37, no. 11, pp. 1448-1460, Nov. 2002.
- [14] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, "Flow-through latch and edge-triggered flip-flop hybrid elements," in *Proc. IEEE Dig. ISSCC*, pp. 138-139, 1996.
- [15] F. Klass, C. Amir, A. Das, K. Aingaran, C. Truong, R. Wang, A. Mehta, R. Heald, and G. Yee, "A new family of semidynamic and dynamic flip-flops with embedded logic for high performance processors," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 712-716, May 1999.
- [16] W. Fichtner and R. Zimmermann, "Low power logic styles: CMOS versus pass-transistor logic," *IEEE J. Solid-State Circuits*, vol. 32, no. 7, pp. 1079-1090, July 1997.
- [17] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, "Comparative delay and energy of single edge-triggered and dual edge triggered pulsed flip-flops for high-performance microprocessors," in *Proc. ISLPED*, pp. 207–212, 2001.
- [18] G. Mareswara Rao, S. Rajendar, "Low Power Pulsed Flip-Flop using Self Driven Pass Transistor Logic", *International Journal of Computer Applications*, vol. 80, no. 15, pp. 9-12 Oct 2013.