# A Significance of VLSI Techniques for Low Power Real Time Systems

Santosh Chede, Assistant Professor, Dept. of Electronics and Telecom., Priyadarshini College of Engg., Nagpur

Kishore Kulat, Professor, Dept. of Electronics and Computer Science Engg., VNIT, Nagpur

Rajesh Thakare, Assistant Professor, Dept. of Electronics Engg., Priyadarshini College of Engg., Nagpur

## ABSTRACT

In microelectronics design, power consumption and speed of operation are crucial constraints, and both are affected by the propagation delay of circuit components. Pipelining and parallel processing strategies are used to obtain the desired propagation delays, and hence to vary the clock rate and the throughput, respectively. To some extent, variation in propagation delay can also be exploited to reduce power consumption. In this paper, pipelining and parallel processing concepts are analyzed with reference to task scheduling in real time systems, and the power consumption and speed of operation issues of such systems are discussed.

#### Keywords

VLSI, power consumption, critical path, DVS, DFS.

## 1. INTRODUCTION

The main objectives of most system level and circuit designs are high performance and power optimization. In high performance system design, minimizing propagation delay plays an important role. Size, cost, performance and power consumption are the crucial issues in the design of low power, portable, battery operated systems. Excessive power dissipation overheats the device, degrading its performance and lifetime, and cannot be afforded. Because energy consumption is an important constraint in estimating battery life, a VLSI based low power design of dedicated multimode signal conditioning integrated circuits is desirable. Modern systems realize analog processes digitally, which allows them to be designed with high precision, high signal to noise ratio (SNR), repeatability and flexibility. DSP systems can be realized with custom designed hardware circuits or with ultra low power, high performance programmable processors fabricated using VLSI circuit technology.

Basically, the role of a digital system design is to maximize performance with minimum cost and short time to market. Performance measures include throughput, clock rate, circuit complexity, and power dissipation or total energy consumed to execute a real/non real time task. To design a complex digital system using VLSI technology, modeling with node identification is essential. Generally, DSP algorithms are first realized and then transformed to hardware. To investigate and analyze data flow and data paths, i.e. parallelism and pipelining among tasks and subtasks, system modeling methods such as block diagrams, signal flow graphs (SFG), data flow graphs (DFG) and dependence graphs are required.

In such designs there is a trade off between sampling frequency, operating frequency and power consumption. Dynamic Voltage Scaling (DVS) and Dynamic Frequency Scaling (DFS) can be used to find an optimized solution. Concepts such as pipelining, parallel processing, retiming, unfolding and systolic arrays are used in the design of modern VLSI based low power systems.

## 2. VLSI DESIGN TECHNIQUES [7]

Implementation of VLSI design algorithms involves high level architectural transformations. Pipelining, parallel processing, retiming, unfolding, folding and systolic array design methodologies play an important role in optimized high performance algorithm design. Similarly, high level transformations such as strength reduction, look ahead and relaxed look ahead are also used for design implementation. Strength reduction transformations are applied to reduce the number of multiplications in convolution, parallel finite impulse response (FIR) digital filters, discrete cosine transforms (DCTs) and parallel rank-order filters. Look ahead and relaxed look ahead transformations are applied to design pipelined direct form and lattice recursive digital filters, adaptive filters, and parallel recursive digital filters. These strategies are used to develop architectures for multiplication, addition, digital filters, pipelining styles and low power computation, as well as architectures for high performance programmable or ultra low power embedded digital signal processors, applicable to biomedical, industrial, defense and consumer applications [1]. In this paper, pipelining and parallel processing constraints are analyzed with respect to real time system performance.

#### 2.1 Pipelining

A system with a high clock speed or sample speed, or with low power consumption, is always preferred. Pipelining is used to transform the original sequential circuit into another circuit that meets these specifications. It reduces the critical path, which increases the sample speed as well as the clock speed, and hence the speed of operation. The critical path is the longest computation path among all paths that contain zero delays, and the computation time of the critical path is a lower bound on the clock period of the circuit. For example, consider a sequential circuit for a 6-tap FIR filter consisting of multipliers and adders, as shown in figure 1. The computation times of an adder and a multiplier are 10 u.t. and 14 u.t. (units of time), respectively. The critical path of this circuit passes through one multiplier and five adders, so its computation time is 14 + 5 × 10 = 64 u.t. The minimum clock period required for execution is therefore 64 u.t., the sampling period is 64 u.t., and the sampling frequency is 1/64 samples per u.t. To increase the clock frequency as well as the sampling frequency, pipelining with feed forward cutsets is applied, as shown in figure 2. Pipelining latches are placed across the feed forward cutsets, and the computation time of the critical path is reduced to 24 u.t. (one multiplier and one adder), as shown in figure 3. Pipelining thus reduces the sample and clock periods at the cost of increased circuit complexity, latency and power consumption.
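The critical path figures above can be checked mechanically as the longest path through zero-delay edges of the data flow graph. The following Python sketch models only the computation nodes of a direct-form 6-tap FIR filter (multipliers `M0`..`M5`, adders `A1`..`A5`) with the 14 u.t. and 10 u.t. timings from the text. The function names are hypothetical, and the delay placement for the pipelined case (one latch on every adder-chain edge, as produced by a feed-forward cutset between neighbouring taps) is an assumption, since figures 2 and 3 are not reproduced here.

```python
from functools import lru_cache

MULT_TIME = 14  # multiplier computation time in u.t. (from the text)
ADD_TIME = 10   # adder computation time in u.t. (from the text)


def fir_dfg(taps, pipelined):
    """Return (node_times, edges) for a direct-form FIR data flow graph.

    Edges are (source, destination, number_of_delays). The input delay line is
    omitted: it reaches the multipliers only as inputs, so it never lies on a
    path between computation nodes.
    """
    times = {f"M{i}": MULT_TIME for i in range(taps)}
    times.update({f"A{i}": ADD_TIME for i in range(1, taps)})
    edges = [("M0", "A1", 0)]
    edges += [(f"M{i}", f"A{i}", 0) for i in range(1, taps)]
    for i in range(1, taps - 1):
        # Assumed placement: a feed-forward cutset between neighbouring taps
        # adds one latch to each adder-chain edge in the pipelined version.
        edges.append((f"A{i}", f"A{i + 1}", 1 if pipelined else 0))
    return times, edges


def critical_path(times, edges):
    """Longest sum of node computation times along paths of zero-delay edges."""
    successors = {node: [] for node in times}
    for src, dst, delays in edges:
        if delays == 0:
            successors[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        return times[node] + max((longest_from(n) for n in successors[node]), default=0)

    return max(longest_from(node) for node in times)


print(critical_path(*fir_dfg(6, pipelined=False)))  # 64
print(critical_path(*fir_dfg(6, pipelined=True)))   # 24
```

The graph is a DAG once delayed edges are removed, so memoised recursion over successors is enough; no general longest-path algorithm is needed.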

The level of pipelining refers to the number of latches inserted between the nodes. After pipelining, both the sample period and the clock period are 24 u.t.; in pipelining, the sample period ($T_{sample}$) is equal to the clock period ($T_{clock}$). The output of an FIR filter is given by [4]

$$y(n) = \sum_{i=0}^{N-1} A_i \, x(n-i) \tag{1}$$

where

- $A_i$: filter coefficients (arbitrary values)
- $x(n-i)$: input signal samples
- $y(n)$: output
- $i = 0, 1, 2, \dots, N-1$
- $N$: number of taps of the FIR filter (an integer)
- $n$: sample (time) index

For a 6-tap FIR filter ($N = 6$), the output at $n = 6$ is given by

$$y(6) = A_0 x(6) + A_1 x(5) + A_2 x(4) + A_3 x(3) + A_4 x(2) + A_5 x(1)$$
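A direct evaluation of equation (1) is a short sum over the taps. In this sketch `fir_output` is a hypothetical helper, samples before x(0) are assumed to be zero, and the coefficient and input values are arbitrary illustrations; feeding an impulse returns the coefficients $A_0$ to $A_5$ in order.

```python
def fir_output(coeffs, x, n):
    """y(n) = sum_{i=0}^{N-1} A_i * x(n - i); samples before x(0) are taken as zero."""
    return sum(a * x[n - i] for i, a in enumerate(coeffs) if n - i >= 0)


A = [1, 2, 3, 4, 5, 6]        # hypothetical coefficients A_0..A_5 (N = 6 taps)
x = [1, 0, 0, 0, 0, 0, 0, 2]  # hypothetical input x(0)..x(7): an impulse, then a 2
print([fir_output(A, x, n) for n in range(len(x))])  # [1, 2, 3, 4, 5, 6, 0, 2]
```

Each output needs N multiplications and N - 1 additions, which is exactly the multiplier and adder structure whose critical path is discussed above.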



Figure 1. Block diagram of 6-tap FIR filter (critical path = 64 u.t.)



Figure 2. Block diagram of 6-tap FIR filter with feed forward cutset (pipelining)



Figure 3. Block diagram of 6-tap FIR filter with feed forward cutset and delays (pipelining); critical path = 24 u.t.

#### 2.2 Parallel processing


In parallel processing, multiple input samples are processed in each clock cycle, with the same clock period as the original circuit. Hence the sampling period $T_{sample}$ is not equal to the clock period $T_{clock}$. To increase the sampling rate for the same clock period, several copies of the sequential hardware can be connected in parallel, as shown in figure 4. In this diagram, sequential hardware blocks consisting of 6-tap filters, shown as nodes A and B, are connected in parallel. Input samples x(Lk) to x(Lk+m), where m = L-1, are processed in a single clock cycle of duration $T_c$ to produce the corresponding output samples. The single input single output (SISO) system must therefore be converted into a multiple input multiple output (MIMO) system.

The level of parallelism (L), also called the block size, is the number of sequential circuits connected in parallel. Increasing the block size increases the sample rate and hence the speed of the architecture, since more samples are processed in a single clock cycle. The critical path itself does not change, but fine grain pipelining can be used to reduce it further. In fine grain pipelining, multipliers are broken into sub multipliers with shorter computation times, which reduces the critical path of the parallel architecture. In a communication bounded system, where the critical path cannot be reduced further, parallel processing and pipelining can be combined to increase the speed of the architecture. For parallel processing,

$$T_{sample} \neq T_{clock}$$

$$T_{sample} = \frac{T_{clock}}{L}$$



Figure 4. L level Parallel Multiple Input Multiple Output (MIMO) system
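The block processing described above can be sketched in Python: each iteration of the outer loop stands for one clock cycle that produces the L outputs y(Lk) to y(Lk+L-1), and the result must equal ordinary sequential (SISO) filtering. The helper names, coefficients, input and the choice L = 3 are hypothetical; in hardware the L outputs of a block come from L parallel copies of the filter rather than from a loop.

```python
def fir_output(coeffs, x, n):
    """y(n) = sum_i A_i * x(n - i), with x(k) = 0 for k < 0 (equation 1)."""
    return sum(a * x[n - i] for i, a in enumerate(coeffs) if n - i >= 0)


def parallel_fir(coeffs, x, level):
    """L-level block processing: clock cycle k produces y(Lk) .. y(Lk + L - 1).

    In hardware the L outputs of one block come from L parallel filter copies
    working in the same clock period; here they are computed one after another.
    """
    y = []
    for k in range(len(x) // level):  # one iteration per clock cycle
        y.extend(fir_output(coeffs, x, level * k + j) for j in range(level))
    return y


T_CLOCK = 64.0          # u.t.; the clock period is unchanged by parallel processing
L = 3                   # hypothetical level of parallelism
A = [1, 2, 3, 4, 5, 6]  # hypothetical coefficients
x = list(range(12))     # hypothetical input, length a multiple of L

sequential = [fir_output(A, x, n) for n in range(len(x))]
print(parallel_fir(A, x, L) == sequential)  # True
print("T_sample =", T_CLOCK / L, "u.t.")
```

The clock period stays at the 64 u.t. critical path, but three samples are consumed per clock, so the effective sample period drops to $T_{clock}/L$.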

## 3. DVS AND DFS

These techniques are used for energy management in real time systems. VLSI technology uses CMOS devices for hardware realization of mixed mode analog and digital integrated circuits and other circuitry. With DVS/DFS, devices dynamically change their operating speed, which increases their energy efficiency. Energy consumption can thus be reduced without affecting the required performance. DVS/DFS techniques save energy while still providing the necessary peak computation power in VLSI based systems [1,2,6].

For CMOS technology,

$$P \propto C_t \, V_i^2 \, f \tag{1}$$

$$T_D \propto \frac{C_{charge} \, V_i}{(V_i - V_T)^{\beta}} \tag{2}$$

$$f = \frac{1}{T_D} \tag{3}$$

$$E = P \times t \tag{4}$$

where

- $P$: power consumption
- $C_t$: total capacitance
- $C_{charge}$: capacitance to be charged/discharged in a single clock cycle
- $f$: clock frequency
- $V_i$: supply voltage
- $V_T$: threshold voltage
- $T_D$: propagation delay
- $t$: task execution time
- $\beta$: technology dependent constant (between 1 and 2)
- $E$: energy consumption

Lowering the supply voltage $V_i$ also decreases the clock speed, i.e. the speed of the architecture (f), as shown in figure 5, while the dynamic power P depends quadratically on $V_i$. DVS and DFS dynamically adjust the supply voltage and the clock speed to reduce the energy consumption of a circuit or system.



Figure 5. Trade off between speed of architecture (f) and supply voltage
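Equations (1) to (4) can be evaluated together to see how a lower supply voltage trades speed for power. The sketch below works with ratios relative to a nominal supply, so the unknown proportionality constants cancel. The threshold voltage of 0.6 V, β = 1.5 and the 3.3 V nominal supply are hypothetical values, the capacitances are assumed unchanged, and the task is assumed to need a fixed number of clock cycles, so its execution time scales as 1/f.

```python
def relative_delay(v, vt, beta):
    """Propagation delay up to a constant factor: T_D ~ V / (V - V_T)^beta (eq. 2)."""
    return v / (v - vt) ** beta


def scaled_operating_point(v, v_nom, vt=0.6, beta=1.5):
    """Frequency, power and energy per task at supply v, relative to supply v_nom.

    Assumes fixed C_t and C_charge, f = 1/T_D (eq. 3), P ~ C_t V^2 f (eq. 1)
    and a task with a fixed cycle count, so t ~ 1/f and E = P * t (eq. 4).
    """
    f_rel = relative_delay(v_nom, vt, beta) / relative_delay(v, vt, beta)
    p_rel = (v / v_nom) ** 2 * f_rel
    e_rel = p_rel / f_rel  # t scales as 1/f, so E ~ (v / v_nom)^2
    return f_rel, p_rel, e_rel


for v in (3.3, 2.5, 1.8):
    f, p, e = scaled_operating_point(v, v_nom=3.3)
    print(f"V = {v:.1f} V  f = {f:.2f}  P = {p:.2f}  E = {e:.2f}")
```

In this model the power falls faster than the frequency, and the energy per task falls with the square of the voltage ratio even though the task runs longer; that is the saving DVS/DFS aims for.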

## 4. REAL TIME SYSTEM DESIGN ISSUES

Real time applications involve the execution of critical and non critical tasks [3, 8, 9], and systems for such tasks are designed on the basis of power consumption. Task priority decides the execution sequence, i.e. whether a task is treated as critical or non critical. For example, under the Rate Monotonic (RM) fixed priority scheduling algorithm, the task with the shortest period executes first, and Normal mode is used to execute it faster. An embedded application may consist of many tasks and subtasks; hence task priority scheduling is a primary factor in determining execution time. Real time embedded systems use a hardware/software codesign strategy [10].

Both hardware and software consume energy. Such a system consists of a number of tasks, and the execution time and energy consumption of each task can be estimated on the basis of real time constraints. VLSI technology with low energy consumption strategies is therefore of major importance in designing energy efficient real time systems. In this paper, design issues such as power/energy consumption and execution/computation time are analyzed for low power real time embedded systems.

## 5. RM TASK SCHEDULING SCHEME

Rate-monotonic (RM) scheduling is one of the most widely studied and practically used scheduling schemes [5]. It is a uniprocessor, static priority, preemptive scheme in which the priority of a task is inversely related to its period. In this paper, task priorities are set according to RM and the energy consumption of each task is estimated. For example, consider a system with tasks $T_1$, $T_2$, $T_3$ having periods $P_1=3$, $P_2=4$, $P_3=5$ and execution times $t_1=0.5$, $t_2=2.0$, $t_3=1.73$ µs. Since $P_1 < P_2 < P_3$, task $T_1$ has the highest priority, and task $T_3$ cannot execute while either $T_1$ or $T_2$ is unfinished. With DVS/DFS, the highest clock frequency, and hence the highest energy consumption, is dedicated to completing $T_1$, compared with $T_2$ and $T_3$. The frequency is set so that the execution times are 0.5, 2.0 and 1.73 µs, respectively; the frequency depends on the supply voltage, which in turn determines the power consumption.
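The RM priority assignment in the example amounts to sorting the tasks by period. The sketch below also computes the processor utilization $U = \sum t_i / P_i$ and the Liu and Layland sufficient bound $n(2^{1/n} - 1)$, a standard RM schedulability test that is not part of the original text. The function names are hypothetical, and the periods are assumed to be in the same unit (µs) as the execution times.

```python
def rm_priority_order(tasks):
    """RM assigns static priorities by period: shorter period -> higher priority."""
    return [name for name, period, _ in sorted(tasks, key=lambda t: t[1])]


def utilization(tasks):
    """Processor utilization U = sum(t_i / P_i) at the chosen clock frequency."""
    return sum(exec_time / period for _, period, exec_time in tasks)


def liu_layland_bound(n):
    """Sufficient RM schedulability bound for n independent periodic tasks."""
    return n * (2 ** (1 / n) - 1)


# (name, period P_i, execution time t_i) from the example, assumed in the same unit
tasks = [("T3", 5, 1.73), ("T1", 3, 0.5), ("T2", 4, 2.0)]
print(rm_priority_order(tasks))  # ['T1', 'T2', 'T3']
print(round(utilization(tasks), 3), round(liu_layland_bound(3), 3))
```

Under that unit assumption, $U = 0.5/3 + 2.0/4 + 1.73/5 \approx 1.013$, which is above 1 and therefore above any bound for a single processor; a utilization check of this kind is a useful sanity test before DVS frequencies are chosen for a task set.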

## 6. ANALYSIS OF PIPELINING AND PARALLEL PROCESSING FOR POWER CONSUMPTION REDUCTION



Figure 6. RM Task scheduling and DVS significance

For CMOS circuits, the power consumption and propagation delay are given by equations (1) and (2) of section 3. As mentioned in [ref], pipelining can be used to reduce the power consumption of a system/circuit. In the original network, the critical path has a capacitance $C_{charge}$ to be charged or discharged. With M-level pipelining, the critical path is reduced to 1/M of its original length, the capacitance to be charged/discharged is reduced to $C_{charge}/M$, and the total capacitance does not change. The speed of the architecture therefore increases because the critical path computation time is reduced, and the clock period of the pipelined architecture is shorter than that of the original structure.

$$T_{pip} \propto \frac{\frac{C_{charge}}{M} \, V_i}{(V_i - V_T)^{\beta}} \tag{5}$$

$$T_{pip} \neq T_D \tag{6}$$

$$f_{pip} \neq f \tag{7}$$

$$f_{pip} \geq f \tag{8}$$

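where $T_{pip}$ and $f_{pip}$ are the pipelining time and frequency constraints, i.e. the propagation delay (clock period) and the clock frequency of the pipelined circuit.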

Under DVS, a higher supply voltage has to be applied to increase the speed of the architecture, which increases power consumption according to equation (1). If the frequency increases, a real time task is executed earlier than its scheduled time, which departs from the scheduled real time behaviour. In RM task scheduling, the priority of a task depends on its period. The frequency and voltage variation for real time task execution is shown in figure 6.

where

- $T_1, \dots, T_3$: execution times of the tasks
- $T_{1B}, \dots, T_{3B}$: modified execution times of the tasks
- $T_{1C}, \dots, T_{3C}$: modified execution times of the tasks
- $V_{1A}, \dots, V_{3A}$: modified voltage levels for the tasks
- $V_{1B}$: modified voltage level for the task
- $V_{1C}$: modified voltage level for the task
- $P_1$: first priority
- $P_2$: second priority
- $P_3$: third priority
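One way to realize the DVS step in figure 6 is to choose, for each task, the lowest clock frequency that still finishes within its execution time budget, and then the lowest supply voltage that supports that frequency under equations (2) and (3). In this Python sketch the cycle counts, $f_{max}$ = 200 MHz at 3.3 V, $V_T$ = 0.6 V and β = 1.5 are hypothetical values, and the function names are illustrative. The voltage is found by bisection because the delay model has no closed-form inverse for general β; frequency rises monotonically with voltage above $V_T$ for β ≥ 1, so bisection is valid.

```python
def max_frequency(v, vt, beta, f_max, v_max):
    """Highest clock frequency at supply v, normalised so max_frequency(v_max) == f_max,
    using f = 1/T_D (eq. 3) and T_D ~ V / (V - V_T)^beta (eq. 2)."""
    def shape(u):
        return (u - vt) ** beta / u
    return f_max * shape(v) / shape(v_max)


def lowest_voltage(f_target, vt, beta, f_max, v_max, tol=1e-9):
    """Bisection for the smallest supply voltage whose frequency reaches f_target."""
    if not 0 < f_target <= f_max:
        raise ValueError("target frequency must lie in (0, f_max]")
    lo, hi = vt, v_max  # frequency rises monotonically from 0 at V_T
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if max_frequency(mid, vt, beta, f_max, v_max) >= f_target:
            hi = mid
        else:
            lo = mid
    return hi


def frequency_for_budget(cycles, budget):
    """Clock frequency that stretches a task of `cycles` cycles to its time budget."""
    return cycles / budget


F_MAX, V_MAX, V_T, BETA = 200.0, 3.3, 0.6, 1.5  # hypothetical: MHz, V, V, exponent
# Hypothetical cycle counts; budgets are the execution times from section 5 (µs).
for name, cycles, budget in [("T1", 100, 0.5), ("T2", 200, 2.0), ("T3", 173, 1.73)]:
    f = frequency_for_budget(cycles, budget)  # cycles per µs = MHz
    v = lowest_voltage(f, V_T, BETA, F_MAX, V_MAX)
    print(f"{name}: f = {f:.0f} MHz, V = {v:.2f} V")
```

With these assumed cycle counts, $T_1$ needs the full 200 MHz and therefore the full supply, while $T_2$ and $T_3$ can run at half the frequency and a lower voltage, mirroring the priority-dependent voltage levels in figure 6.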





Figure 7. Pipelining capacitance constraints for power consumption

Figure 8. Parallel processing capacitance constraints for power consumption

To reduce power consumption while satisfying real time constraints, pipelining can be used to shorten the critical path while keeping the same clock period. Let the total capacitance and the critical path computation time of the sequential circuit be C and $T_s$, respectively. After pipelining (here with four levels), the critical path is reduced: the capacitance to be charged/discharged within $T_s$ is C/4, while the total capacitance remains C. Hence the supply voltage used to charge C/4 can be reduced to $BV_i$, where B is a positive voltage reduction factor less than 1, as shown in figure 7. Since $T_s$ remains the same, pipelining satisfies the real time constraints while reducing power consumption.
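The voltage reduction factor B for pipelining can be computed from equation (2): keeping the clock period unchanged while each stage charges only $C_{charge}/M$ means B must satisfy $B(V_i - V_T)^\beta = M(BV_i - V_T)^\beta$. This Python sketch solves that condition by bisection for M = 4, matching the C/4 case above; $V_i$ = 3.3 V, $V_T$ = 0.6 V and β = 2 are hypothetical values, and `voltage_scaling_factor` is an illustrative name. The search is restricted to $BV_i > V_T$, which excludes the spurious root that appears when the β = 2 condition is expanded into a quadratic.

```python
def voltage_scaling_factor(levels, v0, vt, beta_exp, tol=1e-12):
    """Find B in (V_T/V_0, 1] such that charging C_charge/levels at supply B*V_0
    takes as long as charging C_charge at V_0 (eq. 2 with T_D held fixed):
        B * (V_0 - V_T)^beta == levels * (B*V_0 - V_T)^beta
    """
    def g(b):
        return b * (v0 - vt) ** beta_exp - levels * (b * v0 - vt) ** beta_exp

    lo, hi = vt / v0, 1.0  # g(lo) > 0 and g(hi) <= 0 whenever levels >= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical technology values: V_0 = 3.3 V, V_T = 0.6 V, beta = 2, M = 4 levels.
b = voltage_scaling_factor(levels=4, v0=3.3, vt=0.6, beta_exp=2)
# Total capacitance and clock frequency are unchanged, so P_pip / P_seq = B^2 (eq. 1).
print(f"B = {b:.3f}, power ratio = {b * b:.3f}")
```

Because only the voltage term of equation (1) changes, the whole power saving of the pipelined circuit is captured by the single factor B².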

Similarly, in parallel processing the total capacitance is increased by the level of parallelism: with four parallel copies the total circuit capacitance is 4C, but each copy has a time of $4T_p$ to charge its capacitance C. Hence the supply voltage required by the parallel circuit can be reduced to $BV_i$, as shown in figure 8. Because each computation takes longer, the parallel circuit maintains a constant sample rate but may not satisfy the execution time constraints of individual tasks.
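For the parallel case, equation (1) gives the power ratio as the product of the capacitance, squared voltage and frequency factors: with L copies the total capacitance grows by L, each copy is clocked L times slower, and the supply is scaled by B, so the ratio is B². The sketch below does that bookkeeping and checks, with the delay model of equation (2), whether a given B still lets each copy charge its capacitance C within L times the original period. L = 4, $V_i$ = 3.3 V, $V_T$ = 0.6 V, β = 2 and the candidate B values are hypothetical, and the function names are illustrative.

```python
def relative_power(cap_factor, volt_factor, freq_factor):
    """P_new / P_old from eq. (1), P ~ C_t V^2 f, given the scale factor of each term."""
    return cap_factor * volt_factor ** 2 * freq_factor


def meets_timing(volt_factor, slack_factor, v0, vt, beta_exp):
    """True if charging the same C_charge at volt_factor * V_0 fits within
    slack_factor times the original delay, using T_D ~ V / (V - V_T)^beta (eq. 2)."""
    v = volt_factor * v0
    if v <= vt:
        return False  # below threshold the gate never switches in this model
    delay_ratio = (v / (v - vt) ** beta_exp) / (v0 / (v0 - vt) ** beta_exp)
    return delay_ratio <= slack_factor


L = 4  # hypothetical level of parallelism
for b in (0.5, 0.4):
    ok = meets_timing(b, slack_factor=L, v0=3.3, vt=0.6, beta_exp=2)
    print(f"B = {b}: power ratio = {relative_power(L, b, 1 / L):.2f}, timing met: {ok}")
```

The power ratio is B² for any L, but the timing check limits how small B can be; for these assumed values B = 0.5 fits the 4× longer period while B = 0.4 does not.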

## 7. CONCLUSION

In this paper, the significance of power consumption and speed of operation is analyzed with reference to the pipelining and parallel processing VLSI techniques. Hardware implementation of high performance mixed mode ICs for real time applications can be carried out with the help of low power VLSI techniques. In system design, the trade off between pipelining and parallel processing for task execution is considered with respect to power consumption. It is observed that pipelining and parallel processing can reduce power consumption, depending on the real time constraints.

## 8. REFERENCES

- [1] Flavius Gruian, "Hard real-time scheduling for low energy using stochastic data and DVS processors," in Proc. Int. Symposium on Low Power Electronics and Design, California, USA, 2001, pp. 46-51.
- [2] Youngsoo Shin, Kiyoung Choi, Takayasu Sakurai, "Power optimization of real-time embedded systems on variable speed processors," in Proc. IEEE/ACM Int. Conference on Computer Aided Design, 2000, pp. 365-368.
- [3] C. M. Krishna, Kang G. Shin, "Real-time systems," McGraw Hill international edition, 1997, pp. 47-52.
- [4] Raj Kamal, "Embedded systems," Tata McGraw Hill.
- [5] Santosh Chede, Kishore Kulat, "Algorithm to optimize code size and energy consumption in real time embedded system," International Journal of Computers (Academy Publisher), issue 3, July 2008.
- [6] C. M. Krishna, Yann-Hang Lee, "Voltage-clock-scaling adaptive scheduling techniques for low power in hard real-time systems," IEEE Transactions on Computers, vol. 52, no. 12, Dec. 2003, pp. 1586-1593.
- [7] Keshab Parhi, "VLSI digital signal processing systems," Wiley India edition, 2007.
- [8] Ramesh Mishra, Namrata Rastogi, Dakai Zhu, "Energy aware scheduling for distributed real time systems," in Proc. Int. Symposium on Parallel and Distributed Processing (IPDPS-03), Nice, France.
- [9] Padmanabhan Pillai, Kang G. Shin, "Real-time dynamic voltage scaling for low-power embedded operating systems," in Proc. 18th ACM Symposium on Operating Systems Principles (SOSP), Banff, Canada, 2001, pp. 89-102.
- [10] William Fornaciari, Paolo Gubian, Donatella Sciuto, Cristina Silvano, "Power estimation of embedded systems: a hardware/software codesign approach," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 2, 1998, pp. 266-275.