# A Novel Approach for Multi-Domain Clock Skew Scheduling

I. Flavia Princess Nesamani, Assistant Professor/ECE, Karunya University; K. Maria Priyadarshini, PG Scholar, Karunya University, Coimbatore; Dr. V. Lakshmi Prabha, Principal/GCT, Tirunelveli

## ABSTRACT

Unconstrained clock skew scheduling is limited in practice by the difficulty of reliably implementing a wide spectrum of dedicated clock delays, which significantly limits its optimization potential. As an alternative, a multi-domain clock skew scheduling technique with dedicated clock buffers is implemented. In this paper, an algorithm to determine the minimum number of clock domains to be used for multi-domain clock skew scheduling is presented. Experimental results on the digital logic part of a telephone answering machine show the optimized clock period and dynamic power consumption.

**Keywords:** Clock skew domain, clock skew scheduling (CSS), low-power VLSI, Synopsys Design Compiler.

### **1. INTRODUCTION**

Clock skew is defined as the difference in clock arrival times at different registers due to variation in interconnect delays in the clock distribution network. Clock skew is traditionally viewed as a design fault, but it can also be treated as a manageable resource: the clock period can be minimized by an optimal assignment of clock arrival times to the sequential components.

Because of process variations, power noise and temperature, CSS is becoming challenging. Assigning an individual clock arrival time to each register is becoming impractical and unreliable. It complicates the design of clock networks, since each register must be precisely buffered to satisfy its assigned clock arrival time, and the clock buffers are themselves susceptible to process variations.

In multi-domain clock skew scheduling (MDCSS), the clock skews of individual registers can only be chosen from a finite set of clock phase shifts called clock domains. Process variation is not considered directly in the problem formulation; rather, the technique makes it practical to inject a limited number of skew values (clock domains) into the circuit. An algorithm is proposed that can be used at design time. Our algorithm can be used directly for other physical realizations of MDCSS; the term is used here to refer to domain-based CSS independently of how the physical clock domains are implemented. The MDCSS problem is similar to the unconstrained CSS problem in the sense that it also needs to assign clock skews to individual registers in the design.

The remainder of this paper is organized as follows. Section 2 surveys the related literature. Section 3 gives the preliminaries and Section 4 the optimization algorithm used in this work. Section 5 gives the results and discussion. Section 6 concludes the paper.

# 2. RELATED WORK

Several CSS techniques have been presented in the past. An integrated deferred-merge embedding algorithm is proposed in [9] for clock tree construction with simultaneous routing, wire sizing and buffer insertion; wire widths and levels of buffers are introduced as variables in forming the merging segments. An algorithm that combines cluster-based clock tree construction with zero-skew wire sizing is proposed in [17] to improve wire sizes without sacrificing the quality of the solution. Only the single-stage clock tree problem is considered there, i.e., the buffer insertion problem is not addressed, although the approach can be extended to multi-stage clock routing.

The approach improves wire sizes iteratively. Each time an alternate wire size is chosen for some segment, the change is propagated to the root of the tree by zero-skew merging to make sure that it is indeed an improvement. The problem of minimizing the clock period of a circuit by optimizing the clock skews is addressed in [2]. That work incorporates uncertainty factors and presents a formulation that ensures that the optimization will be safe.

The problem of clock period optimization is formulated as a linear program. An efficient graph-based solution [3] that takes advantage of the structure of the problem is presented. This method reduces the skews without sacrificing the optimality of the clock period. High-speed synchronous digital systems need large switching currents for rapid signal transitions. These large currents create voltage drops on the power distribution network and necessitate expensive chip packaging with a large number of supply pins. An optimization technique that reduces the dynamic transient current drawn from the supply pins, based on sub-dividing the synchronous clocking into multiple sub-clocks with relative skew, is proposed in [16].

The performance of a synchronous digital system is improved in [3] by adjusting the path delays of the clock signal from the central clock source to the individual flip-flops, and two linear programs are investigated to detect clocking hazards. The problem of estimating clock skew bounds in the presence of power supply and process variations is addressed in [5]. Two statistical timing-driven optimization algorithms that reduce the hardware cost of a post-silicon tunable (PST) clock tree are proposed in [14]. The tuning capability of a PST clock tree depends on the number of PST clock buffers and their tunable range. Our work divides the flip-flops into the best number of domains in order to minimize the clock period without violating the timing constraints.

## **3. PRELIMINARIES**

#### To generate a constraint graph G(V,E):

Find the longest and shortest paths between all pairs of flip-flops in the circuit graph. There is a hold edge (i, j) (forward dashed line) and a setup edge (j, i) (backward solid line) between two nodes i and j in G(V,E) if and only if there exists a direct path (with no other D flip-flops (DFFs) on the path) from i to j in the circuit graph G\*. We define $T_{ij}$ as the longest path delay from vertex i to vertex j, $t_{ij}$ as the shortest path delay from vertex i to vertex j, P as the clock period of the circuit, $E_s$ as the set of all setup edges in E, and $E_h$ as the set of all hold edges in E [1]. CSS assigns each flip-flop i a specific clock arrival time $x_i$ such that the timing constraints between any pair of flip-flops are not violated and the clock period is minimized. This problem can be formulated as the following linear program:

$$\min P \quad \text{subject to}$$

$$x_i - x_j \ge T_{ji} + T_{setup} - P, \quad \forall (i,j) \in E_s \tag{1}$$

$$x_i - x_j \ge T_{hold} - t_{ij}, \quad \forall (i,j) \in E_h \tag{2}$$

Here inequality (1) represents the setup time constraint and inequality (2) represents the hold time constraint between flip-flops i and j. $T_{setup}$ and $T_{hold}$ are the setup time and the hold time of the register, respectively.
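As an illustration of how this linear program can be solved, the sketch below uses a standard graph-based approach rather than a general LP solver: for a fixed P, constraints (1) and (2) are difference constraints, so feasibility can be checked with Bellman-Ford, and the minimum P can then be found by binary search. This is not necessarily the solver used in this work. The function names, the `(i, j, T_max, t_min)` path tuples, the default zero setup and hold times, and the two-flip-flop instance are all assumptions made for the example.

```python
def feasible_skews(n, paths, period, t_setup=0.0, t_hold=0.0):
    """Return arrival times x[0..n-1] satisfying (1) and (2), or None.

    paths: list of (i, j, T_max, t_min), one per direct path i -> j.
    """
    # A constraint x_a - x_b >= c is the same as x_b <= x_a - c,
    # i.e. a shortest-path edge a -> b with weight -c.
    edges = []
    for i, j, t_max, t_min in paths:
        # Setup edge (j, i): x_j - x_i >= T_ij + t_setup - P
        edges.append((j, i, period - t_max - t_setup))
        # Hold edge (i, j): x_i - x_j >= t_hold - t_ij
        edges.append((i, j, t_min - t_hold))

    x = [0.0] * n  # acts as a virtual source at distance 0 from every node
    for _ in range(n):
        changed = False
        for a, b, w in edges:
            if x[a] + w < x[b] - 1e-12:
                x[b] = x[a] + w
                changed = True
        if not changed:
            return x
    return None  # still relaxing after n passes: negative cycle, infeasible


def min_period(n, paths, t_setup=0.0, t_hold=0.0, tol=1e-6):
    """Binary search for the smallest feasible clock period."""
    lo = 0.0
    hi = sum(p[2] for p in paths) + n * t_setup + 1.0  # loose upper bound
    if feasible_skews(n, paths, hi, t_setup, t_hold) is None:
        raise ValueError("infeasible at the upper bound; check hold constraints")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible_skews(n, paths, mid, t_setup, t_hold) is None:
            lo = mid
        else:
            hi = mid
    return hi, feasible_skews(n, paths, hi, t_setup, t_hold)


# Hypothetical instance: path 0 -> 1 (max 9, min 3) and path 1 -> 0 (max 5, min 4).
toy_paths = [(0, 1, 9.0, 3.0), (1, 0, 5.0, 4.0)]
period, skews = min_period(2, toy_paths)
print(round(period, 3), skews)
```

For this toy instance the two setup edges form a cycle that requires 2P ≥ 9 + 5, so the search converges to a period of about 7, compared with 9 (the longest path delay) when all skews are zero.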

# 4. OPTIMIZATION ALGORITHM

This section presents a solution for multi-domain clock skew scheduling in which the number of clock domains and the associated skew values are optimized using a novel optimization technique, without violating any setup or hold time conditions.

Step 1:

*Stating the formal problem constraints*: For the given sequential circuit and for a clock period P, Multi Domain Clock Skew Scheduling is done, so as to reduce the clock period with a minimum number of domains.

Step 2:

*Calculate the delay between the flip-flops*: The maximum and minimum delay values between each pair of flip-flops are needed to initialize the clock domain set. In this paper, these delay values are calculated and tabulated.

Step 3:

*Initializing the clock domain set*: Depending upon the calculated delay values, the flip-flops are divided into domains. In the next section, all the flip-flops are divided into 4 domains, and each domain is assigned a skew value based on the maximum delay in that domain.

Step 4:

*Assign skew values*: Skew values are assigned based on the maximum delay between flip-flops, and the assignment is checked iteratively for setup and hold time violations.

Step 5:

*Avoid redundant clock domains*: Since the main objective of the algorithm is to reduce the number of clock domains, the number of newly created clock domains must be limited; that is, the algorithm should rely on global information rather than only on the current local information, without changing the functionality of the circuit.
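The checks in Steps 4 and 5 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the algorithm of this paper: the greedy pairwise merge is a stand-in for the "global information" rule of Step 5, skews are interpreted as clock arrival times, setup and hold times default to zero, and the flip-flop names, delays, period and domain skews are hypothetical.

```python
def skew_per_flip_flop(domains):
    """Expand [(skew, [flip-flops]), ...] into a flip-flop -> skew map."""
    return {ff: skew for skew, ffs in domains for ff in ffs}


def timing_violations(paths, skew, period, t_setup=0.0, t_hold=0.0):
    """List setup/hold violations for paths given as (src, dst, d_max, d_min)."""
    violations = []
    for src, dst, d_max, d_min in paths:
        # Setup: the slowest data must arrive before the next capture edge.
        if skew[src] + d_max + t_setup > skew[dst] + period:
            violations.append(("setup", src, dst))
        # Hold: the fastest data must not overrun the current capture edge.
        if skew[src] + d_min < skew[dst] + t_hold:
            violations.append(("hold", src, dst))
    return violations


def merge_redundant_domains(domains, paths, period):
    """Greedily fold one domain into another while timing stays clean."""
    domains = [(s, list(ffs)) for s, ffs in domains]
    merged = True
    while merged:
        merged = False
        for a in range(len(domains)):
            for b in range(len(domains)):
                if a == b:
                    continue
                # Trial: move every flip-flop of domain b onto domain a's skew.
                trial = [d for k, d in enumerate(domains) if k != b]
                idx = a if a < b else a - 1
                trial[idx] = (domains[a][0], domains[a][1] + domains[b][1])
                if not timing_violations(paths, skew_per_flip_flop(trial), period):
                    domains, merged = trial, True
                    break
            if merged:
                break
    return domains


# Hypothetical three-flip-flop ring with a target period of 6 ns.
toy_paths = [("A", "B", 8.0, 3.0), ("B", "C", 4.0, 2.0), ("C", "A", 4.0, 2.0)]
start = [(0.0, ["A"]), (2.0, ["B"]), (1.5, ["C"])]
reduced = merge_redundant_domains(start, toy_paths, period=6.0)
print(reduced)
```

In this toy case the three initial domains are free of violations, and the domain holding C can be folded into the domain holding A, leaving two domains. Folding B into either remaining domain would break the setup constraint on the A to B path, so the merge stops there.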

# 5. EXPERIMENTAL RESULTS

The digital logic circuit of the telephone answering machine is coded in VHDL and simulated using the Xilinx tool, and the RTL schematic and timing report are extracted to test our heuristic for MDCSS. The VHDL file of the circuit is then synthesized using Synopsys Design Compiler in a $0.18\mu$m technology, and the area, power, critical path, and maximum and minimum delays between flip-flops are calculated.



Figure 1 Block diagram of Digital Logic

The clock period is set to the minimum clock period achieved by unconstrained clock skew scheduling (CSS). In this way, we can see how many clock domains are necessary to reach the full potential of unconstrained CSS. The output of the algorithm is the minimum number of clock domains with the associated skew values. The block diagram of the digital logic circuit of the telephone answering machine is shown in Figure 1.

The schematic view of the test bench circuit in Design Compiler is shown below; the circuit has 16 flip-flops, 4 inputs and 5 outputs. The highlighted region of the circuit shows the flip-flops.



Fig.2. Circuit Used For Skewing

Table 1 shows the maximum and minimum delay values between the flip-flops. From the calculated delay values, a constraint graph is drawn. Figure 3 shows the constraint graph of the circuit, with the flip-flops as nodes; the values on the arrows are the maximum and minimum delay values.

#### **Table 1 Maximum and Minimum Delay Between Flip-Flops**

| Flip-flop → Flip-flop       | Max delay (ns) | Min delay (ns) |
|-----------------------------|----------------|----------------|
| s1→t1                       | 9.09           | 3.93           |
| t1→t1                       | 9.40           | 2.90           |
| s1→t0                       | 9.40           | 3.93           |
| t2→t2                       | 9.40           | 2.90           |
| t4→s2                       | 5.59           | 5.08           |
| s2→s0                       | 5.01           | 4.94           |
| s0→m1                       | 5.35           | 4.96           |
| s2→play_messages            | 3.57           | 2.61           |
| s1→s1                       | 3.86           | 2.90           |
| Pick_up→pick_up             | 2.36           | 1.84           |
| s0→play_messages            | 4.93           | 2.90           |
| m0→m0                       | 2.90           | 2.71           |
| s1→m1                       | 5.46           | 5.25           |
| t0→t4                       | 6.39           | 5.84           |
| t4→t4                       | 8.41           | 2.90           |
| t4→s2                       | 5.59           | 5.08           |
| t4→s0                       | 5.75           | 5.12           |
| s0→record_on_reg            | 4.27           | 3.53           |
| s0→play_beep_reg            | 3.68           | 3.06           |
| Play_beep_reg→play_beep_reg | 2.90           | 2.07           |
| s1→play_beep_reg            | 4.38           | 3.35           |
| s1→play_msg                 | 4.74           | 2.90           |
| m1→m2                       | 4.23           | 3.14           |
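Each row of the delay table above bounds the skew between its two flip-flops. Rearranging constraints (1) and (2) for a path from i to j gives $T_{hold} - t_{ij} \le x_i - x_j \le P - T_{ij} - T_{setup}$. The sketch below evaluates this range for a few rows of the table. The period of 9.40 ns (the largest maximum delay in the table) and the zero setup and hold times are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def permissible_skew_ranges(rows, period, t_setup=0.0, t_hold=0.0):
    """Return {(src, dst): (lower, upper)} bounds on x_src - x_dst.

    rows: list of (src, dst, max_delay_ns, min_delay_ns).
    """
    ranges = {}
    for src, dst, d_max, d_min in rows:
        lower = t_hold - d_min            # from the hold constraint (2)
        upper = period - d_max - t_setup  # from the setup constraint (1)
        ranges[(src, dst)] = (lower, upper)
    return ranges


# A few non-self-loop rows copied from the delay table (values in ns).
rows = [
    ("s1", "t1", 9.09, 3.93),
    ("t4", "s2", 5.59, 5.08),
    ("s2", "s0", 5.01, 4.94),
    ("s0", "m1", 5.35, 4.96),
]

# Assumed period; not a result reported in this paper.
ranges = permissible_skew_ranges(rows, period=9.40)
for (src, dst), (lower, upper) in ranges.items():
    print(f"{src}->{dst}: {lower:.2f} ns <= x_{src} - x_{dst} <= {upper:.2f} ns")
```

A path whose range is wide leaves freedom to place its two flip-flops in different domains, while a narrow range forces their skews to stay close together.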

Based on the above delay values, the flip-flops are divided into the minimum number of domains. Table 2 lists the domains and the corresponding skew values, chosen to optimize the clock period, compilation time and dynamic power dissipation. Fig. 4 compares the clock period and the compilation time of the circuit before and after multi-domain clock skew scheduling; both are reduced after skewing.



Figure 3 Constraint Graph

#### **Table 2 Domain Set**

| Domain | Flip-flops                            | Skew assigned (ns) |
|--------|---------------------------------------|--------------------|
| 1      | t1, t2, t0, t4                        | 9.5                |
| 2      | s0, s2, play_mesgs, m1, t3            | 5.5                |
| 3      | m2, play_beep_reg, record_on_reg, m3  | 5                  |
| 4      | s1, m0, pick_up                       | 4                  |



Fig.4 Time Period And Compilation Time



Fig.5 Clock Period Comparison





# 6. CONCLUSION

Clock skew scheduling (CSS) views clock skew as a manageable resource rather than a liability. In this paper, the digital logic of a telephone answering machine is coded in VHDL and simulated using the Xilinx tool, and the RTL schematic and timing report are extracted. The VHDL file of the test bench circuit s298 taken from Xilinx is synthesized using Design Compiler, and the area, power, critical path, and maximum and minimum delays between flip-flops are calculated. The clock period, the compilation time and the dynamic power consumed are reduced after multi-domain clock skew scheduling. The technique is currently limited to 5 domains; in future work the algorithm will be modified to support a larger number of domains.

# 7. REFERENCES

- [1] M. Ni and S. O. Memik, "A fast heuristic algorithm for multi domain clock skew scheduling," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, no. 4, Apr. 2010.
- [2] R. B. Deokar and S. S. Sapatnekar, "A graph-theoretic approach to clock skew optimization," in *Proc. Int. Symp. Circuits Syst.*, 1994, pp. 407–410.
- [3] J. P. Fishburn, "Solving a system of difference constraints with variables restricted to a finite set," *Inf. Process. Lett.*, vol. 82, no. 3, pp. 143–144, May 2002.
- [4] J. P. Fishburn, "Clock skew optimization," *IEEE Trans. Comput.*, vol. 39, no. 7, pp. 945–951, Jul. 1990.
- [5] S. Huang, C. Chang, and Y. Nieh, "Fast multi-domain clock skew scheduling for peak current reduction," in *Proc. Asian South Pacific Des. Autom. Conf.*, 2006, pp. 254–259.
- [6] H. Jiang, K. Wang, and M. Marek-Sadowska, "Clock skew bounds estimation under power supply and process variations," in *Proc. ACM Great Lakes Symp. VLSI*, 2005, pp. 332–336.
- [7] V. Khandelwal and A. Srivastava, "Variability-driven formulation for simultaneous gate sizing and postsilicon tunability allocation," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 27, no. 4, pp. 610–620, Apr. 2008.
- [8] C. Lin and H. Zhou, "Clock skew scheduling with delay padding for prescribed skew domains," in *Proc. Asian South Pacific Des. Autom. Conf.*, 2007, pp. 541–546.
- [9] L. Liu, T. Chou, A. Aziz, and D. F. Wong, "Zero-skew clock tree construction by simultaneous routing, wire sizing and buffer insertion," in *Int. Symp. Phys. Des.*, 2000, pp. 33–38.
- [10] K. Ravindran, A. Kuehlmann, and E. Sentovich, "Multi-domain clock skew scheduling," in *Proc. Int. Conf. Comput.-Aided Des.*, 2003, pp. 801–808.
- [11] S. Tam, R. D. Limaye, and U. N. Desai, "Clock generation and distribution for the 130-nm Itanium 2 processor with 6-MB on-die L3 cache," *IEEE J. Solid-State Circuits*, vol. 39, no. 4, pp. 636–642, Apr. 2004.
- [12] S. Tam, S. Rusu, U. N. Desai, R. Kim, J. Zhang, and I. Young, "Clock generation and distribution for the first IA-64 microprocessor," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1545–1552, Nov. 2000.