## Implementation of Virtual Cut-Through Algorithm For Network on Chip Architecture

Yogita A. Sadawarte
Research Scholar, B.D. College of Engineering, Sevagram, Dist Wardha, India

Mahendra A. Gaikwad
Professor, B.D. College of Engineering, Sevagram, Dist Wardha, India

Rajendra M. Patrikar
Professor, Visvesvaraya National Institute of Technology

## ABSTRACT

Network on Chip (NoC) is an approach to designing the communication subsystem between IP cores in a System on Chip (SoC). It provides an attractive alternative to the traditional bus-based interconnection scheme. In an NoC architecture, IP cores communicate with one another through routers and a switching mechanism. The switching mechanism plays a vital role: it moves data from an input channel and places it on an output channel. Virtual cut through (VCT) and wormhole (WH) switching techniques are widely used in NoC architectures. In this paper, the virtual cut through switching technique is proposed for an NoC architecture, and its performance is analyzed in terms of latency and power.

In this paper we discuss the design and implementation of a VCT router for four IP cores or nodes. The VCT system is simulated in Modelsim-SE, which serves as the simulation and debugging tool. The design is synthesized in Xilinx ISE 9.1i for a packet size of 16 bits (0-15), targeting the automotive Spartan2 family, device XC2S200, PQG208 package and speed grade -5.

## **General Terms**

Switching technique, Routing algorithm.

#### **Keywords**

NoC architecture, Virtual cut through algorithm, Latency.

## 1. INTRODUCTION

A Network on Chip architecture is composed of multiple resources or intellectual property (IP) cores, connected by channels intersecting at various switches. The switches have buffers for each interface, which queue message packets waiting for a given destination. An on-chip network contains four fundamental components: IP cores, network adapters, routers and links. A standard set of performance metrics is used to compare different NoC architectures.

#### 1.1 Transport Latency

Transport latency is defined as the time that elapses between the injection of a message header into the network at the source node and the reception of the tail flit at the destination node. Overall latency is the sum of sender overhead, transport latency and receiver overhead. Low latency is a desirable characteristic of an NoC architecture.

#### 1.2 Energy

When flits travel on the interconnection network, both the inter-switch wires and the logic gates in the switches toggle, which results in energy dissipation. This dynamic energy dissipation is caused by the communication process in the network. A desirable characteristic of an NoC architecture is low energy dissipation.

The design of an NoC architecture is mainly parameterized by the size of packets, the length and width of physical links, the number and depth of virtual channels, and the switching technique used in the switches. The switching technique is one of the vital parameters for optimizing the performance of an NoC design. [1]

## 2. SWITCHING TECHNIQUES

In the design of a communication network, the selection of an appropriate switching technique is one of the challenging tasks. Switching is the actual mechanism that removes data from an input channel and places it on an output channel. [2] Which output channel is chosen depends on the routing algorithm. The switching technique determines how network resources are allocated for data transmission; the routing algorithm decides how and when the input channel is connected to the output channel. In switching, a message consisting of a header and data is transported to the destination through the nodes, as shown in figure 1. [3]



Figure 1: Switching

In packet switching, each packet carries its own addressing information. A storage facility is provided at each node, so the buffer requirement is high. In case of contention, messages are stored in intermediate nodes and then forwarded to a selected adjacent node. This selection is made by a well-defined decision rule referred to as the switching algorithm. The process is repeated until the message reaches the destination node. The characteristics of packet switching are more complete resource sharing, higher channel utilization and lower network delay. The switching techniques used in NoC architectures are mainly classified into two types.

#### 2.1 Wormhole Switching

Wormhole switching is the most common and popular switching technique used nowadays in commercial machines. A packet may spread across many consecutive routers and links like a worm, hence the name. A message is divided into packets, which are further divided into flits. The header flit sets up the path and the remaining flits of the same packet follow the path chosen by the header, so only the header experiences the routing latency. When a channel is blocked, the entire packet is blocked and the remaining flits are stored at their current switches. The advantage of this method is that message reordering is not required because transmission is sequential; as a consequence, only one message can be sent over a given physical connection at a time. Wormhole routers often use only input buffering, and the buffer space is small (as little as one flit) [4]. The main disadvantage of this method is higher latency; therefore it is not suitable for real-time data transfers.

#### 2.2 Virtual Cut Through Switching

In order to reduce the time spent storing packets at each node, Kermani and Kleinrock introduced a technique called VCT switching [3]. When a message arrives at an intermediate node and its outgoing channel is free, it is transmitted out immediately. Instead of waiting for the whole packet to be buffered, the incoming header flit is cut through to the next router as soon as the routing decision is made and the output channel is free. Every subsequent flit is buffered when it reaches the router, but it is also immediately cut through to the next router if the output channel is free. If there are no resource conflicts along the route, the packet is effectively pipelined through successive routers as a loose chain of flits. All the buffers along the routing path are blocked for other communication requests. Buffers are required when a busy channel is encountered, and VCT routers have buffers for a whole packet. The distance between the source and destination has little effect on communication latency. Each node must provide sufficient buffer space for all the messages passing through it, and because multiple messages may be blocked at any node, a very large buffer space is required at each node. VCT switching of a packet is shown in figure 2.
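The pipelining effect described above can be illustrated with a common textbook simplification that is not taken from this paper: assume every flit needs one clock cycle to cross one hop, routing decisions add no delay, and there is no contention. Under these assumptions a store-and-forward router forwards a packet only after receiving all of its flits, while a cut-through router forwards the header at once. The function names and the flit count of 16 are hypothetical.

```python
def store_and_forward_cycles(hops: int, flits: int) -> int:
    """Each hop waits for the whole packet before forwarding it."""
    return hops * flits


def cut_through_cycles(hops: int, flits: int) -> int:
    """The header advances one hop per cycle; the other flits follow in a pipeline."""
    return hops + flits - 1


if __name__ == "__main__":
    flits = 16  # hypothetical packet length in flits
    print("hops  store-and-forward  cut-through")
    for hops in (1, 2, 4, 8):
        saf = store_and_forward_cycles(hops, flits)
        vct = cut_through_cycles(hops, flits)
        print(f"{hops:4d}  {saf:17d}  {vct:11d}")
```

In this model the cut-through latency grows by only one cycle per additional hop, which is consistent with the statement that distance has little effect on communication latency.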



Figure 2: Virtual Cut Through Switching

## 3. IMPLEMENTATION OF VCT ALGORITHM

The VCT router is designed and implemented for four nodes (SoCs). The design is synthesized in Xilinx ISE 9.1i for a packet size of 16 bits (0-15), targeting the automotive Spartan2 family, device XC2S200, PQG208 package and speed grade -5.

#### 3.1 VCT Router

The virtual cut through router is designed for four nodes. Each node is connected to all four nodes by links, so the crossbar switching fabric has a total of 16 links. The nodes transfer data when the clock signal is high or an event occurs on it. The router has four inputs, each a packet of size 15 down to 0, and outputs of size 7 down to 0. Each packet has a source address and a destination address, each of two bits (1 down to 0). The RTL view of the VCT router, obtained using the Xilinx synthesis tool, is shown in figure 3. The VCT router has been tested under both conditions, without contention and with contention. In case of contention, a Round Robin (RR) scheduler decides the priority with which packets are transferred.
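The forwarding behaviour of such a four-port router can be sketched in software. This is a behavioural model only, not the authors' VHDL design. It assumes the destination address occupies bits 9 down to 8 and the data occupies bits 7 down to 0 of the 16-bit packet, and it marks an unused output with `"Z"`. When two inputs target the same output, this sketch simply lets the lower-numbered input win and reports the others as held; the name `route` and this tie-break are assumptions.

```python
from typing import List, Optional, Tuple, Union

HIGH_Z = "Z"  # marker for an output that receives no packet in this cycle


def route(inputs: List[Optional[int]]) -> Tuple[List[Union[int, str]], List[int]]:
    """Forward the 8-bit data of each 16-bit input packet to its destination port.

    Returns the four output values and the list of input nodes whose packets
    had to be held because their output port was already taken.
    """
    outputs: List[Union[int, str]] = [HIGH_Z] * 4
    held: List[int] = []
    for node, packet in enumerate(inputs):
        if packet is None:  # this input has nothing to send
            continue
        destination = (packet >> 8) & 0x3  # assumed position of the address bits
        if outputs[destination] == HIGH_Z:
            outputs[destination] = packet & 0xFF  # 8-bit data, bits 7 down to 0
        else:
            held.append(node)  # contention: output port already in use
    return outputs, held


if __name__ == "__main__":
    # Input 0 sends data 0x5A to port 2; input 1 also targets port 2.
    print(route([0xA25A, 0xA611, None, None]))
```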



Figure 3: RTL view of VCT Router

#### 3.2 Scheduler

A Round-Robin scheduling algorithm is used as the scheduler in our VCT system; it assigns time slices to each process in equal portions and in circular order. Round-Robin scheduling is simple, easy to implement and starvation-free. Contention occurs if two or more source addresses request the same destination address. In that case, as per the Round-Robin scheduling algorithm, priority is given to node 1 first, then to node 2, then to node 3 and so on in a cyclic manner.
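One common way to realise this cyclic priority is a rotating-pointer arbiter for a single output port. The sketch below is an assumption about how such a scheduler could behave, not the authors' implementation: the class name, the method name and the rule that the pointer moves to the node after the last winner are all hypothetical. Node 1 in the text corresponds to index 0 here.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Grants one requesting node per call, rotating priority in circular order."""

    def __init__(self, num_nodes: int = 4) -> None:
        self.num_nodes = num_nodes
        self.next_priority = 0  # index of the node that currently has top priority

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted node, or None if nobody requests."""
        for offset in range(self.num_nodes):
            node = (self.next_priority + offset) % self.num_nodes
            if requests[node]:
                # The node after the winner gets top priority next time,
                # so no requester can be starved.
                self.next_priority = (node + 1) % self.num_nodes
                return node
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter()
    # Nodes 1 and 2 (indices 0 and 1) keep requesting the same destination.
    for _ in range(3):
        print(arbiter.grant([True, True, False, False]))
```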

#### 3.3 VCT System

The VCT system is the main entity, in which the VCT router and the scheduler are declared as components and port mapped to make the signal connections properly. The RTL view of the VCT system is shown in figure 4.



Figure 4: RTL view of VCT system

#### 3.4 Packet Generator

The packet generator takes eight-bit random data from a random input generator, along with the source and destination addresses, to generate a packet. It is designed to generate packets of 16 bits each for testing the performance of the VCT system.

Table 1. Addresses of Nodes

| Sr.No. | Name  | Address |
|--------|-------|---------|
| 1      | Node1 | 00      |
| 2      | Node2 | 01      |
| 3      | Node3 | 10      |
| 4      | Node4 | 11      |

Table 1 shows the addresses of the nodes. The packet pattern used is <Header Identifier (4 bits)><Source Address (2 bits)><Destination Address (2 bits)><Data (8 bits)>.

Each node uses the same address as both its source address and its destination address. Each packet consists of a four-bit header that identifies the start of the packet, a two-bit source address followed by a two-bit destination address, and then eight bits of random data.
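The packet pattern can be illustrated with a short packing and unpacking sketch. It assumes the fields are laid out from the most significant bit downwards in the order given above (header in bits 15 to 12, source in 11 to 10, destination in 9 to 8, data in 7 to 0). The header value `0b1010` is a hypothetical placeholder, since the paper does not state the header pattern, and the function names are likewise illustrative.

```python
import random
from typing import Dict, Optional

HEADER_ID = 0b1010  # hypothetical header identifier; not specified in the paper


def make_packet(source: int, destination: int, data: int,
                header: int = HEADER_ID) -> int:
    """Pack header, source, destination and data into one 16-bit word."""
    if not (0 <= source <= 3 and 0 <= destination <= 3):
        raise ValueError("node addresses are two bits wide (0..3)")
    if not 0 <= data <= 0xFF:
        raise ValueError("data is eight bits wide (0..255)")
    if not 0 <= header <= 0xF:
        raise ValueError("header identifier is four bits wide (0..15)")
    return (header << 12) | (source << 10) | (destination << 8) | data


def parse_packet(packet: int) -> Dict[str, int]:
    """Split a 16-bit packet back into its four fields."""
    return {
        "header": (packet >> 12) & 0xF,
        "source": (packet >> 10) & 0x3,
        "destination": (packet >> 8) & 0x3,
        "data": packet & 0xFF,
    }


def random_packet(source: int, destination: int,
                  rng: Optional[random.Random] = None) -> int:
    """Build a packet with eight bits of random data, as the generator does."""
    if rng is None:
        rng = random.Random()
    return make_packet(source, destination, rng.getrandbits(8))


if __name__ == "__main__":
    packet = make_packet(source=0b00, destination=0b10, data=0x5A)
    print(f"{packet:016b}")  # 1010001001011010
    print(parse_packet(packet))
```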

## 4. SIMULATION OF VCT

The algorithm is simulated in Modelsim-SE, which serves as the simulation and debugging tool. The algorithm is validated for a fixed number of packets. If any node is not used, 'Z' appears at the output of that node. The simulation waveform is shown in figure 5.

#### 4.1 Without Contention

All destination addresses are different from each other, i.e. no address is repeated in a set of 4. In this case, the scheduler makes all addresses active at the same time in the first clock cycle. Therefore the output of the router is obtained in only one clock cycle.

#### 4.2 With Contention

If one or more destination addresses are repeated in a set of 4, contention occurs. In this case, the packet is held for one clock cycle. When the output node becomes free, the packet is released. The output of the router therefore requires more than one clock cycle, depending on the contention conditions.
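The two cases can be illustrated by counting the clock cycles needed to deliver one set of four packets. The sketch assumes that each output node accepts one packet per clock cycle and that held packets retry in the next cycle; contending packets are served in node order here, which is a simplification of the Round-Robin scheduler. The function name is hypothetical.

```python
from typing import Sequence


def cycles_to_deliver(destinations: Sequence[int]) -> int:
    """Count clock cycles until every packet in the set has been delivered.

    destinations[i] is the destination address requested by source node i.
    """
    pending = list(enumerate(destinations))  # (source node, destination) pairs
    cycles = 0
    while pending:
        cycles += 1
        busy = set()  # output nodes already used in this cycle
        held = []
        for source, destination in pending:
            if destination in busy:
                held.append((source, destination))  # hold for one more cycle
            else:
                busy.add(destination)  # packet is delivered in this cycle
        pending = held
    return cycles


if __name__ == "__main__":
    print(cycles_to_deliver([0, 1, 2, 3]))  # no contention: 1
    print(cycles_to_deliver([2, 2, 1, 0]))  # two packets share a destination: 2
    print(cycles_to_deliver([3, 3, 3, 3]))  # all four share a destination: 4
```

Under these assumptions the number of cycles equals the largest number of packets that request the same destination.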



Figure 5: Simulation Waveform of VCT

## 5. PERFORMANCE OF VCT

The performance of VCT is analyzed on the basis of average latency and power.

#### 5.1 Average Latency


The average latency is determined for up to 100 clock cycles, as shown in table 2.

| Number of Packets | Average latency of VCT (ns) |
|-------------------|-----------------------------|
| 20                | 250                         |
| 40                | 260                         |
| 60                | 270                         |
| 80                | 290                         |

Table 2. Average Latency of VCT
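The trend in table 2 can be checked with a few lines of arithmetic. The sketch below uses only the four measured values from the table; the variable and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Average latency of VCT in ns, keyed by number of packets (values from table 2).
AVERAGE_LATENCY_NS: Dict[int, int] = {20: 250, 40: 260, 60: 270, 80: 290}


def latency_increments(table: Dict[int, int]) -> List[Tuple[int, int, int]]:
    """Return (packets_from, packets_to, latency increase in ns) for each step."""
    points = sorted(table.items())
    return [(n1, n2, l2 - l1) for (n1, l1), (n2, l2) in zip(points, points[1:])]


def overall_increase_percent(table: Dict[int, int]) -> float:
    """Relative latency increase from the smallest to the largest packet count."""
    points = sorted(table.items())
    first, last = points[0][1], points[-1][1]
    return 100 * (last - first) / first


if __name__ == "__main__":
    for start, end, delta in latency_increments(AVERAGE_LATENCY_NS):
        print(f"{start} -> {end} packets: +{delta} ns")
    print(f"overall: +{overall_increase_percent(AVERAGE_LATENCY_NS):.1f} %")
```

The latency rises by 10 ns for each of the first two steps and by 20 ns for the last one, an overall increase of 16 % between 20 and 80 packets.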

A graph of the average latency of VCT for different numbers of packets is shown in figure 6.




Figure 6: Graph of Average latency of VCT VS Number of Packets



Figure 7: Graph of Average Latency of Nodes Vs Number of Packets

The average latency of each of the four nodes is also shown in figure 7.

#### 5.2 Power

The power consumed by VCT is calculated using the XPower tool of Xilinx. The total power consumed by VCT is found to be 25 mW.

## 6. CONCLUSION

In VCT, only the header flit contains routing information, and therefore each incoming data flit is simply forwarded along the same output channel as its predecessor. Hence the transmission of different messages cannot be multiplexed over one physical channel without additional support.

Each node must provide sufficient buffer space for all the messages passing through it, and multiple messages may be blocked at any node. The very large buffer space required at each node is therefore the main drawback of the VCT switching technique. The average latency of VCT increases with the number of packets. The power consumption of VCT is 25 mW on the automotive Spartan2 family, device XC2S200, PQG208 package and speed grade -5.

## 7. REFERENCES

- [1] Y. A. Sadawarte, M. A. Gaikwad and Rajendra M. Patrikar, "Review of Switching Techniques for Network-on-Chip Architectures", International Journal on Computer Engineering & Information Technology, Vol. 17, No. 22, Special edition 2010, pp. 52-57.
- [2] Y. A. Sadawarte, M. A. Gaikwad and Rajendra M. Patrikar, "Comparative study of switching techniques for Network on chip Architectures", ACM Digital Library, http://dl.acm.org/citation.cfm?id=1947940
- [3] Parviz Kermani and Leonard Kleinrock, "Virtual Cut-Through: A New Computer Communication Switching Technique", North Holland Publishing Company, Computer Networks 3 (1979), pp. 267-286.
- [4] Partha Pratim Pande, Michael Jones, Andre Ivanov, and Resve Saleh, "Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures", published by Computer Society, 15 June 2005.
- [5] T. T. Ye, L. Benini, G. D. Micheli, "Analysis of Power Consumption on Switch Fabrics in Network Routers", in Proc. Design Automation Conference, 2002.
- [6] Ankur Agarwal, Cyril Iskander and Ravi Shankar, "Survey of Network on Chip (NoC) Architectures & Contributions", Journal of Engineering, Computing and Architecture, Volume 3, Issue 1, 2009.
- [7] Erno Salminen, Ari Kulmala, and Timo D. Hamalainen, "Survey of Network-on-chip Proposals", White Paper, March 2008.
- [8] Section#7: Routing algorithms and switching techniques (CS838: Topics in parallel computing, CS1221, Tue, Feb 16, 1999), HTTP://PAGES.CS.WISC.EDU/~TVRDIK/7/HTML/SECTION7.HTML#@MAIL/VIRTUAL.
- [9] J. Duato, A. Robles, F. Silla, R. Beivide, Universidad de Cantabria, "A Comparison of Router Architectures for Virtual Cut-Through and Wormhole Switching in a NOW Environment".
- [10] Dan Marconett, University of California, Davis, CA 95616, USA, "A Survey of Architectural Design and Implementation Tradeoffs in Network on Chip Systems".
- [11] Nilanjan Banerjee, Praveen Vellanki and Karam S. Chatha, "A Power and Performance Model for Network on Chip Architectures", in DATE'04: Proceedings of the Conference on Design, Automation and Test in Europe, pp. 21-25, Washington, DC, USA, 2004. IEEE Computer Society.
- [12] T. T. Ye, L. Benini, G. D. Micheli, "Analysis of Power Consumption on Switch Fabrics in Network Routers", in Proc. Design Automation Conference, 2002.
- [13] Arnab Banerjee, Robert Mullins and Simon Moore, "A Power and Energy Exploration of Network-on-Chip Architectures", Proceedings of the First International Symposium on Networks-on-Chip, 2007.