Size based Multithreaded Scheduler for Hadoop Framework

Research Article


Published in December 2015 by Poonam S. Patil, Rajesh N. Phursule

National Conference on Advances in Computing

Foundation of Computer Science USA

NCAC2015 - Number 6

December 2015

Authors: Poonam S. Patil, Rajesh N. Phursule

Poonam S. Patil, Rajesh N. Phursule. Size based Multithreaded Scheduler for Hadoop Framework. National Conference on Advances in Computing. NCAC2015, 6 (December 2015), 20-23.

@article{patil2015size,
  author = {Poonam S. Patil and Rajesh N. Phursule},
  title = {Size based Multithreaded Scheduler for Hadoop Framework},
  journal = {National Conference on Advances in Computing},
  issue_date = {December 2015},
  volume = {NCAC2015},
  number = {6},
  month = {December},
  year = {2015},
  issn = {0975-8887},
  pages = {20-23},
  numpages = {4},
  url = {/proceedings/ncac2015/number6/23397-5068/},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Proceeding Article
%1 National Conference on Advances in Computing
%A Poonam S. Patil
%A Rajesh N. Phursule
%T Size based Multithreaded Scheduler for Hadoop Framework
%J National Conference on Advances in Computing
%@ 0975-8887
%V NCAC2015
%N 6
%P 20-23
%D 2015
%I International Journal of Computer Applications

Abstract

Most large-scale, data-intensive applications run in data centers are based on MapReduce or its open-source implementation, Hadoop. To process huge amounts of data in parallel, the Hadoop framework provides the Hadoop Distributed File System (HDFS) [2] and the MapReduce programming model [3]. Job scheduling is an essential process in Hadoop MapReduce. Hadoop ships with three schedulers: FIFO, Fair, and Capacity. In some processing scenarios, these traditional scheduling algorithms cannot meet the performance requirements and fairness criteria of big data processing. To address this issue, a new, efficient scheduler is required that first identifies the data size and then processes the data accordingly to improve performance. This new MapReduce scheduling scheme improves MapReduce performance and ensures high-speed data processing. The proposed system analyzes the data size on each DataNode and creates threads based on a threshold value decided by the proposed scheduler. The TaskTracker on each DataNode processes these threads in parallel, which improves data-processing performance. As a result, the TaskTracker completes the work in less time than it needs under the traditional schedulers.
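The abstract does not spell out the scheduler's algorithm. The following is therefore only a minimal Python sketch of the size-to-thread idea it describes, built on these assumptions:

- Each DataNode's local input is modeled as a byte string.
- The threshold is a fixed number of bytes per thread. The abstract says the proposed scheduler decides this value but not how.
- Counting newline-terminated records stands in for a real map task.
- The names `plan_threads`, `split_by_threshold`, `run_on_datanode`, `schedule_cluster`, and `count_records` are hypothetical. They are not Hadoop APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical sketch of size-based thread creation; names are illustrative, not Hadoop APIs.


def plan_threads(data_size: int, threshold: int) -> int:
    """Assumed rule: one thread per `threshold` bytes of local data, at least one."""
    if threshold <= 0:
        raise ValueError("threshold must be a positive number of bytes")
    # Integer ceiling division avoids float rounding for very large sizes.
    return max(1, (data_size + threshold - 1) // threshold)


def split_by_threshold(data: bytes, threshold: int) -> List[bytes]:
    """Cut a DataNode's local data into consecutive chunks of at most `threshold` bytes."""
    n = plan_threads(len(data), threshold)
    return [data[i * threshold:(i + 1) * threshold] for i in range(n)]


def run_on_datanode(data: bytes, threshold: int,
                    map_fn: Callable[[bytes], int]) -> int:
    """Process each chunk in its own worker thread and sum the partial results."""
    chunks = split_by_threshold(data, threshold)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(map_fn, chunks))
    return sum(partials)


def schedule_cluster(nodes: Dict[str, bytes], threshold: int,
                     map_fn: Callable[[bytes], int]) -> Dict[str, int]:
    """Apply the size-based plan to every DataNode (nodes handled one after another here)."""
    return {name: run_on_datanode(data, threshold, map_fn)
            for name, data in nodes.items()}


def count_records(chunk: bytes) -> int:
    """Stand-in map task: count newline-terminated records in a chunk."""
    return chunk.count(b"\n")


if __name__ == "__main__":
    node_data = b"record\n" * 10  # 70 bytes of local data
    print(plan_threads(len(node_data), 32))  # 3 threads for a 32-byte threshold
    print(run_on_datanode(node_data, 32, count_records))  # 10 records
```

Counting newline bytes is a deliberate choice: it gives the same total wherever the chunk boundaries fall. A real record reader would have to realign splits to record boundaries. Also, CPython's global interpreter lock limits speedups for CPU-bound Python code, whereas a Hadoop TaskTracker runs JVM threads. The sketch therefore shows how data size maps to thread count, not the performance gain itself.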

References

Apache Hadoop. Available at http://hadoop.apache.org
Apache HDFS. Available at http://hadoop.apache.org/hdfs
Apache MapReduce Tutorial. Available at http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
Apache Fair Scheduler. Available at http://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html
Apache Capacity Scheduler. Available at http://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html
Yang Xia, Lei Wang, "Research on Job Scheduling Algorithm in Hadoop", Journal of Computational Information Systems 7:16 (2011), 5769-5775. Available at http://www.Jofcis.com
"Challenges and Opportunities with Big Data", a community white paper developed by leading researchers across the United States.
Jeffrey Dean, Sanjay Ghemawat, Google, Inc., "MapReduce: Simplified Data Processing on Large Clusters".
Kyuseok Shim, Seoul National University (shim@ee.snu.ac.kr), "MapReduce Algorithms for Big Data Analysis".
Vasiliki Kalavri, Vladimir Vlassov, KTH The Royal Institute of Technology, Stockholm, Sweden (kalavri@kth.se), "MapReduce: Limitations, Optimizations and Open Issues", TrustCom/ISPA/IUCC, IEEE, 2013, pp. 1031-1038.
Yi Yao, Jianzhe Tai, Bo Sheng, Ningfang Mi, "LsPS: A Job Size-Based Scheduler for Efficient Task Assignments in Hadoop", IEEE Transactions, 2014.
Qutaibah Althebyan, Omar ALQudah, Yaser Jararweh, Qussai Yaseen, "Multi-Threading Based Map Reduce Tasks Scheduling", 2014 5th International Conference on Information and Communication Systems (ICICS).
Jisha S. Manjaly, Varghese S. Chooralil, "TaskTracker Aware Scheduling for Hadoop MapReduce", 2014 5th International Conference on Information and Communication Systems (ICICS).
Runhui Li, Patrick P. C. Lee, Yuchong Hu, "Degraded-First Scheduling for MapReduce in Erasure-Coded Storage Clusters", IEEE, 2013 (supported by AoE/E-02/08 and ECS CUHK419212 from the University Grants Committee of Hong Kong).
Bin Ye, Xiaoshe Dong, Pengfei Zheng, "A delay scheduling algorithm based on history time in heterogeneous environments", 2013 8th Annual ChinaGrid Conference.
S. Kavulya, J. Tan, R. Gandhi, P. Narasimhan, "An analysis of traces from a production MapReduce cluster", CCGRID'10, 2010, pp. 94–103.

Index Terms

Computer Science

Information Sciences

Keywords

MapReduce, Big Data, Scheduling, HDFS