
Improved Input Data Splitting in MapReduce

IJCA Proceedings on International Conference on Emerging Trends in Technology and Applied Sciences
© 2015 by IJCA Journal
ICETTAS 2015 - Number 2
Year of Publication: 2015
Reema Rhine
Nikhila T Bhuvan

Reema Rhine and Nikhila T Bhuvan. Article: Improved Input Data Splitting in MapReduce. IJCA Proceedings on International Conference on Emerging Trends in Technology and Applied Sciences ICETTAS 2015(2):23-26, September 2015. Full text available. BibTeX

	author = {Reema Rhine and Nikhila T Bhuvan},
	title = {Article: Improved Input Data Splitting in MapReduce},
	journal = {IJCA Proceedings on International Conference on Emerging Trends in Technology and Applied Sciences},
	year = {2015},
	volume = {ICETTAS 2015},
	number = {2},
	pages = {23-26},
	month = {September},
	note = {Full text available}


The performance of MapReduce depends heavily on the input data splitting process that runs before the map phase. Splitting is usually done with naive methods that are far from optimal. This paper explains Improved Input Splitting, a locality-based technique that addresses the input data splitting problems that seriously affect job performance. Improved Input Splitting clusters data blocks from the same node into a single partition so that they are processed by one map task. This avoids the time spent on slot reallocation and on initializing multiple tasks. Experimental results show that the method improves MapReduce processing performance considerably over the traditional Hadoop implementation.
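The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the Hadoop API: the `Block` type, the function names, and the rule for choosing among replica hosts (pick the host with the fewest blocks assigned so far, ties broken by host name) are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Block(NamedTuple):
    """Hypothetical stand-in for a data block: an id plus its replica hosts."""
    block_id: int
    hosts: Tuple[str, ...]


def default_splits(blocks: List[Block]) -> List[List[int]]:
    """Baseline for comparison: one split (one map task) per block."""
    return [[b.block_id] for b in blocks]


def locality_splits(blocks: List[Block]) -> Dict[str, List[int]]:
    """Group blocks by node so that each node's blocks form one split.

    Assumption (not from the paper): a replicated block goes to whichever
    of its hosts has the fewest blocks assigned so far; ties are broken
    by host name to keep the result deterministic.
    """
    per_host: Dict[str, List[int]] = {}
    for block in blocks:
        target = min(block.hosts, key=lambda h: (len(per_host.get(h, [])), h))
        per_host.setdefault(target, []).append(block.block_id)
    return per_host


if __name__ == "__main__":
    blocks = [
        Block(0, ("n1", "n2")),
        Block(1, ("n1", "n3")),
        Block(2, ("n2", "n3")),
        Block(3, ("n1", "n2")),
    ]
    print(len(default_splits(blocks)), "splits with one split per block")
    print(len(locality_splits(blocks)), "splits when grouped by node")
```

With one split per node, the number of map tasks is bounded by the number of nodes that hold data rather than by the number of blocks, which is where the savings on task initialization described above would come from.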

