Data Duplication Tactics with Hadoop

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2019
Waqas Ahmad, Hongwei Xie, Ammad Khan, Mubashir Tariq

Waqas Ahmad, Hongwei Xie, Ammad Khan and Mubashir Tariq. Data Duplication Tactics with Hadoop. International Journal of Computer Applications 177(15):6-12, November 2019. BibTeX

	author = {Waqas Ahmad and Hongwei Xie and Ammad Khan and Mubashir Tariq},
	title = {Data Duplication Tactics with Hadoop},
	journal = {International Journal of Computer Applications},
	issue_date = {November 2019},
	volume = {177},
	number = {15},
	month = {Nov},
	year = {2019},
	issn = {0975-8887},
	pages = {6-12},
	numpages = {7},
	url = {},
	doi = {10.5120/ijca2019919548},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


The Hadoop Distributed File System (HDFS) component of Apache Hadoop supports distributed storage of large data sets on a cluster of commodity hardware. HDFS ensures the availability of data by replicating it to multiple nodes. However, the replication policy of HDFS does not take the popularity of data into account, and the popularity of data tends to change over time. Maintaining a fixed replication factor therefore reduces the storage efficiency of HDFS. In this paper we propose an efficient dynamic data replication management framework that considers the popularity of the data stored in HDFS before replication. The approach dynamically classifies files as hot data or cold data based on their popularity, increases the replicas of hot data, and applies erasure coding to cold data. The experimental results show that the proposed approach reduces storage utilization by up to 40% without affecting availability or fault tolerance in HDFS.
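The hot/cold idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the access-count threshold, the boosted replication factor for hot files, the Reed-Solomon layout of 6 data units plus 3 parity units, and all names (`FileStat`, `is_hot`, `stored_size_gb`, `storage_saving`) are assumptions chosen for illustration. Only the baseline replication factor of 3 reflects the HDFS default. The sample file sizes and access counts are made up and are not results from the paper.

```python
from dataclasses import dataclass

DEFAULT_REPLICATION = 3  # HDFS default replication factor
HOT_REPLICATION = 4      # hypothetical boosted factor for hot files
RS_DATA_UNITS = 6        # assumed Reed-Solomon layout: 6 data units
RS_PARITY_UNITS = 3      # plus 3 parity units, i.e. 1.5x storage overhead
HOT_THRESHOLD = 100      # hypothetical access count separating hot from cold


@dataclass
class FileStat:
    path: str
    size_gb: float
    accesses: int  # accesses observed in the most recent window


def is_hot(f: FileStat, threshold: int = HOT_THRESHOLD) -> bool:
    """Classify a file as hot when its recent access count reaches the threshold."""
    return f.accesses >= threshold


def stored_size_gb(f: FileStat, threshold: int = HOT_THRESHOLD) -> float:
    """Raw storage consumed: extra replicas for hot files, erasure coding for cold ones."""
    if is_hot(f, threshold):
        return f.size_gb * HOT_REPLICATION
    return f.size_gb * (RS_DATA_UNITS + RS_PARITY_UNITS) / RS_DATA_UNITS


def storage_saving(files: list[FileStat], threshold: int = HOT_THRESHOLD) -> float:
    """Fraction of raw storage saved relative to uniform 3x replication."""
    baseline = sum(f.size_gb * DEFAULT_REPLICATION for f in files)
    dynamic = sum(stored_size_gb(f, threshold) for f in files)
    return 1 - dynamic / baseline


if __name__ == "__main__":
    # Illustrative data only.
    files = [
        FileStat("/logs/today.log", 20, 500),
        FileStat("/archive/2018.tar", 50, 3),
        FileStat("/archive/2017.tar", 30, 12),
    ]
    print(f"saving vs. 3x replication: {storage_saving(files):.1%}")
```

Under these assumptions the saving depends on how much of the data is cold: a cold file costs 1.5 times its size instead of 3 times, while each hot file costs more than the baseline, so the overall figure varies with the workload.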




Big Data, Hadoop Distributed File System, Dynamic data replication