Big Data Analytics to Predict Breast Cancer Recurrence on SEER Dataset using MapReduce Approach

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Umesh D. R., B. Ramachandra

Umesh D R. and B Ramachandra. Big Data Analytics to Predict Breast Cancer Recurrence on SEER Dataset using MapReduce Approach. International Journal of Computer Applications 150(7):7-11, September 2016. BibTeX

	author = {Umesh D. R. and B. Ramachandra},
	title = {Big Data Analytics to Predict Breast Cancer Recurrence on SEER Dataset using MapReduce Approach},
	journal = {International Journal of Computer Applications},
	issue_date = {September 2016},
	volume = {150},
	number = {7},
	month = {Sep},
	year = {2016},
	issn = {0975-8887},
	pages = {7-11},
	numpages = {5},
	url = {},
	doi = {10.5120/ijca2016911549},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Traditional data analytics may not be able to handle enormous amounts of data. Given the rapid growth of information, solutions are needed to manage these data sets and to extract value and knowledge from them. Moreover, decision makers should be able to gain significant insights from such varied and rapidly changing data. This value can be provided by big data analytics, the application of advanced analytic techniques to big data, here using the MapReduce approach. This paper aims to develop a high-performance platform that uses MapReduce to efficiently analyse the large SEER (Surveillance, Epidemiology, and End Results) breast cancer data set and predict the recurrence of breast cancer.
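As a rough illustration of the MapReduce pattern the abstract refers to, the following minimal sketch computes a recurrence rate per group in plain Python. It is not the authors' implementation: the record fields `stage` and `recurred`, and all function names, are hypothetical and do not correspond to actual SEER variables or to any Hadoop API. The map step emits key-value pairs, a shuffle step groups them by key, and the reduce step aggregates each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout; field names are illustrative, not SEER variables.
Record = Dict[str, object]
Pair = Tuple[str, Tuple[int, int]]


def map_record(record: Record) -> Iterator[Pair]:
    """Map step: emit (group key, (recurrence flag, count of 1))."""
    flag = 1 if record["recurred"] else 0
    yield str(record["stage"]), (flag, 1)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[int, int]]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(key: str, values: List[Tuple[int, int]]) -> Tuple[str, float]:
    """Reduce step: sum flags and counts, then return the recurrence rate."""
    recurrences = sum(v[0] for v in values)
    total = sum(v[1] for v in values)
    return key, recurrences / total


def recurrence_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Run map, shuffle, and reduce in sequence on an in-memory data set."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_group(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    # Toy data made up for this sketch only.
    sample = [
        {"stage": "I", "recurred": False},
        {"stage": "I", "recurred": True},
        {"stage": "II", "recurred": True},
    ]
    print(recurrence_rates(sample))
```

On a real cluster the map and reduce functions would run in parallel across partitions of the data, with the framework performing the shuffle; here everything runs in one process purely to show the data flow.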


Umesh D R completed his engineering degree at PES College of Engineering, Mandya, and his Master's at NIE Mysore, and is presently pursuing a Ph.D. at the University of Mysore, Mysore. He has worked at PES College of Engineering, Mandya, since 2005.
Dr. B. Ramachandra is Professor and Head of the Department of Electrical & Electronics, PES College of Engineering, Mandya. He received his Ph.D. from the Indian Institute of Science, Bangalore, and his Master's from the Indian Institute of Technology, Bombay.


Breast cancer, Big data, Classification, Data analytics, MapReduce, SEER.