Research Article

Predicting Survival on Titanic by Applying Exploratory Data Analytics and Machine Learning Techniques

by Yogesh Kakde, Shefali Agrawal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 179 - Number 44
Year of Publication: 2018
Authors: Yogesh Kakde, Shefali Agrawal
10.5120/ijca2018917094

Yogesh Kakde, Shefali Agrawal. Predicting Survival on Titanic by Applying Exploratory Data Analytics and Machine Learning Techniques. International Journal of Computer Applications. 179, 44 (May 2018), 32-38. DOI=10.5120/ijca2018917094

@article{ 10.5120/ijca2018917094,
author = { Yogesh Kakde and Shefali Agrawal },
title = { Predicting Survival on Titanic by Applying Exploratory Data Analytics and Machine Learning Techniques },
journal = { International Journal of Computer Applications },
issue_date = { May 2018 },
volume = { 179 },
number = { 44 },
month = { May },
year = { 2018 },
issn = { 0975-8887 },
pages = { 32-38 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume179/number44/29430-2018917094/ },
doi = { 10.5120/ijca2018917094 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:58:22.697615+05:30
%A Yogesh Kakde
%A Shefali Agrawal
%T Predicting Survival on Titanic by Applying Exploratory Data Analytics and Machine Learning Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 179
%N 44
%P 32-38
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The sinking of the RMS Titanic, which killed more than 1,500 passengers and crew, is one of the deadliest maritime disasters in history. One reason the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. An interesting observation from the sinking is that some groups of people, such as women and children, were more likely to survive than others because they were given priority during the rescue. The objective is first to uncover hidden or previously unknown information by applying exploratory data analytics to the available dataset, and then to apply different machine learning models to analyze what sorts of people were likely to survive. Finally, the results of the machine learning models are compared and analyzed on the basis of accuracy.
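
The accuracy-based comparison described above can be illustrated with a minimal sketch. It assumes binary labels (1 = survived, 0 = did not survive). The label and prediction lists, and the names `model_a` and `model_b`, are hypothetical placeholders made up for illustration; they are not the paper's data, models, or results.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count outcome pairs for binary labels (1 = survived, 0 = did not survive)."""
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(1, 1)],  # predicted survived, actually survived
        "tn": counts[(0, 0)],  # predicted not survived, actually not survived
        "fp": counts[(0, 1)],  # predicted survived, actually not survived
        "fn": counts[(1, 0)],  # predicted not survived, actually survived
    }


def accuracy(cm):
    """Fraction of correct predictions: (TP + TN) / total."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


# Hypothetical labels and predictions, for illustration only.
actual = [1, 0, 0, 1, 1, 0, 0, 1]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 1, 1],
    "model_b": [1, 0, 1, 0, 0, 0, 1, 1],
}

# Score every model on the same labels, then pick the most accurate one.
scores = {
    name: accuracy(confusion_matrix(actual, predicted))
    for name, predicted in predictions.items()
}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: accuracy = {score:.2f}")
print("most accurate:", best)
```

In practice the predictions would come from fitted classifiers such as those named in the keywords (logistic regression, random forest, support vector machine), evaluated on held-out passengers rather than on hand-written lists.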

Index Terms

Computer Science
Information Sciences

Keywords

Data mining, ggplot, Logistic Regression, Random Forest, Feature Engineering, Support Vector Machine, Confusion Matrix