Research Article

A Simplified Analytical Model Toward Big Data Analysis using Ridge Regression Method

by Afreen Ali, Sarwesh Site
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Number 28
Year of Publication: 2018
Authors: Afreen Ali, Sarwesh Site
10.5120/ijca2018916665

Afreen Ali, Sarwesh Site. A Simplified Analytical Model Toward Big Data Analysis using Ridge Regression Method. International Journal of Computer Applications. 180, 28 (Mar 2018), 41-48. DOI=10.5120/ijca2018916665

@article{ 10.5120/ijca2018916665,
author = { Afreen Ali, Sarwesh Site },
title = { A Simplified Analytical Model Toward Big Data Analysis using Ridge Regression Method },
journal = { International Journal of Computer Applications },
issue_date = { Mar 2018 },
volume = { 180 },
number = { 28 },
month = { Mar },
year = { 2018 },
issn = { 0975-8887 },
pages = { 41-48 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume180/number28/29185-2018916665/ },
doi = { 10.5120/ijca2018916665 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:02:06.299038+05:30
%A Afreen Ali
%A Sarwesh Site
%T A Simplified Analytical Model Toward Big Data Analysis using Ridge Regression Method
%J International Journal of Computer Applications
%@ 0975-8887
%V 180
%N 28
%P 41-48
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Ridge Regression is an essential linear regression method for analyzing multiple regression data that suffer from multicollinearity. For problems with severe multicollinearity (highly correlated predictors), Ridge Regression is a better modeling technique than the ordinary least squares method. The analytical data in modern technology is becoming extremely large; the term “Big Data” describes such large volumes of data, which ordinary tools are insufficient to analyze. In this paper, we present an approach to big data analysis through the ridge regression method. Our simulation results present a mapping model of Gaussian data drawn from big data at sufficient scale. This model opens a new gateway for the statistical and mathematical analysis of big data.
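The abstract contrasts ridge regression with ordinary least squares (OLS) on multicollinear data and presents it as a route to big data analysis. The sketch below is not the authors' mapping model; it assumes a linear model without intercept and uses the textbook closed-form ridge estimator of Hoerl and Kennard, β̂ = (XᵀX + λI)⁻¹Xᵀy, where λ = 0 gives OLS. Because XᵀX and Xᵀy are sums over rows, each data partition can compute them independently and the partial results can simply be added, a MapReduce-style pattern. The helper names (solve, partial_stats, combine, ridge_from_stats), the synthetic data, the four partitions, and λ = 10 are hypothetical choices for illustration.

```python
import random

# A minimal sketch: closed-form ridge regression computed from sufficient
# statistics (X^T X, X^T y) accumulated partition by partition.
# All helper names and the synthetic data are hypothetical illustrations.


def solve(a, b):
    """Solve a x = b (a square, nonsingular) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def partial_stats(rows, ys):
    """'Map' step: X^T X and X^T y for one partition of rows."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in zip(rows, ys):
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(stats):
    """'Reduce' step: element-wise sum of per-partition statistics."""
    total_xtx, total_xty = None, None
    for xtx, xty in stats:
        if total_xtx is None:
            total_xtx, total_xty = [row[:] for row in xtx], xty[:]
            continue
        for i in range(len(xty)):
            total_xty[i] += xty[i]
            for j in range(len(xty)):
                total_xtx[i][j] += xtx[i][j]
    return total_xtx, total_xty


def ridge_from_stats(xtx, xty, lam):
    """beta = (X^T X + lam I)^(-1) X^T y; lam = 0 reduces to ordinary least squares."""
    p = len(xty)
    a = [[xtx[i][j] + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    return solve(a, xty)


# Synthetic Gaussian data with two nearly collinear predictors (x2 is x1 plus
# tiny noise); true model y = x1 + x2 + noise, no intercept.
random.seed(0)
rows, ys = [], []
for _ in range(1000):
    x1 = random.gauss(0.0, 1.0)
    x2 = x1 + random.gauss(0.0, 0.01)
    rows.append([x1, x2])
    ys.append(x1 + x2 + random.gauss(0.0, 0.5))

# Split into 4 partitions, compute statistics per partition, then sum them.
chunks = [(rows[k:k + 250], ys[k:k + 250]) for k in range(0, 1000, 250)]
xtx, xty = combine(partial_stats(r, y) for r, y in chunks)

beta_ols = ridge_from_stats(xtx, xty, 0.0)
beta_ridge = ridge_from_stats(xtx, xty, 10.0)
beta_ridge_single_pass = ridge_from_stats(*partial_stats(rows, ys), 10.0)

print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```

On data like this the OLS coefficients typically drift far apart while their sum stays close to 2, because the data barely constrain their difference; the ridge penalty shrinks that poorly determined direction, so both ridge estimates land close to 1. The partitioned and single-pass ridge estimates agree up to floating-point rounding. With many predictors, a linear-algebra library would replace the hand-written solver.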

References
  1. A. J. Bush, “Ridge: A program to perform ridge regression analysis,” Behavior Research Methods & Instrumentation, vol. 12, no. 1, pp. 73–74, Jan 1980.
  2. R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
  3. A. Gepp, M. K. Linnenluecke, T. J. O’Neill, and T. Smith, “Big data techniques in auditing research and practice: Current trends and future opportunities,” Journal of Accounting Literature, vol. 40, pp. 102–115, 2018.
  4. Q. Gao and T. C. Lee, “High-dimensional variable selection in regression and classification with missing data,” Signal Processing, vol. 131, pp. 1–7, 2017.
  5. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
  6. J. S. Vitter, “Algorithms and data structures for external memory,” Found. Trends Theor. Comput. Sci., vol. 2, no. 4, pp. 305–474, 2006.
  7. J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” in Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI ’04), 2004, pp. 137–150.
  8. H. Karloff, S. Suri, and S. Vassilvitskii, “A model of computation for MapReduce,” in Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA ’10, 2010, pp. 938–948.
  9. M. R. Thakare, S. W. Mohod, and A. N. Thakare, “Various data-mining techniques for big data,” IJCA Proceedings on International Conference on Quality Up-gradation in Engineering, Science and Technology, vol. ICQUEST 2015, no. 8, pp. 9–13, October 2015.
  10. M. Enea, “Fitting linear models and generalized linear models with large data sets in R,” Statistical Methods for the Analysis of Large Datasets: book of short papers, pp. 411–414, 2009.
  11. J. Polo, D. Carrera, Y. Becerra, M. Steinder, and I. Whalley, “Performance-driven task co-scheduling for MapReduce environments,” in 2010 IEEE Network Operations and Management Symposium - NOMS 2010, April 2010, pp. 373–380.
  12. A. Fernández, S. del Río, V. López, A. Bawakid, M. J. del Jesus, J. M. Benítez, and F. Herrera, “Big data with cloud computing: an insight on the computing environment, MapReduce, and programming frameworks,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 4, no. 5.
  13. D. Miner and A. Shook, MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems, 1st ed. O’Reilly Media, Inc., 2012.
  14. P. Ma and X. Sun, “Leveraging for big data regression,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 7, no. 1.
  15. S. Guha, R. Hafen, J. Rounds, J. Xia, J. Li, B. Xi, and W. S. Cleveland, “Large complex data: divide and recombine (D&R) with RHIPE,” Stat, vol. 1, no. 1.
  16. N. Lin and R. Xi, “Aggregated estimating equation estimation,” Stat. Interface, vol. 4, no. 1, pp. 73–83, 2011.
  17. B. Gupta, A. Rawat, A. Jain, A. Arora, and N. Dhami, “Analysis of various decision tree algorithms for classification in data mining,” International Journal of Computer Applications, vol. 163, no. 8, pp. 15–19, Apr 2017.
  18. R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, 5th ed. Upper Saddle River, New Jersey: Prentice Hall, 1995.
  19. S. Jun and S.-J. L.-B. Ryu, “A divided regression analysis for big data,” International Journal of Software Engineering and Its Applications, vol. 9, no. 5, 2015.
  20. L. R. Nair and S. D. Shetty, “Research in big data and analytics: An overview,” International Journal of Computer Applications, vol. 108, no. 14, pp. 19–23, December 2014.
  21. B.-W. Chen, S. Rho, L. T. Yang, and Y. Gu, “Privacy-preserved big data analysis based on asymmetric imputation kernels and multiside similarities,” Future Generation Computer Systems, vol. 78, no. Part 2, pp. 859–866, 2018.
  22. P. Goldar, Y. Rai, and S. Kushwaha, “A review on parallelization of big data analysis and processing,” International Journal of Emerging Technology in Computer Science & Electronics (IJETCSE), vol. 23, no. 4, pp. 60–65, August 2016.
  23. H. Geng, HADOOP TECHNOLOGY. Wiley Telecom, 2017, pp. 816–. D. Vohra, Using Apache Hadoop. Berkeley, CA: Apress, 2016, pp. 117–130.
  24. W. Q. Meeker and Y. Hong, “Reliability meets big data: Opportunities and challenges,” Quality Engineering, vol. 26, no. 1, pp. 102–116, 2014.
  25. Y. Mao and W. Min, “Storage and accessing small files based on hdfs,” in Proceedings of International Conference on Computer Science and Information Technology, S. Patnaik and X. Li, Eds. New Delhi: Springer India, 2014, pp. 565–573.
  26. S. G. Edward and N. Sabharwal, Introducing MongoDB. Berkeley, CA: Apress, 2015, pp. 25–28.
  27. L. Vokorokos, M. Uchnár, and A. Baláž, “MongoDB scheme analysis,” in 2017 IEEE 21st International Conference on Intelligent Engineering Systems (INES), Oct 2017, pp. 67–70.
  28. D. Vohra, Using Apache Spark. Berkeley, CA: Apress, 2016, pp. 219–228.
  29. T. Zhang and B. Yang, “An exact approach to ridge regression for big data,” Computational Statistics, vol. 32, no. 3, pp. 909–928, Sep 2017.
  30. J. Haworth, J. Shawe-Taylor, T. Cheng, and J. Wang, “Local online kernel ridge regression for forecasting of urban travel times,” Transportation Research Part C: Emerging Technologies, vol. 46, no. Supplement C, pp. 151–178, 2014.
  31. H. Xue, Y. Zhu, and S. Chen, “Local ridge regression for face recognition,” Neurocomputing, vol. 72, no. 4, pp. 1342–1346, 2009, Brain Inspired Cognitive Systems (BICS 2006) / Interplay Between Natural and Artificial Computation (IWINAC 2007).
  32. A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
  33. H. Zhan and S. Xu, “Adaptive ridge regression for rare variant detection,” PLOS ONE, vol. 7, Aug. 2012.
  34. R. L. Obenchain, “Classical F-tests and confidence regions for ridge regression,” Technometrics, vol. 19, no. 4, pp. 429–439, 1977.
  35. J. Fan, F. Han, and H. Liu, “Challenges of big data analysis,” National Science Review, vol. 1, no. 2, pp. 293–314, 2014.
  36. X. Shen, M. Alam, F. Fikse, and L. Rönnegård, “A novel generalized ridge regression method for quantitative genetics,” Genetics.
  37. Q.-T. Zhang, Y. Liu, W. Zhou, and Z.-W. Yang, “A sequential regression model for big data with attributive explanatory variables,” Journal of the Operations Research Society of China, vol. 3, no. 4, p. 475.
  38. J. H. Friedman, “Data mining and statistics: What’s the connection?” Computing Science and Statistics, vol. 29, no. 1, pp. 3–9, 1998.
  39. Y. Benjamini and M. Leshno, “Statistical methods for data mining,” in Data Mining and Knowledge Discovery Handbook. Springer, 2005, pp. 565–587.
  40. F. Z. Maksood and G. Achuthan, “Analysis of data mining techniques and its applications,” International Journal of Computer Applications, vol. 140, no. 3, pp. 6–14, April 2016, published by Foundation of Computer Science (FCS), NY, USA.
  41. A. Mohammadighavam, N. Rajabpour, and A. Naserasadi, “A survey on data mining approaches,” International Journal of Computer Applications, vol. 36, no. 6, pp. 14–18, December 2011.
  42. J. W. Emerson and M. J. Kane, “Don’t drown in the data,” Significance, vol. 9, no. 4, pp. 38–39, 2012.
Index Terms

Computer Science
Information Sciences

Keywords

Big Data, MapReduce, Statistics, Regression Model, Gaussian Data