Research Article

Hadoop.TS: Large-Scale Time-Series Processing

by Mirko Kämpf, Jan W. Kantelhardt
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 74 - Number 17
Year of Publication: 2013
Authors: Mirko Kämpf, Jan W. Kantelhardt
10.5120/12974-0233

Mirko Kämpf, Jan W. Kantelhardt. Hadoop.TS: Large-Scale Time-Series Processing. International Journal of Computer Applications. 74, 17 (July 2013), 1-8. DOI=10.5120/12974-0233

@article{10.5120/12974-0233,
author = {Mirko K{\"a}mpf and Jan W. Kantelhardt},
title = {Hadoop.TS: Large-Scale Time-Series Processing},
journal = {International Journal of Computer Applications},
issue_date = {July 2013},
volume = {74},
number = {17},
month = {July},
year = {2013},
issn = {0975-8887},
pages = {1--8},
numpages = {8},
url = {https://ijcaonline.org/archives/volume74/number17/12974-0233/},
doi = {10.5120/12974-0233},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Mirko Kämpf
%A Jan W. Kantelhardt
%T Hadoop.TS: Large-Scale Time-Series Processing
%J International Journal of Computer Applications
%@ 0975-8887
%V 74
%N 17
%P 1-8
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The paper describes a computational framework for time-series analysis. Because all components are reusable, the framework supports rapid prototyping of new algorithms. Generic data structures represent different types of time series, e.g., event and inter-event time series, and define reliable interfaces to existing big data. The major modes of operation are standalone applications, highly scalable MapReduce programs, and User Defined Functions (UDFs) for Hadoop-based analysis frameworks. Efficient implementations of univariate and bivariate analysis algorithms are provided, e.g., for long-term correlation, cross-correlation, and event synchronization analysis on large data sets.
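To make two of the building blocks named in the abstract concrete, here is a minimal single-machine sketch in Python: converting an event time series into its inter-event series, and estimating long-term correlations with first-order detrended fluctuation analysis (DFA1) in the standard form described by Peng et al. [13] and Kantelhardt et al. [15]. It is not the Hadoop.TS implementation and involves no MapReduce. Plain Python lists stand in for the framework's time-series data structures, and the function names interevent_times, dfa1 and fluctuation_exponent are hypothetical. For brevity, windows are taken only from the start of the record (the usual procedure also repeats the segmentation from the end).

```python
# Hypothetical helper names; a single-machine sketch, not the Hadoop.TS API.
import math
import random


def interevent_times(event_times):
    """Turn sorted event timestamps into the inter-event (return-interval) series."""
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


def dfa1(series, scales):
    """Order-1 detrended fluctuation analysis: returns {s: F(s)} for each window size s >= 3."""
    mean = sum(series) / len(series)
    # Step 1: profile = cumulative sum of deviations from the mean.
    profile, total = [], 0.0
    for x in series:
        total += x - mean
        profile.append(total)
    fluctuations = {}
    for s in scales:
        n_seg = len(profile) // s  # non-overlapping windows, taken from the start only
        if s < 3 or n_seg == 0:    # a line fits 1 or 2 points exactly, so skip s < 3
            continue
        t_mean = (s - 1) / 2
        t_var = sum((t - t_mean) ** 2 for t in range(s))
        residual_sum = 0.0
        for v in range(n_seg):
            seg = profile[v * s:(v + 1) * s]
            # Step 2: least-squares straight-line fit inside the window.
            y_mean = sum(seg) / s
            slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg)) / t_var
            intercept = y_mean - slope * t_mean
            # Step 3: mean squared residual around the local linear trend.
            residual_sum += sum((y - intercept - slope * t) ** 2 for t, y in enumerate(seg)) / s
        # Step 4: root-mean-square fluctuation over all windows of size s.
        fluctuations[s] = math.sqrt(residual_sum / n_seg)
    return fluctuations


def fluctuation_exponent(fluctuations):
    """Slope alpha of log F(s) versus log s, by ordinary least squares (needs F(s) > 0)."""
    xs = [math.log(s) for s in fluctuations]
    ys = [math.log(f) for f in fluctuations.values()]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    print(interevent_times([1.0, 4.0, 6.0, 10.0]))  # [3.0, 2.0, 4.0]
    rng = random.Random(42)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    alpha = fluctuation_exponent(dfa1(noise, [16, 32, 64, 128, 256, 512]))
    print(round(alpha, 2))  # expected to be near 0.5 for uncorrelated noise
```

For uncorrelated noise the fitted exponent should come out close to α = 0.5, while stationary long-term correlated records give 0.5 < α < 1.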

References
  1. R. H. Shumway, D. S. Stoffer, "Time series analysis and its applications: with R examples," 3rd ed., Springer, 2013.
  2. "Encyclopedia of complexity and systems science," ed. R. Meyers, Springer, 2009.
  3. M. Small, "Applied nonlinear time series analysis: applications in physics, medicine and economics," World Scientific, 2005.
  4. H. Kantz, T. Schreiber, "Nonlinear time series analysis," Cambridge University Press, 2003.
  5. Public data sets on AWS, http://aws.amazon.com/datasets.
  6. Kaggle, http://www.kaggle.com.
  7. Apache Hadoop, http://hadoop.apache.org/.
  8. K. Shvachko et al., "The Hadoop distributed file system," in: Proc. IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), (Washington DC, USA), pp. 1–10, IEEE Computer Society, 2010.
  9. Hive, http://hive.apache.org/.
  10. PIG, http://pig.apache.org/.
  11. SOCIONICAL, http://www.socionical.eu/.
  12. M. Kämpf et al., "Burst event and return interval statistics in Wikipedia access and edit data," Physica A, vol. 391, pp. 9101–9111, 2012.
  13. C. K. Peng et al., "Mosaic organization of DNA nucleotides," Phys. Rev. E, vol. 49, pp. 1685–1689, 1994.
  14. A. Bunde et al., "Correlated and uncorrelated regions in heart-rate fluctuations during sleep," Phys. Rev. Lett., vol. 85, pp. 3736–3739, 2000.
  15. J. W. Kantelhardt et al., "Detecting long-range correlations with detrended fluctuation analysis," Physica A, vol. 295, pp. 441–454, 2001.
  16. A. Bashan et al., "Comparison of detrending methods for fluctuation analysis," Physica A, vol. 387, pp. 5080–5090, 2008.
  17. J. W. Kantelhardt et al., "Multifractal detrended fluctuation analysis of nonstationary time series," Physica A, vol. 316, pp. 87–114, 2002.
  18. J. Ludescher et al., "On the spurious multifractality in long-term correlated records: The effect of additive short-term memory, periodicities and noise," Physica A, vol. 390, pp. 2480–2490, 2011.
  19. A. Y. Schumann, J. W. Kantelhardt, "Multifractal moving average analysis and test of multifractal model with tuned correlations," Physica A, vol. 390, pp. 2637–2654, 2011.
  20. A. Bunde et al., "The effect of long-term correlations on the statistics of rare events," Physica A, vol. 330, pp. 1–7, 2003.
  21. A. Bunde et al., "Long-term memory: A natural mechanism for the clustering of extreme events and anomalous residual times in climate records," Phys. Rev. Lett., vol. 94, p. 048701, 2005.
  22. E. G. Altmann, H. Kantz, "Recurrence time analysis, long-term correlations, and extreme events," Phys. Rev. E, vol. 71, p. 056106, 2005.
  23. J. F. Eichner et al., "Statistics of return intervals in long-term correlated records," Phys. Rev. E, vol. 75, p. 011128, 2007.
  24. J. W. Kantelhardt, "Fractal and multifractal time series," in "Encyclopedia of complexity and systems science," ed. R. Meyers, Springer, 2009.
  25. R. Q. Quiroga et al., "Event synchronization: A simple and fast method to measure synchronicity and time delay patterns," Phys. Rev. E, vol. 66, p. 041904, 2002.
  26. M. Kämpf et al., "From time series to co-evolving networks: Dynamics of the complex system Wikipedia," Proc. Europ. Conf. Complex Systems (ECCS), Brussels, 2012.
  27. A. Bashan et al., "Network physiology reveals relations between network topology and physiological function," Nature Commun., vol. 3, p. 702, 2012.
  28. J. W. Kantelhardt et al., "Transitions in traffic scaling and cross-correlation behavior," submitted to Physica A, 2013.
  29. Apache Mahout, http://mahout.apache.org.
  30. Apache Giraph, http://incubator.apache.org/giraph/.
  31. L. G. Valiant, "A bridging model for parallel computation," Commun. ACM, vol. 33, pp. 103–111, 1990.
  32. G. Malewicz et al., "Pregel: a system for large-scale graph processing," in: Proc. 28th ACM Symp. Principles of Distributed Computing (PODC) (New York), p. 6, 2009.
  33. Apache Commons Math, http://commons.apache.org/math/.
  34. P. Pebay, "Formulas for robust, one-pass parallel computation of covariances and arbitrary-order statistical moments," SANDIA report, Sept. 2008.
  35. D. Miner and A. Shook, "MapReduce Design Patterns," O'Reilly, 2012.
  36. Piggybank, https://cwiki.apache.org/pig/piggybank.html.
  37. SQLWindowing, http://github.com/hbutani/sqlwindowing.
  38. Y. Wu et al., "Evidence for a bimodal distribution in human communication," Proc. Natl. Acad. Sci., vol. 107, p. 18803, 2010.
  39. P. Ch. Ivanov et al., "Common scaling patterns in intertrade times of US stocks," Phys. Rev. E, vol. 69, p. 056107, 2004.
  40. B. Chiu et al., "Probabilistic discovery of time series motifs," SIGKDD (Washington DC, USA), 2003.
  41. M. Kämpf, "Datameer: Smart processing for big data," Javamagazin, pp. 40–48, July 2012.
  42. M. Kämpf, "Time-Series based reconstruction and analysis of complex networks," PhD dissertation, Institute of Physics, Martin-Luther University Halle-Wittenberg, Germany, 2013.
  43. Hadoop.TS – source code repository on GitHub, https://github.com/kamir/Hadoop.TS.
Index Terms

Computer Science
Information Sciences

Keywords

Time Series Analysis, Detrended Fluctuation Analysis, Return Interval Statistics, Cross Correlation, Event Synchronization, Hadoop, MapReduce