Research Article

Workload Aware Replicated Datapartitioning for Twitter

by Shanty S.R., Aby Abahai T., Eldo P. Elias
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 130 - Number 4
Year of Publication: 2015
10.5120/ijca2015906845

Shanty S.R., Aby Abahai T., Eldo P. Elias. Workload Aware Replicated Datapartitioning for Twitter. International Journal of Computer Applications. 130, 4 (November 2015), 21-28. DOI=10.5120/ijca2015906845

@article{ 10.5120/ijca2015906845,
author = { Shanty S.R. and Aby Abahai T. and Eldo P. Elias },
title = { Workload Aware Replicated Datapartitioning for Twitter },
journal = { International Journal of Computer Applications },
issue_date = { November 2015 },
volume = { 130 },
number = { 4 },
month = { November },
year = { 2015 },
issn = { 0975-8887 },
pages = { 21-28 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume130/number4/23198-2015906845/ },
doi = { 10.5120/ijca2015906845 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Shanty S.R.
%A Aby Abahai T.
%A Eldo P. Elias
%T Workload Aware Replicated Datapartitioning for Twitter
%J International Journal of Computer Applications
%@ 0975-8887
%V 130
%N 4
%P 21-28
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Most queries in Twitter involve multi-user operations. When a user logs in to Twitter, the system requests the most recent tweets of the users he or she follows, and these data may reside on different servers. The cost of such queries therefore depends on how the data is partitioned. Existing data partitioning solutions rely on hash-based or graph-based partitioning. This paper proposes a new method for reducing the interaction between servers: the data is partitioned so that most of the users a given user interacts with are placed in the same partition. In addition to data partitioning, the proposed approach implements selective replication, in which data about the most frequently requested users is replicated more than data about other users. Experimental analysis indicates that the proposed technique significantly improves the quality of the partitions, especially under low replication ratios.
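The abstract describes the approach only at a high level, so the following is a hypothetical sketch rather than the authors' method. It assumes that the cost of a home-timeline query is the number of partitions (servers) contacted to read the tweets of every followed user, with a greedy set-cover heuristic choosing which replica to read when a user has several copies. It compares a hash baseline (user id modulo k) with a simple greedy co-location heuristic that places each user on the partition already holding most of its neighbours. That heuristic is a stand-in, not the paper's partitioner. It then spends a small replica budget on the most-requested users. All function names, the toy follow graph, and the request counts are illustrative assumptions.

```python
from collections import Counter


def query_cost(followees, replicas):
    """Count partitions contacted to read every followee, picked by greedy set cover."""
    uncovered = set(followees)
    contacted = 0
    while uncovered:
        # Choose the partition holding copies of the most still-unread followees
        # (ties go to the lowest partition id).
        counts = Counter(p for u in uncovered for p in replicas[u])
        best, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        uncovered = {u for u in uncovered if best not in replicas[u]}
        contacted += 1
    return contacted


def total_cost(follows, replicas):
    """Sum of query costs when every user reads its home timeline once."""
    return sum(query_cost(vs, replicas) for vs in follows.values() if vs)


def hash_partition(users, k):
    """Baseline: place users by id modulo k, ignoring who interacts with whom."""
    return {u: u % k for u in users}


def greedy_colocate(users, follows, k, capacity):
    """Hypothetical heuristic: put each user where most of its neighbours already live.

    Requires k * capacity >= number of users.
    """
    neighbours = {u: set() for u in users}
    for u, vs in follows.items():
        for v in vs:  # treat follow edges as undirected interactions
            neighbours[u].add(v)
            neighbours[v].add(u)
    home, load = {}, Counter()
    for u in sorted(users):
        votes = Counter(home[v] for v in neighbours[u] if v in home)
        open_parts = [p for p in range(k) if load[p] < capacity]
        # Most neighbour votes first, then the lighter partition, then the lower id.
        best = max(open_parts, key=lambda p: (votes[p], -load[p], -p))
        home[u] = best
        load[best] += 1
    return home


def selective_replicate(home, followers, request_counts, budget):
    """Hypothetical policy: spend `budget` extra copies on the most-requested users,
    placing copies on the partitions where most of that user's followers live."""
    replicas = {u: {p} for u, p in home.items()}
    for user in sorted(request_counts, key=lambda u: (-request_counts[u], u)):
        # Followers per partition that does not yet hold a copy of `user`.
        demand = Counter(home[f] for f in followers.get(user, ())
                         if home[f] not in replicas[user])
        for part, _ in demand.most_common():
            if budget == 0:
                return replicas
            replicas[user].add(part)
            budget -= 1
    return replicas


# Toy follow graph (assumed data): user -> users it follows.
users = range(6)
follows = {0: [1, 2], 1: [0, 2], 2: [0], 3: [4, 5], 4: [3], 5: [3, 0]}
followers = {u: [f for f, vs in follows.items() if u in vs] for u in users}

hash_cost = total_cost(follows, {u: {p} for u, p in hash_partition(users, 2).items()})
home = greedy_colocate(users, follows, k=2, capacity=3)
colocated_cost = total_cost(follows, {u: {p} for u, p in home.items()})
# Request count assumed proportional to follower count.
replicas = selective_replicate(home, followers,
                               {u: len(fs) for u, fs in followers.items()}, budget=1)
replicated_cost = total_cost(follows, replicas)
print(hash_cost, colocated_cost, replicated_cost)  # 9 7 6
```

On this toy graph the hash baseline contacts 9 partitions across the six timeline reads. Co-locating interacting users lowers this to 7, and a single extra copy of the most-followed user lowers it to 6. This illustrates, under the stated assumptions only, why even a low replication ratio can help once placement already follows the interaction pattern.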

Index Terms

Computer Science
Information Sciences

Keywords

Data partitioning, Selective replication, Social network, Twitter, Cassandra