Data Integration for Books Data using Graph Database

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Darshana Shimpi

Darshana Shimpi. Data Integration for Books Data using Graph Database. International Journal of Computer Applications 161(8):43-47, March 2017. BibTeX

	author = {Darshana Shimpi},
	title = {Data Integration for Books Data using Graph Database},
	journal = {International Journal of Computer Applications},
	issue_date = {March 2017},
	volume = {161},
	number = {8},
	month = {Mar},
	year = {2017},
	issn = {0975-8887},
	pages = {43-47},
	numpages = {5},
	url = {},
	doi = {10.5120/ijca2017913257},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Modern developments in the internet provide the opportunity to access heterogeneous information systems anywhere in the world. Structured or unstructured information can be integrated and delivered to any system on a big data platform. This information may come from heterogeneous sources and use different representations. In this paper, we propose an integration scheme for books data from different sources over the web.

Graph databases allow simple and rapid retrieval of complex hierarchical structures that are difficult to model in relational systems. Nowadays graphs have become very popular in various domains such as social media analytics, healthcare, chemistry, business intelligence, networking and many more.

We propose a system that integrates books data from various sources across the web. The system uses the TITAN graph database. Complex information can be retrieved easily with simple queries written in Gremlin, a graph query language.
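
To make the idea concrete, the following is a minimal in-memory sketch of the kind of integration described above. It is not the paper's implementation and does not use TITAN. The two sources, their field names, the sample ISBNs, the `written_by` edge label, and the `BookGraph` class are all hypothetical. Records with different representations are mapped onto one schema, merged by ISBN, and stored as book and author vertices joined by edges. In Gremlin, a traversal along the lines of `g.V().has('name', 'B. Coauthor').in('written_by').values('title')` would play a role comparable to `titles_by`, although the actual schema and queries may differ.

```python
from collections import defaultdict

# Hypothetical records from two sources that describe books with
# different field names and ISBN formats (all values are made up).
SOURCE_A = [
    {"isbn": "978-0-00-000000-2", "title": "Graph Basics", "author": "A. Writer"},
]
SOURCE_B = [
    {"ISBN13": "9780000000002", "name": "Graph Basics",
     "writers": ["A. Writer", "B. Coauthor"], "price": 12.5},
    {"ISBN13": "9780000000019", "name": "Query Patterns",
     "writers": ["B. Coauthor"], "price": 20.0},
]


def normalize(record):
    """Map a source-specific record onto one common schema."""
    isbn = record.get("isbn") or record.get("ISBN13")
    authors = record.get("writers") or [record["author"]]
    return {
        "isbn": isbn.replace("-", ""),  # strip hyphens so ISBNs compare equal
        "title": record.get("title") or record.get("name"),
        "authors": list(authors),
        "price": record.get("price"),
    }


class BookGraph:
    """Tiny property graph: book and author vertices, written_by edges."""

    def __init__(self):
        self.vertices = {}                 # vertex id -> property dict
        self.in_edges = defaultdict(set)   # (vertex id, label) -> source ids
        self.out_edges = defaultdict(set)  # (vertex id, label) -> target ids

    def add_book(self, book):
        book_id = "book:" + book["isbn"]
        props = self.vertices.setdefault(
            book_id, {"label": "book", "isbn": book["isbn"]}
        )
        # Fill in properties that earlier sources did not provide.
        for key in ("title", "price"):
            if props.get(key) is None and book[key] is not None:
                props[key] = book[key]
        for name in book["authors"]:
            author_id = "author:" + name
            self.vertices.setdefault(author_id, {"label": "author", "name": name})
            self.out_edges[(book_id, "written_by")].add(author_id)
            self.in_edges[(author_id, "written_by")].add(book_id)

    def titles_by(self, author_name):
        """Follow written_by edges backwards from an author to book titles."""
        book_ids = self.in_edges.get(("author:" + author_name, "written_by"), ())
        return sorted(self.vertices[b]["title"] for b in book_ids)


def build_graph(*sources):
    """Normalize every record from every source and merge into one graph."""
    graph = BookGraph()
    for source in sources:
        for record in source:
            graph.add_book(normalize(record))
    return graph


if __name__ == "__main__":
    graph = build_graph(SOURCE_A, SOURCE_B)
    print(graph.titles_by("B. Coauthor"))
```

Because both sources resolve to the same ISBN for "Graph Basics", the sketch keeps a single book vertex and attaches the price and the second author from the other source to it, which is the essence of merging heterogeneous records before querying them as a graph.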




Heterogeneous sources, graph database, information integration, graph query