
Learning based Clustering for the Automatic Annotations from Web Databases

International Journal of Computer Applications
© 2015 by IJCA Journal
Volume 113 - Number 7
Year of Publication: 2015
Richa Saxena
Sushil Kumar Chaturvedi

Richa Saxena and Sushil Kumar Chaturvedi. Article: Learning based Clustering for the Automatic Annotations from Web Databases. International Journal of Computer Applications 113(7):18-23, March 2015. Full text available. BibTeX

	author = {Richa Saxena and Sushil Kumar Chaturvedi},
	title = {Article: Learning based Clustering for the Automatic Annotations from Web Databases},
	journal = {International Journal of Computer Applications},
	year = {2015},
	volume = {113},
	number = {7},
	pages = {18-23},
	month = {March},
	note = {Full text available}


The rapid growth in internet use has made it possible to extract knowledge from web databases and the HTML pages associated with them. Various techniques have been implemented for annotating the search results returned by web databases, but they have known problems, such as the alignment problem and the difficulty of splitting composite text nodes when there are no explicit separators. This paper identifies these problems and proposes an efficient technique that overcomes them using a supervised learning algorithm such as the support vector machine (SVM). The technique retrieves text nodes and data units and applies SVM-based clustering and labeling to the search records, yielding a high rate of annotated search results from web databases. It is compared with the existing methodology for annotating search records. The result analysis shows that the proposed method achieves higher precision and recall than the existing approach, as well as high accuracy in predicting annotated search records from web databases.
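The abstract reports precision, recall, and accuracy but does not define them. The following is a minimal sketch of the standard definitions, assuming each data unit receives a binary judgment (annotated correctly or not). The function name `precision_recall_accuracy` and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_accuracy(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy) for binary labels.

    gold[i] is True when data unit i truly carries the annotation;
    predicted[i] is True when the classifier assigned it.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one labeled data unit is required")

    # Count the confusion-matrix cells.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = len(gold) - tp - fp - fn

    # Fall back to 0.0 when a denominator is zero (an assumed convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(gold)
    return precision, recall, accuracy


# Hypothetical labels for five data units (illustrative only).
gold_labels = [True, True, True, False, False]
predicted_labels = [True, True, False, True, False]
precision, recall, accuracy = precision_recall_accuracy(gold_labels, predicted_labels)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

For a multi-label annotation task, these quantities would typically be computed per label and then averaged. The abstract does not say which convention the authors used.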

