Research Article

Hybrid Approach for Annotating Unstructured Document

by Meghana.h.j, Pushpa Ravikumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 120 - Number 13
Year of Publication: 2015
10.5120/21291-4270

Meghana.h.j, Pushpa Ravikumar. Hybrid Approach for Annotating Unstructured Document. International Journal of Computer Applications. 120, 13 (June 2015), 38-41. DOI=10.5120/21291-4270

@article{ 10.5120/21291-4270,
author = { Meghana.h.j, Pushpa Ravikumar },
title = { Hybrid Approach for Annotating Unstructured Document },
journal = { International Journal of Computer Applications },
issue_date = { June 2015 },
volume = { 120 },
number = { 13 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 38-41 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume120/number13/21291-4270/ },
doi = { 10.5120/21291-4270 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:06:10.043657+05:30
%A Meghana.h.j
%A Pushpa Ravikumar
%T Hybrid Approach for Annotating Unstructured Document
%J International Journal of Computer Applications
%@ 0975-8887
%V 120
%N 13
%P 38-41
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Annotation is the process of adding information to a document so that the information it contains can be extracted more easily. Many organizations today generate large amounts of data, predominantly in textual form. Such collections of text documents contain a large amount of structured information that remains hidden within the unstructured text. Information extraction algorithms are costly because they operate directly on the text, and they do not provide the necessary structured information. In this paper, we present a method that generates structured attributes by identifying the documents that contain information of interest; these attributes are later useful for querying the database. The major contribution of this paper is an algorithm that identifies the structured attributes present in a document by combining the query workload with the content of the text document. Our experimental results show that this technique gives better results than methods that rely only on the content of the document or only on the query workload.
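The idea of combining document content with the query workload can be illustrated with a minimal sketch. The abstract does not give the paper's scoring formulas, so everything below is an assumption made for illustration: the function names, the indicator-term sets, the term-overlap content score, the frequency-based query score, and the weighted sum that combines them are all hypothetical and are not the authors' algorithm.

```python
from collections import Counter


def content_score(document: str, indicator_terms: set[str]) -> float:
    """Fraction of an attribute's indicator terms that occur in the document.

    Hypothetical stand-in for a content-based score.
    """
    if not indicator_terms:
        return 0.0
    words = set(document.lower().split())
    return len(words & indicator_terms) / len(indicator_terms)


def query_scores(workload: list[list[str]]) -> dict[str, float]:
    """Fraction of past queries that mention each attribute.

    Hypothetical stand-in for a query-workload-based score.
    """
    if not workload:
        return {}
    counts = Counter(attr for query in workload for attr in set(query))
    return {attr: n / len(workload) for attr, n in counts.items()}


def suggest_attributes(document, indicators, workload, weight=0.5, top_k=2):
    """Rank candidate attributes by a weighted sum of both scores.

    weight=1.0 uses content only; weight=0.0 uses the query workload only.
    The weighted sum is an assumed combination rule.
    """
    qs = query_scores(workload)
    combined = {
        attr: weight * content_score(document, terms)
        + (1 - weight) * qs.get(attr, 0.0)
        for attr, terms in indicators.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(combined, key=lambda a: (-combined[a], a))[:top_k]


# Illustrative data (invented for this example).
document = "Heavy rain and flooding reported near the river on Monday"
indicators = {
    "weather": {"rain", "flooding", "storm"},
    "location": {"river", "city"},
    "price": {"cost", "dollars"},
}
workload = [["location"], ["location", "weather"], ["price"], ["location"]]

print(suggest_attributes(document, indicators, workload))
```

In this toy data the content-only ranking favors `weather`, while the combined ranking puts `location` first because it is queried more often, which is the kind of difference the abstract attributes to using both signals.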

References
  1. S. R. Jeffery, M. J. Franklin, and A. Y. Halevy, "Pay-as-you-go user feedback for dataspace systems," in ACM SIGMOD, 2008.
  2. A. Jain and P. G. Ipeirotis, "A quality-aware optimizer for information extraction," ACM Transactions on Database Systems, 2009.
  3. M. Jayapandian and H. Jagadish, "Expressive query specification through form customization," in Proceedings of the 11th international conference on Extending database technology: Advances in database technology, ser. EDBT '08. New York, NY, USA: ACM, 2008, pp. 416–427.
  4. J. M. Ponte and W. B. Croft, "A language modeling approach to information retrieval," in Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, ser. SIGIR '98. New York, NY, USA: ACM, 1998,
  5. R. Fagin, A. Lotem, and M. Naor, "Optimal aggregation algorithms for middleware," J. Comput. Syst. Sci., vol. 66, pp. 614–656, June 2003.
  6. G. Tsoumakas and I. Vlahavas, "Random k-labelsets: An ensemble method for multilabel classification," in Proceedings of the 18th European conference on Machine Learning, ser. ECML '07. Berlin, Heidelberg: Springer-Verlag, 2007, pp. 406–417.
Index Terms

Computer Science
Information Sciences

Keywords

Annotation, CADS, form, CV and QV