Research Article

Design of Hidden Web Search Engine

by Anuradha, A.K.Sharma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 30 - Number 9
Year of Publication: 2011
Authors: Anuradha, A.K.Sharma
DOI: 10.5120/3664-4910

Anuradha, A.K.Sharma. Design of Hidden Web Search Engine. International Journal of Computer Applications. 30, 9 (September 2011), 22-31. DOI=10.5120/3664-4910

@article{ 10.5120/3664-4910,
author = { Anuradha, A.K.Sharma },
title = { Design of Hidden Web Search Engine },
journal = { International Journal of Computer Applications },
issue_date = { September 2011 },
volume = { 30 },
number = { 9 },
month = { September },
year = { 2011 },
issn = { 0975-8887 },
pages = { 22-31 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume30/number9/3664-4910/ },
doi = { 10.5120/3664-4910 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Extracting Hidden Web data raises several challenges, especially for data sources with multi-attribute interfaces, where multiple attributes and their respective values must be extracted, indexed, and stored. Accessing hidden web content is therefore a promising research area, because the pages are created dynamically in response to queries submitted through a search interface. Querying each search interface directly, however, is a laborious way to search. Hence, there has been increased interest in retrieving and integrating hidden web data in order to give web users high-quality information; in other words, a hidden web search engine is highly desirable. This paper proposes a novel approach for building a hidden web search engine that fills in search forms automatically, extracts the result records, and stores them in a repository for later searching. The proposed work also provides a user interface in which the user can enter a query and find the desired results.
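
The abstract names four stages (form filling, record extraction, storage, and user search) but does not say how each is implemented. The following is a minimal Python sketch of that pipeline, not the authors' method. It assumes that result records appear as rows of an HTML table and that the repository is an in-memory SQLite table. The names `FormFieldParser`, `RowParser`, `build_query`, `store_records`, and `search`, as well as the sample form and result page, are hypothetical and chosen only for illustration.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collects the names of the fillable fields of a search form."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("input", "select") and attrs.get("name"):
            if attrs.get("type") != "submit":
                self.fields.append(attrs["name"])


class RowParser(HTMLParser):
    """Extracts every <tr> that contains <td> cells as a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows (only <th>) stay empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def build_query(form_html, values):
    """Fills the form fields found in form_html and returns a query string."""
    parser = FormFieldParser()
    parser.feed(form_html)
    filled = {name: values[name] for name in parser.fields if name in values}
    return urlencode(filled)


def store_records(conn, rows):
    """Stores extracted (title, author) records in the local repository."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (title TEXT, author TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)


def search(conn, keyword):
    """Answers a user query from the repository instead of the live site."""
    cur = conn.execute(
        "SELECT title, author FROM records WHERE title LIKE ?",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


# Made-up sample data standing in for a real search form and its result page.
FORM_HTML = (
    '<form action="/search"><input type="text" name="title">'
    '<select name="author"></select><input type="submit" value="Go"></form>'
)
RESULT_HTML = (
    "<table><tr><th>Title</th><th>Author</th></tr>"
    "<tr><td>Hidden Web Basics</td><td>A. Author</td></tr>"
    "<tr><td>Table Detection Notes</td><td>B. Writer</td></tr></table>"
)

if __name__ == "__main__":
    print(build_query(FORM_HTML, {"title": "web", "author": "x"}))
    extractor = RowParser()
    extractor.feed(RESULT_HTML)
    repository = sqlite3.connect(":memory:")
    store_records(repository, extractor.rows)
    print(search(repository, "Hidden"))
```

In a real system, the query string would be submitted to the form's action URL and the response page passed to the extractor. That network step is left out here so the sketch runs offline.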

Index Terms

Computer Science
Information Sciences

Keywords

WWW, Hidden Web, Information Extraction, Search interfaces, DOM