Crawling the Hidden Web: An Approach to Dynamic Web Indexing

Moumie Soulemane; Mohammad Rafiuzzaman; Hasan Mahmud

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 55 - Number 1

Year of Publication: 2012

DOI: 10.5120/8717-7290

Moumie Soulemane, Mohammad Rafiuzzaman, Hasan Mahmud. Crawling the Hidden Web: An Approach to Dynamic Web Indexing. International Journal of Computer Applications. 55, 1 (October 2012), 7-15. DOI=10.5120/8717-7290

@article{10.5120/8717-7290,
  author = {Moumie Soulemane and Mohammad Rafiuzzaman and Hasan Mahmud},
  title = {Crawling the Hidden Web: An Approach to Dynamic Web Indexing},
  journal = {International Journal of Computer Applications},
  issue_date = {October 2012},
  volume = {55},
  number = {1},
  month = {October},
  year = {2012},
  issn = {0975-8887},
  pages = {7-15},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume55/number1/8717-7290/},
  doi = {10.5120/8717-7290},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:56:08.292525+05:30
%A Moumie Soulemane
%A Mohammad Rafiuzzaman
%A Hasan Mahmud
%T Crawling the Hidden Web: An Approach to Dynamic Web Indexing
%J International Journal of Computer Applications
%@ 0975-8887
%V 55
%N 1
%P 7-15
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Most websites that hold online information are dynamic, and are therefore too sophisticated for many traditional search engines to index. As the number of such hidden web pages keeps growing, the issue continues to draw diverse opinions from researchers and practitioners in the web mining community. The features that enrich these dynamic pages make them steadily harder to index. In this paper we explain these features and the challenges they raise, and present a framework for dynamic web indexing. We describe the experimental setup and development process behind our implementation of the framework, along with the results obtained from it. We conclude by outlining a possible future direction: integrating Hadoop MapReduce with the framework to update and maintain the index.

Index Terms

Computer Science

Information Sciences

Keywords

Dynamic web pages, crawler, hidden web, index, Hadoop