Dynamic and Distributed Indexing Architecture in Search Engine using Grid Computing

International Journal of Computer Applications
© 2012 by IJCA Journal
Volume 55 - Number 5
Year of Publication: 2012
M. E. Elaraby
M. M. Sakre
M. Z. Rashad
O. Nomir

M E Elaraby, M M Sakre, M Z Rashad and O Nomir. Dynamic and Distributed Indexing Architecture in Search Engine using Grid Computing. International Journal of Computer Applications 55(5):34-42, October 2012.


Search engines need substantial computing resources to crawl web pages and very large storage to hold the billions of pages collected from the World Wide Web after they have been parsed and indexed. The indexer is one of the main components of a search engine and sits between the crawler and the searcher. Indexing is the process of organizing the collected data to facilitate information retrieval and minimize query time. It demands large processing and storage resources and strongly affects search engine performance; the size of this effect depends on the index structure and on how the index is constructed. Distributing the indexing process over a cluster of computers in a grid environment improves performance in two ways: the parsing load is spread across a number of computers, and the indexed data is partitioned by term across the distributed memory of a number of remote computers. Because search engine data collections change frequently, the indexer also requires dynamic indexing. Combining distributed and dynamic indexing in a single architecture over grid computing therefore gives better performance by using the available resources, without the need for costly machines such as supercomputers.
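
The two ideas in the abstract, partitioning the index by term across machines and updating it as documents change, can be illustrated with a minimal single-process sketch. This is not the authors' architecture: the names IndexNode, TermPartitionedIndex, upsert, delete, and search are hypothetical, grid nodes are simulated as in-memory objects, and hashing each term to a node is an assumed partitioning rule chosen only for illustration.

```python
import re
import zlib
from collections import defaultdict


class IndexNode:
    """Stand-in for one grid node holding a slice of the index in memory."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, term, doc_id):
        self.postings[term].add(doc_id)

    def remove(self, term, doc_id):
        docs = self.postings.get(term)
        if docs is None:
            return
        docs.discard(doc_id)
        if not docs:
            del self.postings[term]  # drop terms with no documents left

    def lookup(self, term):
        return set(self.postings.get(term, ()))


class TermPartitionedIndex:
    """Inverted index split across nodes by term, with in-place updates."""

    def __init__(self, num_nodes):
        self.nodes = [IndexNode() for _ in range(num_nodes)]
        self.doc_terms = {}  # doc_id -> terms currently indexed for it

    def node_for(self, term):
        # Assumed rule: a stable hash of the term selects the owning node.
        return self.nodes[zlib.crc32(term.encode("utf-8")) % len(self.nodes)]

    @staticmethod
    def parse(text):
        # Very simple parser: lowercase alphanumeric tokens, deduplicated.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def upsert(self, doc_id, text):
        """Index a new document or re-index a changed one (dynamic update)."""
        new_terms = self.parse(text)
        old_terms = self.doc_terms.get(doc_id, set())
        for term in old_terms - new_terms:
            self.node_for(term).remove(term, doc_id)
        for term in new_terms - old_terms:
            self.node_for(term).add(term, doc_id)
        self.doc_terms[doc_id] = new_terms

    def delete(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.node_for(term).remove(term, doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = self.parse(query)
        if not terms:
            return set()
        return set.intersection(*(self.node_for(t).lookup(t) for t in terms))
```

In this sketch a query contacts only the nodes that own its terms, and re-indexing a changed page touches only the terms that were added or dropped. In a real grid deployment each IndexNode would sit behind a remote call, and parsing would itself be spread over worker machines.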


