Research Article

Dynamic and Distributed Indexing Architecture in Search Engine using Grid Computing

by M. E. Elaraby, M. M. Sakre, M. Z. Rashad, O. Nomir
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 55 - Number 5
Year of Publication: 2012
Authors: M. E. Elaraby, M. M. Sakre, M. Z. Rashad, O. Nomir
10.5120/8754-2657

M. E. Elaraby, M. M. Sakre, M. Z. Rashad, O. Nomir. Dynamic and Distributed Indexing Architecture in Search Engine using Grid Computing. International Journal of Computer Applications. 55, 5 (October 2012), 34-42. DOI=10.5120/8754-2657

@article{ 10.5120/8754-2657,
author = { M. E. Elaraby and M. M. Sakre and M. Z. Rashad and O. Nomir },
title = { Dynamic and Distributed Indexing Architecture in Search Engine using Grid Computing },
journal = { International Journal of Computer Applications },
issue_date = { October 2012 },
volume = { 55 },
number = { 5 },
month = { October },
year = { 2012 },
issn = { 0975-8887 },
pages = { 34-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume55/number5/8754-2657/ },
doi = { 10.5120/8754-2657 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:56:31.375824+05:30
%A M. E. Elaraby
%A M. M. Sakre
%A M. Z. Rashad
%A O. Nomir
%T Dynamic and Distributed Indexing Architecture in Search Engine using Grid Computing
%J International Journal of Computer Applications
%@ 0975-8887
%V 55
%N 5
%P 34-42
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Search engines need substantial computing resources to crawl web pages and very large storage to hold the billions of pages collected from the World Wide Web after they have been parsed and indexed. The indexer is one of the main components of a search engine, sitting between the crawler and the searcher. Indexing is the process of organizing the collected data to facilitate information retrieval and minimize query time. Indexing requires large processing and storage resources and strongly affects search engine performance; the size of this effect depends on the index structure and on how the index is constructed. Distributing the indexing process over a cluster of computers in a grid environment improves performance in two ways: the parsing load is spread over a number of computers, and the indexed data is partitioned by term across the distributed memory of remote computers. Because search engine data collections change frequently, the indexer requires dynamic indexing. Merging distributed and dynamic indexing in a single architecture over grid computing therefore gives better performance by utilizing the available resources, without the need for high-cost computers such as supercomputers.
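The two ideas in the abstract, partitioning the index by term across nodes and keeping it updatable as pages change, can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Alchemi API. The class names `IndexNode` and `DistributedIndex`, the helper `node_for_term`, the CRC32-based term assignment, and the reading of "main index" and "secondary index" as a large merged index plus a small buffer of recent additions are all assumptions made for illustration. The nodes are simulated in one process; in a grid, each `IndexNode` would live on a separate machine.

```python
from collections import defaultdict
import zlib


def node_for_term(term: str, num_nodes: int) -> int:
    """Hypothetical partitioning rule: a stable hash of the term picks its node."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


class IndexNode:
    """One simulated node holding a main index and a secondary index (assumed design)."""

    def __init__(self) -> None:
        self.main = defaultdict(set)       # term -> doc ids, changed only by merge()
        self.secondary = defaultdict(set)  # term -> doc ids added since the last merge

    def add(self, term: str, doc_id: int) -> None:
        # New postings go to the small secondary index, so the main index
        # does not need to be rebuilt on every update.
        self.secondary[term].add(doc_id)

    def lookup(self, term: str) -> set:
        # A query must consult both indexes to see recent additions.
        return self.main.get(term, set()) | self.secondary.get(term, set())

    def merge(self) -> None:
        # Fold the secondary index into the main index, then empty it.
        for term, doc_ids in self.secondary.items():
            self.main[term] |= doc_ids
        self.secondary.clear()


class DistributedIndex:
    """Routes each term to the node that owns it."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [IndexNode() for _ in range(num_nodes)]

    def index_document(self, doc_id: int, text: str) -> None:
        # Whitespace tokenization stands in for real parsing.
        for term in set(text.lower().split()):
            self.nodes[node_for_term(term, len(self.nodes))].add(term, doc_id)

    def search(self, term: str) -> set:
        term = term.lower()
        return self.nodes[node_for_term(term, len(self.nodes))].lookup(term)


if __name__ == "__main__":
    index = DistributedIndex(num_nodes=3)
    index.index_document(1, "grid computing search")
    index.index_document(2, "search engine indexing")
    print(sorted(index.search("search")))
```

Because a term always maps to the same node, a single-term query touches only one node. The sketch handles additions only; removing or replacing the postings of a changed page would need extra bookkeeping that is omitted here.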

Index Terms

Computer Science
Information Sciences

Keywords

Indexer, World Wide Web, Search engine, Grid Computing, Web pages, Secondary index, Main index, Alchemi, Manager, Executor