Research Article

Triple Indexing: An Efficient Technique for Fast Phrase Query Evaluation

by Shashank Gugnani, Rajendra Kumar Roul
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 87 - Number 13
Year of Publication: 2014
10.5120/15266-3970

Shashank Gugnani, Rajendra Kumar Roul. Triple Indexing: An Efficient Technique for Fast Phrase Query Evaluation. International Journal of Computer Applications. 87, 13 (February 2014), 9-13. DOI=10.5120/15266-3970

@article{ 10.5120/15266-3970,
author = { Shashank Gugnani and Rajendra Kumar Roul },
title = { Triple Indexing: An Efficient Technique for Fast Phrase Query Evaluation },
journal = { International Journal of Computer Applications },
issue_date = { February 2014 },
volume = { 87 },
number = { 13 },
month = { February },
year = { 2014 },
issn = { 0975-8887 },
pages = { 9-13 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume87/number13/15266-3970/ },
doi = { 10.5120/15266-3970 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:05:48.038595+05:30
%A Shashank Gugnani
%A Rajendra Kumar Roul
%T Triple Indexing: An Efficient Technique for Fast Phrase Query Evaluation
%J International Journal of Computer Applications
%@ 0975-8887
%V 87
%N 13
%P 9-13
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Phrase query evaluation is an important task for every search engine, and reducing the evaluation time of phrase queries is one of the main challenges facing current search engines. Phrase queries are costly for standard indexing techniques, generally because merging the posting lists and checking the word ordering take a lot of time. This paper proposes a new technique for indexing web documents, called Triple Indexing, which optimizes query evaluation time for phrase queries by reducing the time spent merging the posting lists and checking the word ordering. In addition, a procedure for document ranking using an extended vector space model is put forward. The 4 Universities dataset and the Industry Sector dataset of Carnegie Mellon University were used for the experiments, which found that, on a modern machine, the proposed method reduces the query time for phrase queries by almost 50 percent compared to a standard inverted index.
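To make the baseline cost concrete, the following is a minimal sketch of phrase query evaluation on a standard positional inverted index, the approach the abstract compares against. It is not the Triple Indexing method, whose details are not given here. The function names, the whitespace tokenizer, and the sample documents are assumptions made for illustration only.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} (a standard positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase whitespace tokenization is enough for this sketch.
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def phrase_query(index, phrase):
    """Return sorted ids of documents that contain the words of `phrase` consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []

    # Step 1: merge (intersect) the posting lists of all query words.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])

    # Step 2: check the word ordering using the stored positions.
    hits = []
    for doc_id in candidates:
        later = [set(index[w][doc_id]) for w in words[1:]]
        for start in index[words[0]][doc_id]:
            if all(start + k + 1 in positions for k, positions in enumerate(later)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample documents, not taken from the paper's datasets.
docs = {
    1: "fast phrase query evaluation",
    2: "query evaluation can be fast for a phrase",
    3: "a phrase query is evaluated fast",
}
index = build_positional_index(docs)
print(phrase_query(index, "phrase query"))  # [1, 3]
```

Document 2 contains both query words and so survives the merge in step 1, but it is rejected by the ordering check in step 2. Both steps run for every phrase query on a standard inverted index, and these are the two costs the abstract says the proposed index reduces.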

Index Terms

Computer Science
Information Sciences

Keywords

Triple Index, Inverted Index, Query Optimization, Phrase Queries, Vector Space Model