Research Article

AuTopicGen: Rule based Positional Pattern Approach for Topic Collection in IR

by Payal Joshi, S. V. Patel
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 109 - Number 14
Year of Publication: 2015
10.5120/19260-1017

Payal Joshi, S. V. Patel. AuTopicGen: Rule based Positional Pattern Approach for Topic Collection in IR. International Journal of Computer Applications. 109, 14 (January 2015), 44-47. DOI=10.5120/19260-1017

@article{ 10.5120/19260-1017,
author = { Payal Joshi and S. V. Patel },
title = { AuTopicGen: Rule based Positional Pattern Approach for Topic Collection in IR },
journal = { International Journal of Computer Applications },
issue_date = { January 2015 },
volume = { 109 },
number = { 14 },
month = { January },
year = { 2015 },
issn = { 0975-8887 },
pages = { 44-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume109/number14/19260-1017/ },
doi = { 10.5120/19260-1017 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:44:49.808916+05:30
%A Payal Joshi
%A S. V. Patel
%T AuTopicGen: Rule based Positional Pattern Approach for Topic Collection in IR
%J International Journal of Computer Applications
%@ 0975-8887
%V 109
%N 14
%P 44-47
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Information retrieval (IR) systems consist of phases such as document preprocessing, indexing, query expansion, query matching, and ranking. Document preprocessing, in which the document is parsed and keywords are collected, is the most important of these phases: the relevance of the overall IR system improves if the main topics of a document are accurately identified during it. Topics are mostly phrase based. Existing phrase search methods such as n-grams or positional indexes are quite complex and suffer from inaccuracy and large storage requirements. Moreover, an IR system such as a digital library may consist of eBooks on one or more subjects, so phrase collection may require an appropriate ontology to retrieve phrases or topics. This paper presents a new approach called AuTopicGen (Automatic Topic Generator) that automatically collects the most relevant topics of eBooks from their contents and indexes using a rule-based positional pattern approach. From the collected topics, we create a topic hierarchy that can serve as a lightweight ontology to improve the overall performance of an information retrieval system, especially for phrase-based queries, and to assist users with query recommendation. The hierarchy can also be used for topic maps and mind maps, for user interfaces that help users navigate through topics, and for categorization, query expansion, and ranking algorithms. We have implemented the approach for topic collection on eBooks and present it in this paper.
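
The abstract does not list the actual AuTopicGen rules, so the following is only an illustrative sketch of what a rule-based positional pattern could look like. It assumes a hypothetical contents-page layout in which each line begins with a dotted section number, continues with the topic phrase, and may end with dot leaders and a page number. The names `parse_toc_line` and `build_hierarchy` are invented for this example and are not taken from the paper.

```python
import re

# Assumed positional rule (hypothetical, not the paper's rule set):
#   <section number> <topic phrase> [dot leaders] [page number]
NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*\S)$")
# Trailing dot leaders and/or spaces followed by a page number.
TRAILER = re.compile(r"[\s.]*\d+$")


def parse_toc_line(line):
    """Return (depth, topic) for a numbered contents line, else None."""
    match = NUMBERED.match(line.strip())
    if match is None:
        return None
    number, rest = match.group(1), match.group(2)
    # Drop the page number (if any) that sits at the end of the line.
    topic = TRAILER.sub("", rest).strip()
    if not topic:
        return None
    # Depth follows from the position pattern: "1" -> 1, "1.2" -> 2, ...
    depth = number.count(".") + 1
    return depth, topic


def build_hierarchy(lines):
    """Collect topics from contents lines into a nested topic hierarchy."""
    root = {"topic": None, "children": []}
    stack = [(0, root)]  # (depth, node) pairs along the current branch
    for line in lines:
        parsed = parse_toc_line(line)
        if parsed is None:
            continue  # line does not fit the assumed pattern; skip it
        depth, topic = parsed
        node = {"topic": topic, "children": []}
        # Climb back up until the top of the stack is a shallower node.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root


if __name__ == "__main__":
    sample = [
        "1 Information Retrieval .... 1",
        "1.1 Inverted Index .... 5",
        "1.2 Positional Index 9",
        "2 Query Expansion",
        "Preface",
    ]
    tree = build_hierarchy(sample)
    for chapter in tree["children"]:
        print(chapter["topic"], [c["topic"] for c in chapter["children"]])
```

Because the topic phrases are read directly from known positions in the contents lines, this kind of rule needs no n-gram statistics or positional index over the full text. A real rule set would have to handle many more layouts, including titles that themselves end in a number.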

Index Terms

Computer Science
Information Sciences

Keywords

Topic Collection, IR System