Developing an Expert IR System from Multidimensional Dataset

Anagha Chaudhari; Amitabh Mudiraj; Swati Shinde


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 122 - Number 1

Year of Publication: 2015


DOI: 10.5120/21662-4718

Anagha Chaudhari, Amitabh Mudiraj, Swati Shinde. Developing an Expert IR System from Multidimensional Dataset. International Journal of Computer Applications 122, 1 (July 2015), 6-9. DOI=10.5120/21662-4718

@article{10.5120/21662-4718,
  author = {Anagha Chaudhari and Amitabh Mudiraj and Swati Shinde},
  title = {Developing an Expert IR System from Multidimensional Dataset},
  journal = {International Journal of Computer Applications},
  issue_date = {July 2015},
  volume = {122},
  number = {1},
  month = {July},
  year = {2015},
  issn = {0975-8887},
  pages = {6-9},
  numpages = {4},
  url = {https://ijcaonline.org/archives/volume122/number1/21662-4718/},
  doi = {10.5120/21662-4718},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}


Abstract

Nowadays, owing to the increased availability of computing facilities, large amounts of data are being generated in electronic form. This data must be analyzed to maximize its benefit for intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. Its basic phases include preprocessing steps such as removing stop words from documents and applying TF-IDF weighting, which improve efficiency and remove irrelevant data from large datasets. Applying the TF-IDF algorithm to the dataset assigns a weight to each word, and these weights are summarized in a weight matrix. Preprocessing reduces the size of the dataset, which ultimately improves the performance of the search engine. An index is then generated from the dataset: for each term, the index records its occurrences in each file along with their locations in that file. This paper discusses the implementation of an efficient Information Retrieval (IR) system for text-based data using clustering approaches.
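The preprocessing, weighting and indexing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the regular-expression tokenizer, the TF-IDF variant (raw term count times log(N/df)), the use of token indices as "locations", and the function names `preprocess`, `tfidf_matrix` and `positional_index` are all hypothetical choices made for illustration. The clustering stage is not shown, since the abstract does not specify which clustering method is used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration only; the paper does not give its list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def preprocess(text):
    """Lowercase, tokenize on runs of letters/digits, and drop stop words.

    Returns (term, position) pairs; position is the token's index in the
    original token stream, counted before stop words are removed.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(tok, pos) for pos, tok in enumerate(tokens) if tok not in STOP_WORDS]


def tfidf_matrix(docs):
    """Build a TF-IDF weight matrix as {doc_id: {term: weight}}.

    Assumed variant: raw term count * log(N / df), where N is the number of
    documents and df is the number of documents containing the term.
    """
    n = len(docs)
    counts = {d: Counter(t for t, _ in preprocess(text)) for d, text in docs.items()}
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # each document adds at most 1 to a term's df
    return {
        d: {term: tf * math.log(n / df[term]) for term, tf in c.items()}
        for d, c in counts.items()
    }


def positional_index(docs):
    """Inverted index mapping each term to {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for d, text in docs.items():
        for term, pos in preprocess(text):
            index[term][d].append(pos)
    # Convert the nested defaultdicts to plain dicts.
    return {term: dict(postings) for term, postings in index.items()}


if __name__ == "__main__":
    docs = {
        "d1": "The search engine builds an index of the documents",
        "d2": "Clustering of documents improves search",
    }
    print(tfidf_matrix(docs)["d1"])
    print(positional_index(docs)["documents"])
```

With the two sample documents, `search` and `documents` receive a weight of 0 in `d1` because they occur in every document (log(2/2) = 0), while `engine`, `builds` and `index` each receive log 2 ≈ 0.693; the index entry for `documents` is `{'d1': [8], 'd2': [2]}`. Positions are counted before stop words are dropped so that each recorded location still matches the term's place in the original file.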


Index Terms

Computer Science

Information Sciences

Keywords

Information retrieval, stop words, TF-IDF, text-based clustering, fitness functions