Preprocessing Techniques in Text Categorization

Research Article

Published on December 2013 by Pritam C. Gaigole, L. H. Patil, P. M Chaudhari

National Conference on Innovative Paradigms in Engineering & Technology 2013

Foundation of Computer Science USA

NCIPET2013 - Number 3

December 2013

Authors: Pritam C. Gaigole, L. H. Patil, P. M Chaudhari

Pritam C. Gaigole, L. H. Patil, P. M Chaudhari. Preprocessing Techniques in Text Categorization. National Conference on Innovative Paradigms in Engineering & Technology 2013. NCIPET2013, 3 (December 2013), 1-3.

@article{

author = { Pritam C. Gaigole and L. H. Patil and P. M Chaudhari },

title = { Preprocessing Techniques in Text Categorization },

journal = { National Conference on Innovative Paradigms in Engineering & Technology 2013 },

issue_date = { December 2013 },

volume = { NCIPET2013 },

number = { 3 },

month = { December },

year = { 2013 },

issn = { 0975-8887 },

pages = { 1-3 },

numpages = 3,

url = { /proceedings/ncipet2013/number3/14708-1334/ },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Proceeding Article

%1 National Conference on Innovative Paradigms in Engineering & Technology 2013

%A Pritam C. Gaigole

%A L. H. Patil

%A P. M Chaudhari

%T Preprocessing Techniques in Text Categorization

%J National Conference on Innovative Paradigms in Engineering & Technology 2013

%@ 0975-8887

%V NCIPET2013

%N 3

%P 1-3

%D 2013

%I International Journal of Computer Applications

Abstract

Bulk data is generated in the era of Information Technology. If it is not stored in a properly systematic manner, the generated data cannot be reused, because navigating it becomes, if not impossible, certainly very difficult. The generated data must be analyzed to maximize its benefits for intelligent decision making. Text categorization is an important and extensively studied problem in machine learning. The basic phases in text categorization include preprocessing features, extracting relevant features against the features in a database, and finally categorizing a set of documents into predefined categories. Most research in text categorization focuses more on the development of algorithms and computer techniques.
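
The three phases named in the abstract (preprocessing, feature extraction, categorization) can be illustrated with a minimal Python sketch. It is not the authors' method: the stop-word list, the crude `strip_suffix` helper (a stand-in for a real stemmer, not Porter's algorithm), the raw term-frequency features, and the cosine-similarity rule in `categorize` are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def strip_suffix(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Phase 1: lowercase, tokenize on letters, drop stop words, strip suffixes."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [strip_suffix(t) for t in tokens if t not in STOP_WORDS]


def term_frequencies(text):
    """Phase 2: represent a document by raw counts of its preprocessed terms."""
    return Counter(preprocess(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def categorize(text, labeled_docs):
    """Phase 3: pick the predefined category whose summed profile is most similar."""
    profiles = {}
    for label, doc in labeled_docs:
        profiles.setdefault(label, Counter()).update(term_frequencies(doc))
    features = term_frequencies(text)
    return max(profiles, key=lambda label: cosine(features, profiles[label]))


if __name__ == "__main__":
    training = [
        ("sports", "The team scored two goals in the match"),
        ("finance", "The bank raised interest rates on loans"),
    ]
    print(preprocess("The players are training in the stadium"))
    # ['player', 'train', 'stadium']
    print(categorize("Goals scored by the team", training))
    # sports
```

Even in this toy setting, the preprocessing step determines which terms can match between documents: "goals" and "Goals" only count as the same feature because of lowercasing and suffix stripping.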

Index Terms

Computer Science

Information Sciences

Keywords

Preprocessing, Text Categorization