Approach for Dimensionality Reduction in Web Page Classification

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 99 - Number 14
Year of Publication: 2014
Shraddha Sarode
Jayant Gadge

Shraddha Sarode and Jayant Gadge. Article: Approach for Dimensionality Reduction in Web Page Classification. International Journal of Computer Applications 99(14):32-37, August 2014. Full text available. BibTeX

	author = {Shraddha Sarode and Jayant Gadge},
	title = {Article: Approach for Dimensionality Reduction in Web Page Classification},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {99},
	number = {14},
	pages = {32-37},
	month = {August},
	note = {Full text available}


Dimensionality refers to the number of terms in a web page. High dimensionality causes problems when web pages are classified. The main objective of reducing the dimensionality of web pages is to improve the performance of the classifier. Processing time and accuracy are the two parameters that influence the performance of a classifier. To reduce processing time, less informative and redundant terms have to be removed from web pages. This research describes a hybrid approach to dimensionality reduction in web page classification that uses rough sets and the naïve Bayesian method. Feature selection and dimensionality reduction methods are both used to reduce the dimensionality: information gain serves as the feature selection method, and the rough set based Quick Reduct algorithm performs the dimensionality reduction. The words remaining after feature selection and dimensionality reduction are given to a naïve Bayesian classifier, which assigns each web page to the predefined category with the maximum posterior probability.
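
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes pages are already tokenized into lists of terms, treats each term as a binary present/absent feature when computing information gain, uses a multinomial naïve Bayes model with add-one smoothing, and omits the rough set based Quick Reduct step entirely. All function names and the toy pages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting pages on presence/absence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_term, without_term) if part)
    return entropy(labels) - remainder


def select_terms(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def train_naive_bayes(docs, labels, vocab):
    """Estimate log priors and add-one-smoothed log likelihoods per class."""
    vocab = set(vocab)
    term_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        term_counts[y].update(t for t in d if t in vocab)
    model = {}
    for c, n_docs in Counter(labels).items():
        total = sum(term_counts[c].values())
        model[c] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                t: math.log((term_counts[c][t] + 1) / (total + len(vocab)))
                for t in vocab
            },
        }
    return model


def classify(model, doc):
    """Return the category with the maximum (log) posterior probability."""
    def log_posterior(c):
        ll = model[c]["log_likelihood"]
        # Terms removed during feature selection are simply ignored.
        return model[c]["log_prior"] + sum(ll[t] for t in doc if t in ll)
    return max(model, key=log_posterior)


# Toy data (hypothetical): four tokenized pages in two categories.
docs = [
    ["football", "match", "score", "news"],
    ["cricket", "match", "team", "news"],
    ["stock", "market", "price", "news"],
    ["bank", "market", "loan", "news"],
]
labels = ["sports", "sports", "finance", "finance"]

selected = select_terms(docs, labels, k=2)
model = train_naive_bayes(docs, labels, selected)
print(selected)
print(classify(model, ["match", "report", "team"]))
```

On this toy data the term "news" occurs in every page, so its information gain is zero and it is discarded, while "match" and "market" separate the two categories perfectly and are kept. The classifier then scores a new page using only the retained terms, which is where the reduction in processing time comes from.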

