An Efficient Method of Web Page Noise Cleaning for Effective Web Mining

S. S. Bhamare; B. V. Pawar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 146 - Number 3

Year of Publication: 2016

Authors: S. S. Bhamare, B. V. Pawar

DOI: 10.5120/ijca2016910657

S. S. Bhamare, B. V. Pawar. An Efficient Method of Web Page Noise Cleaning for Effective Web Mining. International Journal of Computer Applications. 146, 3 (Jul 2016), 18-22. DOI=10.5120/ijca2016910657

@article{ 10.5120/ijca2016910657,
  author = { S. S. Bhamare and B. V. Pawar },
  title = { An Efficient Method of Web Page Noise Cleaning for Effective Web Mining },
  journal = { International Journal of Computer Applications },
  issue_date = { Jul 2016 },
  volume = { 146 },
  number = { 3 },
  month = { Jul },
  year = { 2016 },
  issn = { 0975-8887 },
  pages = { 18-22 },
  numpages = {5},
  url = { https://ijcaonline.org/archives/volume146/number3/25378-2016910657/ },
  doi = { 10.5120/ijca2016910657 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:49:18.348442+05:30
%A S. S. Bhamare
%A B. V. Pawar
%T An Efficient Method of Web Page Noise Cleaning for Effective Web Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 146
%N 3
%P 18-22
%D 2016
%I Foundation of Computer Science (FCS), NY, USA

Abstract

On the World Wide Web, web pages contain a large amount of information. Web researchers often need the main content of a page (e.g., the article text) to be gathered, processed, and stored quickly and efficiently. Mining Web data has become a major task for locating useful information. Pages that carry useful information usually also carry large amounts of noise, such as navigation bars, links, advertisements, and copyright notices. The performance of Web mining can be improved by identifying and removing this noise from Web pages. This paper proposes a new method that removes noise content tags and extracts the information held in main content tags from web pages.
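
The general idea of tag-based noise removal can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not list the tags the paper treats as noise, so the `BLACK_LISTED_TAGS` set, the `NoiseCleaner` class, and the `extract_main_content` function below are all hypothetical names and choices made for illustration. The sketch uses only Python's standard `html.parser` module and keeps any text that is not nested inside an assumed noise tag.

```python
from html.parser import HTMLParser

# Assumed noise tags, chosen for illustration only; the paper's own list is not given here.
BLACK_LISTED_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class NoiseCleaner(HTMLParser):
    """Collect text that is not nested inside any black-listed tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # greater than 0 while inside a black-listed element
        self.chunks = []      # text fragments kept as main content

    def handle_starttag(self, tag, attrs):
        if tag in BLACK_LISTED_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in BLACK_LISTED_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text only when no enclosing element is black-listed.
        if text and self.noise_depth == 0:
            self.chunks.append(text)


def extract_main_content(html):
    """Return the page text with black-listed (noise) elements removed."""
    cleaner = NoiseCleaner()
    cleaner.feed(html)
    cleaner.close()
    return " ".join(cleaner.chunks)


page = """
<html><body>
<nav><a href="/">Home</a> <a href="/about">About</a></nav>
<div><h1>Title</h1><p>Main article text.</p></div>
<footer>Copyright 2016</footer>
</body></html>
"""
print(extract_main_content(page))  # prints: Title Main article text.
```

A depth counter is used rather than a boolean flag so that noise elements nested inside other noise elements are handled without leaking text. A real cleaner would likely need more than a fixed tag list, since many pages mark navigation and advertisements with generic `div` elements.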

Index Terms

Computer Science

Information Sciences

Keywords

WPNC, Noise Block, HTML Tag, White Listed Tags, HDT, LDT, Black Listed Tags