Signed-With-Weight Technique for Mining Web Content Outliers

Research Article

Published on February 2013 by S. Poonkuzhali, P. Sudhakar, K. Sarukesi

International Conference on Communication, Computing and Information Technology

Foundation of Computer Science USA

ICCCMIT - Number 2

February 2013

Authors: S. Poonkuzhali, P. Sudhakar, K. Sarukesi

S. Poonkuzhali, P. Sudhakar, K. Sarukesi . Signed-With-Weight Technique for Mining Web Content Outliers. International Conference on Communication, Computing and Information Technology. ICCCMIT, 2 (February 2013), 40-45.

@article{
  author = { S. Poonkuzhali, P. Sudhakar, K. Sarukesi },
  title = { Signed-With-Weight Technique for Mining Web Content Outliers },
  journal = { International Conference on Communication, Computing and Information Technology },
  issue_date = { February 2013 },
  volume = { ICCCMIT },
  number = { 2 },
  month = { February },
  year = { 2013 },
  issn = { 0975-8887 },
  pages = { 40-45 },
  numpages = { 6 },
  url = { /specialissues/icccmit/number2/10336-1021/ },
  publisher = { Foundation of Computer Science (FCS), NY, USA },
  address = { New York, USA }
}

%0 Special Issue Article

%1 International Conference on Communication, Computing and Information Technology

%A S. Poonkuzhali

%A P. Sudhakar

%A K. Sarukesi

%T Signed-With-Weight Technique for Mining Web Content Outliers

%J International Conference on Communication, Computing and Information Technology

%@ 0975-8887

%V ICCCMIT

%N 2

%P 40-45

%D 2013

%I International Journal of Computer Applications

Abstract

Web outlier mining is dedicated to finding web pages that differ significantly from the rest of the web documents taken from the same category. Most existing algorithms for web content outlier mining were developed for structured documents, whereas the WWW contains mostly unstructured and semi-structured documents. Moreover, the false positive rate of the existing algorithms for mining web content outliers is more than 30%. Therefore, there is a need for a technique that mines web outliers from unstructured and semi-structured document types with a lower false positive rate. This paper concentrates on mining web content outliers, that is, on extracting the dissimilar web documents from a group of documents of the same domain. The proposed work implements a novel mathematical approach based on the signed-with-weight technique for mining web content outliers, which retrieves the top n outlier web documents from both structured and unstructured web documents. The results show that the precision and recall of this approach are more than 90%. Also, the false positive rate of this algorithm is less than 15%.
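To make the task and the reported measures concrete, the following is a minimal Python sketch of top-n outlier retrieval and its evaluation. It is not the signed-with-weight technique, whose details the abstract does not give; as a stand-in it scores each document by its cosine dissimilarity from the average term-frequency vector of the group. The function names `term_frequencies`, `top_n_outliers`, and `evaluate`, as well as the whitespace tokenization, are assumptions made only for illustration. Precision, recall, and false positive rate follow their standard definitions.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Relative term frequencies of a document (naive whitespace tokenizer)."""
    words = text.lower().split()
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_outliers(docs, n):
    """Indices of the n documents most dissimilar to the group centroid.

    Stand-in scoring only: 1 - cosine similarity to the mean
    term-frequency vector. This is NOT the paper's scoring function.
    """
    vectors = [term_frequencies(doc) for doc in docs]
    centroid = Counter()
    for vector in vectors:
        for word, weight in vector.items():
            centroid[word] += weight / len(vectors)
    scores = [1.0 - cosine_similarity(v, centroid) for v in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


def evaluate(predicted, actual, total_docs):
    """Precision, recall, and false positive rate of predicted outliers."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # outliers correctly reported
    fp = len(predicted - actual)   # normal documents reported as outliers
    fn = len(actual - predicted)   # outliers that were missed
    tn = total_docs - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return precision, recall, false_positive_rate


if __name__ == "__main__":
    # Hypothetical toy corpus: three same-domain pages and one off-topic page.
    docs = [
        "web mining outlier detection web",
        "web content mining outlier web pages",
        "outlier mining web documents content",
        "chocolate cake recipe butter sugar",
    ]
    found = top_n_outliers(docs, n=1)
    print(found, evaluate(found, actual=[3], total_docs=len(docs)))
```

In this reading, a false positive rate below 15% means that fewer than 15 of every 100 non-outlier documents are wrongly reported as outliers.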

Index Terms

Computer Science

Information Sciences

Keywords

Dissimilarity Weight, Outlier Mining, Term Frequency, Weighted Approach, Web Content Mining, Web Content Outliers