Research Article

Efficient Conceptual Rule Mining on Text Clusters in Web Documents

by V. M. Navaneethakumar, C. Chandrasekar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 56 - Number 3
Year of Publication: 2012
Authors: V. M. Navaneethakumar, C. Chandrasekar
10.5120/8870-2844

V. M. Navaneethakumar, C. Chandrasekar. Efficient Conceptual Rule Mining on Text Clusters in Web Documents. International Journal of Computer Applications. 56, 3 (October 2012), 11-16. DOI=10.5120/8870-2844

@article{ 10.5120/8870-2844,
author = { V. M. Navaneethakumar and C. Chandrasekar },
title = { Efficient Conceptual Rule Mining on Text Clusters in Web Documents },
journal = { International Journal of Computer Applications },
issue_date = { October 2012 },
volume = { 56 },
number = { 3 },
month = { October },
year = { 2012 },
issn = { 0975-8887 },
pages = { 11-16 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume56/number3/8870-2844/ },
doi = { 10.5120/8870-2844 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:57:54.041432+05:30
%A V. M. Navaneethakumar
%A C. Chandrasekar
%T Efficient Conceptual Rule Mining on Text Clusters in Web Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 56
%N 3
%P 11-16
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text mining is a computational approach that attempts to discover new, previously unknown information by applying techniques from natural language processing and data mining. Clustering, one of the conventional data mining techniques, is an unsupervised learning paradigm in which the algorithm attempts to identify the intrinsic groupings of text documents, producing a set of clusters that exhibit high intra-cluster similarity and low inter-cluster similarity. Most current document clustering methods are based on the Vector Space Model (VSM), a widely used data representation for text classification and clustering. How accurately the features in this representation are weighted also affects the clustering result substantially. The previous work applied conceptual text clustering to web documents containing various markup language formats (term extraction mode). In this work, we present conceptual rule mining, in which rules are generated from sentence meaning and from related sentences in the document. Weights are assigned to the sentences that contribute most to the topic of the document. Conditional probability is evaluated for the sentence weights. A probability ratio is then computed for sentence similarity, from which the unique sentence meanings contributing to the document topic are listed. Experiments are conducted on web documents extracted from research repositories to evaluate the efficiency of the proposed conceptual rule mining on text clusters in web documents. The approach is compared with an existing model for concept-based clustering and classification in terms of topic-related rules, weights of the influential sentences, and topic sensitivity.
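
As a rough illustration of the Vector Space Model mentioned in the abstract, the sketch below represents each document as a TF-IDF weighted term vector and compares documents by cosine similarity. It is not the authors' method: the whitespace tokenizer, the smoothed IDF variant, the function names (`tokenize`, `tfidf_vectors`, `cosine`), and the three sample documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: relative term frequency times a smoothed IDF.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical sample documents, not taken from the paper's data set.
docs = [
    "text mining finds patterns in documents",
    "clustering groups similar text documents",
    "stock markets fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # documents sharing terms
sim_unrelated = cosine(vecs[0], vecs[2])  # documents with no shared terms
```

Here the first two documents share the terms "text" and "documents", so their similarity is positive, while the first and third share no terms and score zero. A clustering algorithm built on VSM groups documents by exactly this kind of pairwise score, which is why the choice of term weights influences the resulting clusters.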

References
  1. SaiSindhu Bandaru et al., "An Efficient Semantic Model for Concept Based Clustering and Classification", International Journal on Computer Science and Engineering (IJCSE), ISSN: 0975-3397, Vol. 4, No. 03, March 2012.
  2. Shady Shehata, Fakhri Karray, and Mohamed S. Kamel, "An Efficient Concept-Based Mining Model for Enhancing Text Clustering", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 10, October 2010.
  3. Sotiris Kotsiantis, Dimitris Kanellopoulos, "Association Rules Mining: A Recent Overview", GESTS International Transactions on Computer Science and Engineering, Vol. 32 (1), 2006, pp. 71-82.
  4. Vishal Gupta et al., "A Survey of Text Summarization Extractive Techniques", Journal of Emerging Technologies in Web Intelligence, Vol. 2, No. 3, August 2010.
  5. Claude Pasquier et al., "Task 5: Single document keyphrase extraction using sentence clustering and Latent Dirichlet Allocation", Proceedings of the 5th International Workshop on Semantic Evaluation, ACL 2010, pp. 154–157.
  6. Hany Mahgoub et al., "A Text Mining Technique Using Association Rules Extraction", International Journal of Information and Mathematical Sciences 4:1, 2008.
  7. Kjetil Nørvåg et al., "Semantic-Based Temporal Text-Rule Mining", Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing '09, pp. 442-455.
  8. K. Nørvåg, K.-I. Skogstad, and T. Eriksen, "Mining association rules in temporal document collections", Proceedings of the 16th International Symposium on Methodologies for Intelligent Systems (ISMIS'06), 2006.
  9. Karel Jezek and Josef Steinberger, "Automatic Text Summarization", Vaclav Snasel (Ed.): Znalosti 2008, pp. 1-12, ISBN 978-80-227-2827-0, FIIT STU Bratislava, Ustav informatiky a softveroveho inzinierstva, 2008.
  10. Farshad Kyoomarsi et al., "Optimizing Text Summarization Based on Fuzzy Logic", Proceedings of the Seventh IEEE/ACIS International Conference on Computer and Information Science, IEEE, University of Shahid Bahonar Kerman, UK, pp. 347-352, 2008.
  11. Yongzheng, Nur and Evangelos, "Narrative Text Classification for Automatic Key Phrase Extraction in Web Document Corpora", WIDM'05, pp. 51-57, Bremen, Germany, 2005.
  12. Rene Arnulfo Garcia-Hernandez et al., "Word Sequence Models for Single Text Summarization", IEEE, pp. 44-48, 2009.
Index Terms

Computer Science
Information Sciences

Keywords

Conceptual rule mining, text clustering, conditional probability, probability ratio