Research Article

An Advanced Fuzzy Constructing Algorithm for Feature Discovery in Text Mining

by Evana Ramalakshmi, Subhakar Golla
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 127 - Number 17
Year of Publication: 2015
Authors: Evana Ramalakshmi, Subhakar Golla
DOI: 10.5120/ijca2015906720

Evana Ramalakshmi, Subhakar Golla. An Advanced Fuzzy Constructing Algorithm for Feature Discovery in Text Mining. International Journal of Computer Applications. 127, 17 (October 2015), 30-34. DOI=10.5120/ijca2015906720

@article{ 10.5120/ijca2015906720,
author = { Evana Ramalakshmi, Subhakar Golla },
title = { An Advanced Fuzzy Constructing Algorithm for Feature Discovery in Text Mining },
journal = { International Journal of Computer Applications },
issue_date = { October 2015 },
volume = { 127 },
number = { 17 },
month = { October },
year = { 2015 },
issn = { 0975-8887 },
pages = { 30-34 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume127/number17/22823-2015906720/ },
doi = { 10.5120/ijca2015906720 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:18:41.346793+05:30
%A Evana Ramalakshmi
%A Subhakar Golla
%T An Advanced Fuzzy Constructing Algorithm for Feature Discovery in Text Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 127
%N 17
%P 30-34
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Accurately discovering relevant features in text documents that describe user requirements is a major challenge. Classifying data is especially difficult in large document collections because they contain a large number of words and data patterns. Most popular existing methods are word-based, and they all suffer from problems of relevance and uncertainty. Over the years, it has often been held that pattern-based methods should perform better than word-based methods in describing user requirements, but how to use large-scale patterns effectively remains a hard problem in text mining. To address this problem, the Fuzzy Relevance Feature Discovery Algorithm (FRFDA), a classification technique for relevance feature discovery, has been developed. It describes both high-level and low-level features based on word patterns. It also classifies words into categories and updates their weights based on their relevance and distribution in patterns. Experimental results indicate that the proposed FRFDA performs better than existing manual and automated methods. Experiments on the Reuters-21578 data set show that the proposed model is significantly faster and extracts better features than other methods.
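The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea it describes: words are weighted by how they are distributed across patterns, given a fuzzy degree of relevance, placed into categories, and then reweighted. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the even split of pattern support across terms, the membership formula, the three category labels, and the 0.7 and 0.3 thresholds.

```python
from collections import defaultdict


def term_weights(patterns):
    """Spread each pattern's support evenly over its terms and sum per term.

    `patterns` is a list of (terms, support) pairs, where `terms` is a tuple
    of words. The even split is an illustrative assumption.
    """
    weights = defaultdict(float)
    for terms, support in patterns:
        for term in terms:
            weights[term] += support / len(terms)
    return dict(weights)


def fuzzy_membership(pos_weight, neg_weight):
    """Degree in [0, 1] to which a term belongs to the 'relevant' class.

    Hypothetical membership function: share of the term's total weight
    that comes from patterns found in relevant documents.
    """
    total = pos_weight + neg_weight
    return pos_weight / total if total > 0 else 0.5


def classify_and_reweight(pos_patterns, neg_patterns, hi=0.7, lo=0.3):
    """Assign each term a category, a membership degree, and a new weight.

    The thresholds `hi` and `lo` and the category names are assumptions.
    """
    pos = term_weights(pos_patterns)
    neg = term_weights(neg_patterns)
    result = {}
    for term in sorted(set(pos) | set(neg)):
        p, n = pos.get(term, 0.0), neg.get(term, 0.0)
        mu = fuzzy_membership(p, n)
        if mu >= hi:
            category = "positive"
        elif mu <= lo:
            category = "negative"
        else:
            category = "general"
        # Reward weight from relevant patterns, penalise weight from
        # irrelevant ones, each scaled by the fuzzy membership degree.
        result[term] = (category, mu, mu * p - (1.0 - mu) * n)
    return result


if __name__ == "__main__":
    # Toy patterns with made-up supports, not data from the paper.
    relevant = [(("oil", "price"), 4), (("oil",), 2)]
    irrelevant = [(("price", "stock"), 2)]
    for term, info in classify_and_reweight(relevant, irrelevant).items():
        print(term, info)
```

In this toy setup a word that occurs only in patterns from relevant documents ends up in the "positive" category with its full weight, while a word shared between relevant and irrelevant patterns is treated as "general" and has its weight reduced.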

Index Terms

Computer Science
Information Sciences

Keywords

Text mining, fuzzy similarity, feature clustering, text feature extraction, text classification, Fuzzy Relevance Feature Discovery (FRFD), Reuters Corpus Volume (RCV).