Research Article

Toward an ARABIC Stop-Words List Generation

by A. Alajmi, E. M. Saad, R. R. Darwish
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 8
Year of Publication: 2012
DOI: 10.5120/6926-9341

A. Alajmi, E. M. Saad, R. R. Darwish. Toward an ARABIC Stop-Words List Generation. International Journal of Computer Applications. 46, 8 (May 2012), 8-13. DOI=10.5120/6926-9341

@article{ 10.5120/6926-9341,
author = { A. Alajmi and E. M. Saad and R. R. Darwish },
title = { Toward an ARABIC Stop-Words List Generation },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 46 },
number = { 8 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 8-13 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume46/number8/6926-9341/ },
doi = { 10.5120/6926-9341 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:39:12.622984+05:30
%A A. Alajmi
%A E. M. Saad
%A R. R. Darwish
%T Toward an ARABIC Stop-Words List Generation
%J International Journal of Computer Applications
%@ 0975-8887
%V 46
%N 8
%P 8-13
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Over the past decades, systems for the automatic management of electronic documents have been a major field of research. Text processing is a wide area that spans many important disciplines. Before a mining technique can be applied to unstructured text, the text has to be preprocessed. One of the most important preprocessing steps is the removal of function words (stop-words), which affects the performance of text mining tasks. In this paper, a statistical approach is presented for extracting an Arabic stop-word list. The extracted list was compared with a general-purpose list. In the comparison, an ANN-based classifier performed better with the generated stop-word list than with the general list.
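
The abstract does not say which statistics the authors use, so the following is only a minimal sketch of one common statistical heuristic, not the paper's method: treat a word as a stop-word candidate when it appears in a large share of the documents in a corpus. The function name `candidate_stop_words`, the `min_doc_ratio` threshold, the whitespace tokenization, and the three-sentence toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def candidate_stop_words(documents, min_doc_ratio=0.8, top_k=None):
    """Rank tokens that occur in at least `min_doc_ratio` of the documents.

    Hypothetical helper: document frequency is an assumed criterion,
    not the statistic reported in the paper.
    """
    n_docs = len(documents)
    if n_docs == 0:
        return []

    doc_freq = Counter()   # number of documents containing each token
    term_freq = Counter()  # total occurrences of each token
    for text in documents:
        tokens = text.split()  # naive whitespace tokenization (assumption)
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    candidates = [t for t, df in doc_freq.items() if df / n_docs >= min_doc_ratio]
    # Most widespread first; ties broken by total frequency, then alphabetically.
    candidates.sort(key=lambda t: (-doc_freq[t], -term_freq[t], t))
    return candidates if top_k is None else candidates[:top_k]


# Toy corpus (illustrative only): three short Arabic sentences.
corpus = [
    "ذهب الولد في الصباح إلى المدرسة",
    "قرأت البنت الكتاب في المكتبة",
    "سافر الرجل من القاهرة إلى دبي في الشتاء",
]
stop_words = candidate_stop_words(corpus, min_doc_ratio=0.6)
print(stop_words)
```

On this toy corpus only the two prepositions that recur across sentences pass the 0.6 threshold. A real corpus would need proper Arabic tokenization and normalization, and the threshold would have to be tuned against the downstream classifier.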

Index Terms

Computer Science
Information Sciences

Keywords

Arabic Text Processing
Stop-word List Generation