
An Ontology-based Summarization System for Arabic Documents (OSSAD)

International Journal of Computer Applications
© 2013 by IJCA Journal
Volume 74 - Number 17
Year of Publication: 2013
Authors:
Ibrahim Imam
Nihal Nounou
Alaa Hamouda
Hebat Allah Abdul Khalek
DOI: 10.5120/12980-0237

Ibrahim Imam, Nihal Nounou, Alaa Hamouda and Hebat Allah Abdul Khalek. Article: An Ontology-based Summarization System for Arabic Documents (OSSAD). International Journal of Computer Applications 74(17):38-43, July 2013. Full text available. BibTeX

@article{key:article,
	author = {Ibrahim Imam and Nihal Nounou and Alaa Hamouda and Hebat Allah Abdul Khalek},
	title = {Article: An Ontology-based Summarization System for Arabic Documents (OSSAD)},
	journal = {International Journal of Computer Applications},
	year = {2013},
	volume = {74},
	number = {17},
	pages = {38-43},
	month = {July},
	note = {Full text available}
}

Abstract

The rapid growth of web resources and the sheer volume of available information have created a need for automatic summarization systems. Summarization is needed most when searching the web, where a user's query targets a particular domain of interest, so domain-based summaries serve this purpose best. Although there is plenty of research on domain-based summarization for English, little exists for Arabic because of the shortage of knowledge bases. This paper introduces OSSAD, an Ontology-based Summarization System for Arabic Documents. Domain knowledge is extracted from an Arabic corpus and represented as topic-related concepts/keywords and the lexical relations among them. The user's query is expanded first using the Arabic WordNet and then using the domain-specific knowledge base. For summarization, a decision tree algorithm (C4.5) is used, trained on a set of features extracted from the original documents. The Essex Arabic Summaries Corpus (EASC) served as the testing dataset. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) was used to compare OSSAD summaries with human summaries and with the output of other automatic summarization systems; the proposed approach showed promising results.
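To make the evaluation step concrete, the following is a minimal sketch of ROUGE-N recall for a single reference summary: the number of reference n-grams that also appear in the candidate (with clipped counts), divided by the total number of reference n-grams. It is not the ROUGE toolkit used in the paper. The function names, the whitespace tokenization, and the English toy sentences are assumptions made for illustration; real Arabic text would need proper tokenization and normalization first.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall against one reference summary.

    Illustrative only: tokens are split on whitespace, with no stemming
    or stop-word handling.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match so a repeated candidate n-gram is not over-counted.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Toy sentences (hypothetical, not taken from EASC).
reference = "the system summarizes arabic documents using an ontology"
candidate = "the system summarizes documents with an ontology"

score_1 = rouge_n_recall(candidate, reference, n=1)  # 6 of 8 unigrams matched
score_2 = rouge_n_recall(candidate, reference, n=2)  # 3 of 7 bigrams matched
print(f"ROUGE-1 recall: {score_1:.3f}")
print(f"ROUGE-2 recall: {score_2:.3f}")
```

Because the measure is recall-oriented, a longer candidate summary can only keep or raise its score, which is why such comparisons are normally made at a fixed summary length.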
