![]() |
10.5120/ijca2017914443 |
Shruti Luthra, Dinkar Arora, Kanika Mittal and Anusha Chhabra. A Statistical Approach of Keyword Extraction for Efficient Retrieval. International Journal of Computer Applications 168(7):31-36, June 2017. BibTeX
@article{10.5120/ijca2017914443, author = {Shruti Luthra and Dinkar Arora and Kanika Mittal and Anusha Chhabra}, title = {A Statistical Approach of Keyword Extraction for Efficient Retrieval}, journal = {International Journal of Computer Applications}, issue_date = {June 2017}, volume = {168}, number = {7}, month = {Jun}, year = {2017}, issn = {0975-8887}, pages = {31-36}, numpages = {6}, url = {http://www.ijcaonline.org/archives/volume168/number7/27889-2017914443}, doi = {10.5120/ijca2017914443}, publisher = {Foundation of Computer Science (FCS), NY, USA}, address = {New York, USA} }
Abstract
Large number of techniques for keyword extraction have been proposed for better matching of documents with the user’s query but most of them deal with tf-idf to find the weight age of query terms in the entire document but this can result in improper result as if a term has a low term frequency in overall document but high frequency in a certain part of the document then that term can be ignored by traditional tf-idf method. Through this paper, the keyword extraction is improved using a hybrid technique in which the entire document is split into multiple domains using a master keyword and the frequency of all unique words is found in every domain . The words having high frequency are selected as candidate keywords and the final selection is made on the basis of a graph which is constructed between the keywords using Word Net. The experiments, conducted on various documents show that proposed approach outperforms other keyword extraction methodologies by enhancing document retrieval.
References
- Information Retrieval Research, Jonathan Furner, School of Information and Media Studies, and David Harper, School of Computer and Mathematical Studies, The Robert Gordon University, Aberdeen, Scotland. (Eds)
- Important problems in information retrieval, Dagobert Soergel College of Library and Information Services University of Maryland College Park, MD 20742
- "Keyword extraction-a review of methods and approaches" Slobodan Beliga University of Rijeka, Department of Informatics Radmile Matejčić 2, 51 000 Rijeka, Croatia
- Effective Approaches For Extraction Of Keywords Jasmeen Kaur, Vishal Gupta, ME Research Scholar Computer Science & Engineering, UIET, Panjab University Chandigarh, (UT)-160014
- Understanding Inverse Document Frequency: On theoretical arguments for IDF, Stephen Robertson Microsoft Research 7 JJ Thomson Avenue Cambridge CB3 0FB UK
- Keyword Extraction using graph based approaches, R. Nagarajan, Dr. S. Anu H Nair, Dr. P. Aruna, N. Puviarasan Department of Computer Science & Engineering, Annamalai University, Tamilnadu, India
- Salton G, Wong A and Yang C, “A vector space model for automatic indexing”, Communications of the ACM, 18(11), 613 – 620, 1975
- Cohen J. D., “Highlights: Language and Domain- independent Automatic Indexing Terms for Abstracting”,Journal of the American Society for Information Science, 46(3): 162 – 174, 1995
- Mihalcea R and Tarau P, “Textrank: Bringing order into texts”, In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 2004
- Jasmeen and Vishal,"Effective approaches for extraction of keywords", IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 6, November 2010 ISSN (Online): 1694-0814
- Hulth A., “Improved automatic keyword extraction given more linguistic knowledge”, In Proceedings of theConference on Empirical Methods in Natural Language Processing (EMNLP'03), 216 – 223, Sapporo, 2003
- Hulth A, “Combining machine learning and natural language processing for automatic keyword extraction”,PhD Thesis, Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences, 2004
- Whitney P, Engel D and Cramer N, “Mining for surprise events within text streams”. Proceedings of the NinthSIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 617–627, 2009
- Salton G, Wong A and Yang C, “A vector space model for automatic indexing”, Communications of the ACM, 18(11), 613 – 620, 1975
- I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, C. G. Nevill-Manning, “Kea: Pra-ctical Automatic Keyphrase Extraction” inProc. of the 4th ACM Conf. of the Digital Libraries, Berkeley, CA, USA, 1999.
- P. D. Turney, “Learning to Extract Keyphrases from Text” in Tech. Report, National Research Council of Canada, Institute for Information Technology, 1999.
- T. D. Nguyen, M.-Y. Kan, „Keyphrase extraction in scientific publications“ in Proc. of ICADL 2007, pp. 317-326, 2007.
- M. Krapivin, A. Autayeu, M. Marchese, E. Blanzieri, N. Segata, “Keyphrases Extraction from Scientific Documents: Improving Machine Learning Approaches with Natural Language Processing” in Proc. of 12th Int. Conf. on Asia-Pacific Digital Libraries, ICADL 2010, Gold Coast, Australia, LNAI v.6102, pp. 102-111, 2010
- Y. HaCohen-Kerner, “Automatic Extraction of Keywords from Abstracts” in Proc. of 7th Int. Conf. KES 2003 (LNCS v. 2773), pp, 843-849, 2003.
- M. Litvak, M. Last, “Graph-based keyword extraction for single-document summarization” in ACM Workshop on Multi-source Multilingual Information Extraction and Summarization, pp.17-24, 2008.
- Z. Yang, J. Lei, K. Fan, Y. Lai, “Keyword extraction by entropy difference between the intrinsic and extrinsic mode” in Physica A: Statistical Mechanics and its Applications, V. 392, I. 19, pp. 4523-4531, 2013.
- Slobodan beliga, University of Rijeka, Department of Informatics Radmile Matejčić 2, 51 000 Rijeka, Croatia,"Keyword extraction a review of method and approaches"
- Y Matsuo," Keyword Extraction from a Single Document using Word Co-occurrence Statistical Information",International Journal on Artificial Intelligence Tools c World Scientific Publishing Company
- "Domain keyword extraction technique: A new weighting method based on frequency analysis" Rakhi Chakraborty ,Department of Computer Science & Engineering, Global Institute Of Management and Technology, Nadia, India
- Willett, P. (2006) The Porter stemming algorithm: then and now. Program: electronic library and information systems, 40 (3). pp. 219-223.
Keywords
Information Retrieval, Domain Splitting, Natural Language Processing, Inverse Document Frequency, Word Net