Research Article

Finding Aliases using Noun Detection Algorithm

by Anuja Digambar Bharate
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 129 - Number 17
Year of Publication: 2015
Authors: Anuja Digambar Bharate
10.5120/ijca2015907109

Anuja Digambar Bharate. Finding Aliases using Noun Detection Algorithm. International Journal of Computer Applications. 129, 17 (November 2015), 10-15. DOI=10.5120/ijca2015907109

@article{ 10.5120/ijca2015907109,
author = { Anuja Digambar Bharate },
title = { Finding Aliases using Noun Detection Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { November 2015 },
volume = { 129 },
number = { 17 },
month = { November },
year = { 2015 },
issn = { 0975-8887 },
pages = { 10-15 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume129/number17/23164-2015907109/ },
doi = { 10.5120/ijca2015907109 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:23:40.469961+05:30
%A Anuja Digambar Bharate
%T Finding Aliases using Noun Detection Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 129
%N 17
%P 10-15
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The growing number of internet users leads to name collisions on web forums and in many other settings, which in turn leads a large number of users to adopt aliases on the web. This makes it difficult to identify the correct user. Systems have therefore been proposed that identify aliases using entity graphs. Most of them, however, are evaluated on prepared datasets, and very few work on real entities. The implemented system puts forward the idea of finding aliases in real web data by using an enhanced web crawler that collects all sub-URLs of a given seed URL. These sub-URLs are then analyzed by a second "baby" crawler, which fetches and parses the web data into human-readable content using random walk relational theory. The alias graph is found to be more efficient when it is built with the help of a real relation entity graph over the collected web data.
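The first stage described above, collecting the sub-URLs of a seed URL, can be illustrated with a minimal sketch. It is not the paper's implementation: the names `LinkCollector` and `collect_sub_urls` are hypothetical, the page is passed in as an HTML string rather than fetched over the network, and "sub-URL" is assumed to mean a link on the same host whose path lies under the seed's path.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags in one HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_sub_urls(seed_url, html):
    """Return the sorted, de-duplicated links in `html` that lie under `seed_url`.

    Assumption: a sub-URL shares the seed's host and its path starts with
    the seed's path. The abstract does not define this precisely.
    """
    collector = LinkCollector()
    collector.feed(html)

    seed = urlparse(seed_url)
    seed_path = seed.path if seed.path.endswith("/") else seed.path + "/"

    found = set()
    for href in collector.hrefs:
        # Resolve relative links against the seed and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(seed_url, href))
        parts = urlparse(absolute)
        same_host = parts.netloc == seed.netloc
        under_seed = parts.path.startswith(seed_path)
        if same_host and under_seed and absolute != seed_url:
            found.add(absolute)
    return sorted(found)
```

In a full pipeline, each returned URL would be handed to the second crawler for fetching and parsing. That stage, and the random-walk analysis, are not sketched here because the abstract does not describe them in enough detail.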

Index Terms

Computer Science
Information Sciences

Keywords

Web crawler, NLP, Entity relation, Random walk, Cauchy distribution