Study of Near Duplicate Content: Identification of Categories Generating Maximum Duplicate URL in Results
DOI: 10.5120/ijca2017913526
Kavita Garg, Jayshankar Prasad and Saba Hilal. Study of Near Duplicate Content: Identification of Categories Generating Maximum Duplicate URL in Results. International Journal of Computer Applications 163(5):20-23, April 2017. BibTeX
@article{10.5120/ijca2017913526,
  author = {Kavita Garg and Jayshankar Prasad and Saba Hilal},
  title = {Study of Near Duplicate Content: Identification of Categories Generating Maximum Duplicate URL in Results},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2017},
  volume = {163},
  number = {5},
  month = {Apr},
  year = {2017},
  issn = {0975-8887},
  pages = {20-23},
  numpages = {4},
  url = {http://www.ijcaonline.org/archives/volume163/number5/27392-2017913526},
  doi = {10.5120/ijca2017913526},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}
Abstract
The study of near duplicate content involves identifying the search categories that generate the same URL more than once in a query result. These categories need to be identified so that results can be improved by removing duplicate URLs. Repeated URLs in the results irritate the user and push other URLs down the ranking; those URLs end up on the second or third page, which users rarely bother to open. Near duplicate content can therefore hide better results from the user and make the search ineffective. Many algorithms, procedures, and filters exist to reduce this duplication, but before it can be reduced the duplicates must first be identified: which categories generate the most duplicate results, in what form the redundancy exists, which search engine produces these duplicates, and so on. This paper presents an effort to identify the categories with the maximum number of duplicates in terms of identical URLs.
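The category-level duplicate counting described above can be sketched in a few lines. This is a minimal, illustrative sketch rather than the paper's implementation: the `normalise` and `duplicate_count` helpers, the normalisation rules, and the category names and URLs are assumptions made here, assuming search results have already been collected per category as lists of URLs.

```python
# Minimal sketch (not the paper's implementation) of tallying duplicate URLs
# per search category. Assumes result URLs are already collected per category.
from collections import Counter
from urllib.parse import urlsplit

def normalise(url: str) -> str:
    """Reduce a URL to a comparable form: lower-case, drop the scheme,
    a leading 'www.', the trailing slash, query string and fragment."""
    parts = urlsplit(url.lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")

def duplicate_count(urls: list[str]) -> int:
    """Number of result slots occupied by a URL that already appeared earlier."""
    counts = Counter(normalise(u) for u in urls)
    return sum(c - 1 for c in counts.values())

# Hypothetical result sets; the categories and URLs are examples only.
results_by_category = {
    "news":     ["http://example.com/a", "https://www.example.com/a/", "http://other.com/x"],
    "shopping": ["http://shop.com/item", "http://shop.com/deals"],
}

# Report the categories producing the most duplicate URLs first.
for category, urls in sorted(results_by_category.items(),
                             key=lambda kv: duplicate_count(kv[1]),
                             reverse=True):
    print(category, duplicate_count(urls))
```

With the example data above, the "news" category reports one duplicate because its first two URLs normalise to the same form, which is the kind of category-level tally the paper sets out to produce.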
References
- H. Yang, J. Callan, and S. Shulman (2006), “Next Steps in Near-Duplicate Detection for eRulemaking”, Proceedings of the International Conference on Digital Government Research, pages 239-248.
- S. Weissman, S. Ayhan, J. Bradley, and J. Lin (2015), “Identifying Duplicate and Contradictory Information in Wikipedia”, Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 57-60.
- R. V R, et al. (2016), “Speeding up of Search Engine by Detection and Control of Duplicate Documents on the Web”, International Journal of Computer Science and Information Technologies, Vol. 7(2), pages 637-642.
- M. Egele, S. Barbara, and E. Kirda (2011), “Removing Web Spam Links from Search Engine Results”, Journal in Computer Virology, Vol. 7(1), doi: 10.1007/s11416-009-0132-6.