Survey of Information Retrieval Techniques for Web using NLP

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Rini John, Sharvari S. Govilkar

Rini John and Sharvari S Govilkar. Article: Survey of Information Retrieval Techniques for Web using NLP. International Journal of Computer Applications 135(8):23-27, February 2016. Published by Foundation of Computer Science (FCS), NY, USA. BibTeX

	author = {Rini John and Sharvari S. Govilkar},
	title = {Article: Survey of Information Retrieval Techniques for Web using NLP},
	journal = {International Journal of Computer Applications},
	year = {2016},
	volume = {135},
	number = {8},
	pages = {23-27},
	month = {February},
	note = {Published by Foundation of Computer Science (FCS), NY, USA}


The Web is loaded with information and grows more overloaded with data each passing day. Information retrieval must advance so that the required data can be fetched accurately and efficiently. This paper surveys two promising areas, Natural Language Processing (NLP) and Web technologies, which can be combined to let an enterprise bring together unstructured and structured data in ways that traditional tools did not handle efficiently. Integrating NLP with a Web portal allows a better understanding of web information. The paper also explores the various techniques used in both areas, as well as the NLP frameworks that can be used for future work in this area.




NLP, information retrieval, semantic assistance, entity extraction, visual page segmentation, semi-Markov conditional random fields, hierarchical conditional random fields, Web portal.