
Development of Cluster based Supervised Learning Technique for Web News Extraction

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Pardeep Kaur, Rekha Bhatia

Pardeep Kaur and Rekha Bhatia. Development of Cluster based Supervised Learning Technique for Web News Extraction. International Journal of Computer Applications 152(5):30-31, October 2016. BibTeX

	author = {Pardeep Kaur and Rekha Bhatia},
	title = {Development of Cluster based Supervised Learning Technique for Web News Extraction},
	journal = {International Journal of Computer Applications},
	issue_date = {October 2016},
	volume = {152},
	number = {5},
	month = {Oct},
	year = {2016},
	issn = {0975-8887},
	pages = {30-31},
	numpages = {2},
	url = {},
	doi = {10.5120/ijca2016911805},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


The World Wide Web is a prominent source of online information: an abundance of data is already available on the web, and a large amount more is uploaded daily. With so much information present, it seems easy to get any information at any time, but in practice finding the relevant content requires a lot of focus. Numerous web mining techniques, such as extractors and wrappers, have been studied, and they provide various methods for extracting useful web content. In this paper, a semi-supervised web news extraction technique is proposed that combines an unsupervised clustering technique with a supervised classification technique.
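The abstract does not spell out the algorithm, so the following is only a minimal sketch of one generic way to chain clustering with classification, not the authors' method. It assumes a page has already been split into text blocks, uses two hypothetical features (word count and link density), groups the blocks with a two-cluster k-means, treats the cluster with the longer blocks as news (an assumption), and then uses the resulting pseudo-labels to train a 1-nearest-neighbour classifier. All names (`features`, `kmeans2`, `train`, `classify`, `link_words`) and the sample data are illustrative.

```python
from statistics import mean

def features(block):
    """Hypothetical features: word count and share of words inside links."""
    n_words = len(block["text"].split())
    return (float(n_words), block["link_words"] / max(n_words, 1))

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans2(points, iters=20):
    """Two-cluster k-means, seeded with the smallest and largest points."""
    centroids = [min(points), max(points)]
    assign = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        assign = [min((0, 1), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return assign, centroids

def train(blocks):
    """Unsupervised step: cluster the blocks and derive pseudo-labels."""
    points = [features(b) for b in blocks]
    assign, centroids = kmeans2(points)
    # Assumption: the cluster with the higher mean word count is news text.
    news_cluster = 0 if centroids[0][0] >= centroids[1][0] else 1
    labels = ["news" if a == news_cluster else "other" for a in assign]
    return list(zip(points, labels))

def classify(model, block):
    """Supervised step: 1-nearest-neighbour over the pseudo-labelled blocks."""
    p = features(block)
    _, label = min(model, key=lambda example: dist2(p, example[0]))
    return label

# Made-up blocks standing in for a segmented news page.
blocks = [
    {"text": "Home World Business Sport", "link_words": 4},
    {"text": "Sign in Subscribe", "link_words": 3},
    {"text": "The council approved the new transit budget on Monday "
             "after a long public debate over fares", "link_words": 0},
    {"text": "Officials said construction would begin next spring and "
             "finish within two years if funding holds", "link_words": 1},
]
model = train(blocks)
unseen = {"text": "Residents welcomed the decision although some businesses "
                  "warned about disruption during the works", "link_words": 0}
print(classify(model, unseen))
```

The features are left unscaled here, so word count dominates the distance; a fuller version would normalise them and would name the clusters from a few hand-labelled blocks rather than from a length heuristic.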




Web Mining, Web News, Web News Extraction, Unsupervised Machine Learning, Classification