International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 32 - Number 6
Year of Publication: 2011
Authors: Ganesh D. Puri, Prof. Y.C. Kulkarni
Ganesh D. Puri, Prof. Y.C. Kulkarni. Article: Realization of Framework for Web Content Extraction and Classification. International Journal of Computer Applications. 32, 6 (October 2011), 22-26. DOI=10.5120/3908-5486
Web content extraction and classification can be viewed as a combination of different methods. Web pages today contain a great deal of information besides their main content, so extracting the content that is of interest to the user is the main task. Text mining is a technique that helps users find useful information in large collections of digital text documents on the Web or in databases. A good text mining model should therefore retrieve the information that meets the user's needs within a reasonable time. A first step in any Web-based text mining effort is to collect a significant number of Web mentions of a subject. The challenge is then not only to find all occurrences of the subject, but also to keep only those that have the desired meaning. The system described in this paper is capable of extracting the main content of a page and classifying it. The vector space model is used for classification.
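To illustrate what classification with a vector space model can look like, the following is a minimal sketch and not the system described in the paper. It assumes TF-IDF term weighting, cosine similarity, and a nearest-labelled-document rule, none of which the abstract specifies. The function names (`tokenize`, `build_idf`, `vectorize`, `cosine`, `classify`) and the two sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for every term."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(text, idf):
    """Map text to a sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, labeled_docs, idf):
    """Return the label of the most similar labelled document and its score."""
    query = vectorize(text, idf)
    best_label, best_score = None, -1.0
    for label, doc in labeled_docs:
        score = cosine(query, vectorize(doc, idf))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical training data: (label, extracted main content) pairs.
labeled_docs = [
    ("sports", "the team won the football match with a late goal"),
    ("finance", "the bank raised interest rates and stock markets fell"),
]
idf = build_idf([doc for _, doc in labeled_docs])

label, score = classify("stock prices and interest rates", labeled_docs, idf)
print(label)
```

In this sketch each document becomes a vector with one dimension per term, and the query is assigned the label of the document whose vector points in the most similar direction. A practical system would more likely compare against many documents per class or against class centroids.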