Research Article

Personalization and Clustering of Similar Web Pages

by Smita Gupta, Anurag Malik
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 54 - Number 14
Year of Publication: 2012
DOI: 10.5120/8635-2556

Smita Gupta, Anurag Malik. Personalization and Clustering of Similar Web Pages. International Journal of Computer Applications. 54, 14 (September 2012), 24-30. DOI=10.5120/8635-2556

@article{ 10.5120/8635-2556,
author = { Smita Gupta, Anurag Malik },
title = { Personalization and Clustering of Similar Web Pages },
journal = { International Journal of Computer Applications },
issue_date = { September 2012 },
volume = { 54 },
number = { 14 },
month = { September },
year = { 2012 },
issn = { 0975-8887 },
pages = { 24-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume54/number14/8635-2556/ },
doi = { 10.5120/8635-2556 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Over the last decade, the much-proclaimed information age has truly arrived. The evolution of the Internet into the Global Information Infrastructure, together with the massive popularity of the Web, has enabled ordinary citizens to become not just consumers of information but also part of it. To make searching easier for users, their time and effort must be saved, so a way is needed both to deliver relevant information quickly and to manage large volumes of retrieved data without difficulty. In this paper, the authors use the tf-idf (term frequency-inverse document frequency) technique together with web mining to address this problem. Web mining is the application of data mining techniques to discover patterns from the Web. Of its three branches, Web usage mining, Web content mining and Web structure mining, this work addresses only Web content mining. A Windows application is developed that acts as a data analysis tool and uses the API of the Bing search engine. The proposed algorithm is applied to the snippets (the short descriptions shown below each search result) of web search results to find the web pages that contain the maximum number of query words. The application also aims to make the retrieved information easier to manage on the client's machine by using a simple grouping technique.
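The abstract does not give the exact scoring or grouping rules, so the following is only a minimal Python sketch of the general idea, assuming the snippets have already been retrieved (the Bing API call is omitted). It ranks snippets first by how many distinct query words they contain and breaks ties with a tf-idf score. The smoothed idf variant `log(1 + n/df)`, the tie-breaking rule, and the hypothetical `group_by_matched_terms` grouping key are assumptions for illustration, not the authors' method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphanumeric runs; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_snippets(query, snippets):
    """Rank snippets by distinct query words matched, then by tf-idf score.

    Returns a list of (index, matched_terms, score) tuples, best first.
    """
    docs = [Counter(tokenize(s)) for s in snippets]
    n = len(docs)
    terms = sorted(set(tokenize(query)))
    # Document frequency of each query term across the result snippets.
    df = {t: sum(1 for d in docs if t in d) for t in terms}

    ranked = []
    for i, d in enumerate(docs):
        length = sum(d.values()) or 1  # avoid division by zero for empty snippets
        matched = tuple(t for t in terms if d[t] > 0)
        # Assumed smoothed idf, log(1 + n/df), so a query word that appears
        # in every snippet still contributes a positive weight.
        score = sum((d[t] / length) * math.log(1 + n / df[t]) for t in matched)
        ranked.append((i, matched, score))
    # Most distinct query words first; higher tf-idf score breaks ties.
    ranked.sort(key=lambda r: (-len(r[1]), -r[2]))
    return ranked


def group_by_matched_terms(ranked):
    """Hypothetical grouping: snippets that match the same set of query words."""
    groups = defaultdict(list)
    for i, matched, _ in ranked:
        groups[matched].append(i)
    return dict(groups)
```

For example, `rank_snippets("web data mining", snippets)` places a snippet containing all three query words ahead of one containing only two, and `group_by_matched_terms` then collects snippets that share exactly the same matched words. Grouping on the exact set of matched words is just one simple choice; a clustering method based on cosine similarity between tf-idf vectors would be a natural alternative.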

Index Terms

Computer Science
Information Sciences

Keywords

Term frequency-inverse document frequency (tf-idf), static clustering, mining methods and algorithms, information retrieval