Research Article

Fake Review Detection using Principal Component Analysis and Active Learning

by Faisal Muhammad Shah, Sifat Ahmed
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 48
Year of Publication: 2019
Authors: Faisal Muhammad Shah, Sifat Ahmed
10.5120/ijca2019919418

Faisal Muhammad Shah, Sifat Ahmed. Fake Review Detection using Principal Component Analysis and Active Learning. International Journal of Computer Applications. 178, 48 (Sep 2019), 42-48. DOI=10.5120/ijca2019919418

@article{ 10.5120/ijca2019919418,
author = { Faisal Muhammad Shah, Sifat Ahmed },
title = { Fake Review Detection using Principal Component Analysis and Active Learning },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2019 },
volume = { 178 },
number = { 48 },
month = { Sep },
year = { 2019 },
issn = { 0975-8887 },
pages = { 42-48 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume178/number48/30878-2019919418/ },
doi = { 10.5120/ijca2019919418 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:53:29.944104+05:30
%A Faisal Muhammad Shah
%A Sifat Ahmed
%T Fake Review Detection using Principal Component Analysis and Active Learning
%J International Journal of Computer Applications
%@ 0975-8887
%V 178
%N 48
%P 42-48
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

E-commerce has proved its importance in a world where time is of the essence, and people rely on it more than ever before. With e-commerce comes a large volume of user feedback on purchased products. As internet access has become cheaper and easier to obtain, more people are connecting through social media and other platforms where they post product-related feedback. Shoppers, in turn, rely increasingly on product reviews to get a clear picture of a product and of other users' experience with it. However, there is no convincing way to authenticate the reviews posted on e-commerce websites. To generate more revenue and gain unethical advantages, some sellers pay people to post fake reviews written to persuade others to buy the product. Several methodologies have been introduced to detect such reviews. Most are supervised models that rely on pseudo fake reviews or on large-scale labeled datasets. This paper proposes a model that combines two types of learning, active and supervised, by creating a manually labeled dataset. The model has four filtering phases based on TF-IDF, CountVectorizer, and n-gram features of the review content, followed by Principal Component Analysis (PCA) to reduce the feature set. It achieves encouraging results on 2,000 reviews from Amazon: in the best case, precision, recall, and F-score are slightly above 91%, and accuracy reaches up to 90%. A comparison with similar successful methods that use PCA as a feature selection technique indicates that the proposed model is efficient and its results encouraging.
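
The ingredients named in the abstract (TF-IDF and count-based n-gram features, PCA for feature reduction, and an active learning step on top of a supervised classifier) can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' four-phase pipeline, whose details are not given here. The toy reviews and labels, the helper names `build_features` and `active_learning_loop`, the choice of logistic regression, and the use of uncertainty sampling as the query strategy are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def build_features(texts, n_components=2):
    """Unigram/bigram TF-IDF and count features, reduced with PCA."""
    tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
    counts = CountVectorizer(ngram_range=(1, 2)).fit_transform(texts)
    # PCA needs a dense matrix; acceptable for a toy corpus only.
    dense = np.hstack([tfidf.toarray(), counts.toarray()])
    return PCA(n_components=n_components, random_state=0).fit_transform(dense)


def active_learning_loop(X, oracle_labels, seed_idx, rounds=3):
    """Uncertainty sampling: each round, label the review the model is least sure of."""
    labeled = list(seed_idx)  # must contain at least one example of each class
    model = LogisticRegression()
    for _ in range(rounds):
        model.fit(X[labeled], oracle_labels[labeled])
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        proba = model.predict_proba(X[pool])[:, 1]
        # A probability closest to 0.5 marks the most uncertain sample.
        query = pool[int(np.argmin(np.abs(proba - 0.5)))]
        labeled.append(query)  # stands in for asking a human annotator
    model.fit(X[labeled], oracle_labels[labeled])
    return model, labeled


# Hypothetical toy data (1 = fake, 0 = genuine); not taken from the paper.
reviews = [
    "best product ever buy now amazing amazing",
    "battery lasts about two days with normal use",
    "amazing deal best price buy buy buy",
    "the strap broke after a month of daily wear",
    "five stars best seller ever highly recommend now",
    "setup took ten minutes and the manual was clear",
    "buy now best amazing product ever five stars",
    "sound is fine but the case scratches easily",
]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])

X = build_features(reviews, n_components=2)
model, labeled = active_learning_loop(X, labels, seed_idx=[0, 1], rounds=3)
print("Indices labeled so far:", labeled)
print("Predictions:", model.predict(X))
```

In this sketch the known labels play the role of the human annotator; in a real setting each queried review would be labeled manually, which is how a small hand-labeled dataset can grow where the model is least certain.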

Index Terms

Computer Science
Information Sciences

Keywords

Review spam detection, Fake review, PCA, Active Learning, Machine Learning