Fake Review Detection using Principal Component Analysis and Active Learning

Faisal Muhammad Shah; Sifat Ahmed

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 178 - Number 48

Year of Publication: 2019

DOI: 10.5120/ijca2019919418

Faisal Muhammad Shah, Sifat Ahmed. Fake Review Detection using Principal Component Analysis and Active Learning. International Journal of Computer Applications. 178, 48 (Sep 2019), 42-48. DOI=10.5120/ijca2019919418

@article{10.5120/ijca2019919418,
  author = {Faisal Muhammad Shah and Sifat Ahmed},
  title = {Fake Review Detection using Principal Component Analysis and Active Learning},
  journal = {International Journal of Computer Applications},
  issue_date = {Sep 2019},
  volume = {178},
  number = {48},
  month = {Sep},
  year = {2019},
  issn = {0975-8887},
  pages = {42-48},
  numpages = {7},
  url = {https://ijcaonline.org/archives/volume178/number48/30878-2019919418/},
  doi = {10.5120/ijca2019919418},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

E-commerce has proven its importance in a world where time is of the essence, and people rely on it more than ever before. With e-commerce comes a huge volume of user feedback on the products people buy. As internet access has become cheaper and easier to obtain, more people are connecting through social media and other platforms, where they post product-related feedback. With the rise of e-commerce, people increasingly rely on product reviews to get a clear picture of a product and of other users' experience with it. However, there is no convincing way to authenticate the reviews posted on e-commerce websites. To generate more revenue and gain unfair advantages, some sellers invest in hiring people to post fake reviews that are written to persuade shoppers to buy the product. Several methodologies have been introduced to detect such fake reviews, but most of them are supervised models that rely on pseudo fake reviews or large-scale labeled datasets. This paper proposes a model with a new technique that combines two types of learning, active and supervised, by creating a manually labeled dataset. The model has four filtering phases based on TF-IDF, CountVectorizer, and n-gram features of the review content, followed by Principal Component Analysis (PCA) to reduce the feature set. It achieves very encouraging results on 2000 reviews from Amazon: in the best case, precision, recall, and F-score are slightly above 91%, and accuracy reaches up to 90%. Compared with similar successful methods that use PCA as a feature selection technique, these results indicate that the proposed model is efficient and promising.
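The pipeline summarized in the abstract (n-gram count and TF-IDF features, PCA reduction, then a classifier trained on labels gathered through active learning) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the classifier nor the active-learning query strategy, so the sketch assumes logistic regression and uncertainty sampling, and it does not reproduce the four filtering phases. All function names, hyperparameters (`n_max`, `k`, `lr`, `epochs`, `rounds`, `batch`) and the toy reviews are hypothetical; the `oracle_labels` array stands in for the manual labeling step.

```python
import re
import numpy as np

# Minimal sketch: every name, hyperparameter and data point here is hypothetical,
# not taken from the paper.

def tokens(text, n_max=2):
    """Lower-cased word n-grams for n = 1..n_max."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams

def tfidf_matrix(docs, n_max=2):
    """Raw n-gram counts (CountVectorizer-style) reweighted by smoothed IDF, rows L2-normalised."""
    toks = [tokens(d, n_max) for d in docs]
    vocab = {g: j for j, g in enumerate(sorted({g for t in toks for g in t}))}
    counts = np.zeros((len(docs), len(vocab)))
    for i, t in enumerate(toks):
        for g in t:
            counts[i, vocab[g]] += 1
    df = (counts > 0).sum(axis=0)                      # document frequency per n-gram
    idf = np.log((1 + len(docs)) / (1 + df)) + 1       # smoothed IDF
    X = counts * idf
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1, norms)

def pca(X, k):
    """Project mean-centred rows onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def predict_proba(Z, w, b):
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))

def fit_logreg(Z, y, lr=0.5, epochs=500):
    """Unregularised logistic regression by batch gradient descent (assumed classifier)."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        g = predict_proba(Z, w, b) - y                 # log-loss gradient w.r.t. the logit
        w -= lr * Z.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def active_learning(Z, oracle_labels, seed_idx, rounds, batch=1):
    """Uncertainty sampling (assumed strategy): each round, query the unlabeled
    reviews whose predicted probability is closest to 0.5. `oracle_labels`
    stands in for a human annotator."""
    labeled = list(seed_idx)
    for _ in range(rounds):
        pool = [i for i in range(len(Z)) if i not in labeled]
        if not pool:
            break
        w, b = fit_logreg(Z[labeled], oracle_labels[labeled])
        p = predict_proba(Z[pool], w, b)
        order = np.argsort(np.abs(p - 0.5))            # most uncertain first
        labeled += [pool[j] for j in order[:batch]]
    return fit_logreg(Z[labeled], oracle_labels[labeled]), labeled

def scores(y_true, y_pred):
    """Precision, recall, F-score and accuracy, with 1 = fake as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, float(np.mean(y_true == y_pred))

if __name__ == "__main__":
    # Hypothetical toy reviews (1 = fake, 0 = genuine); not the paper's Amazon data.
    reviews = [
        "best product ever buy now amazing amazing",
        "amazing deal best seller buy it now",
        "five stars best best best product",
        "buy now you will love it amazing",
        "battery lasts two days, strap feels cheap",
        "arrived late but works fine after update",
        "screen scratched within a week, returned it",
        "decent sound, bass is weak for the price",
    ]
    y = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
    Z = pca(tfidf_matrix(reviews), k=3)
    (w, b), labeled = active_learning(Z, y, seed_idx=[0, 4], rounds=3)
    y_hat = (predict_proba(Z, w, b) >= 0.5).astype(int)  # in-sample, illustration only
    print("queried:", labeled)
    print("precision, recall, F-score, accuracy:", scores(y, y_hat))
```

The toy run scores the same reviews it learned from, so its numbers say nothing about the results reported in the paper; a real evaluation would use a held-out set. Computing PCA through an SVD of the centred matrix avoids forming the covariance matrix explicitly, which helps when the n-gram vocabulary is large.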

Index Terms

Computer Science

Information Sciences

Keywords

Review spam detection, Fake review, PCA, Active Learning, Machine Learning