Generic CBTS: Correlation based Transformation Strategy for Privacy Preserving Data Mining

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Authors:
N. P. Nethravathi, Prasanth G. Rao, Chaitra C. Vaidya, P. Deepa Shenoy, Venugopal K. R., Indiramma M.
10.5120/ijca2017912353

N. P. Nethravathi, Prasanth G. Rao, Chaitra C. Vaidya, P. Deepa Shenoy, Venugopal K. R. and Indiramma M. Generic CBTS: Correlation based Transformation Strategy for Privacy Preserving Data Mining. International Journal of Computer Applications 157(1):1-7, January 2017.

@article{10.5120/ijca2017912353,
	author = {N. P. Nethravathi and Prasanth G. Rao and Chaitra C. Vaidya and P. Deepa Shenoy and Venugopal K. R. and Indiramma M.},
	title = {Generic CBTS: Correlation based Transformation Strategy for Privacy Preserving Data Mining},
	journal = {International Journal of Computer Applications},
	issue_date = {January 2017},
	volume = {157},
	number = {1},
	month = {Jan},
	year = {2017},
	issn = {0975-8887},
	pages = {1-7},
	numpages = {7},
	url = {http://www.ijcaonline.org/archives/volume157/number1/26792-2016912353},
	doi = {10.5120/ijca2017912353},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Mining useful knowledge from large corpora of data has become an important application in many fields. Data mining algorithms such as clustering and classification operate on this data and provide crisp information for analysis. As these data reach the public domain through various channels, protecting the privacy of the data owners is an increasing need. Although privacy can be provided by hiding sensitive data, doing so degrades the knowledge that data mining algorithms can extract, so an effective mechanism is required that protects the data without affecting the data mining results. Privacy concern is a primary hindrance to quality data analysis. Data mining algorithms, on the other hand, focus on the mathematical nature of the data rather than on the private nature of the information. Therefore, instead of removing or encrypting sensitive data, we propose transformation strategies that retain the statistical, semantic and heuristic nature of the data while masking the sensitive information. The proposed Correlation Based Transformation Strategy (CBTS) combines correlation analysis with data transformation techniques such as Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Non-Negative Matrix Factorization (NNMF) to provide the intended level of privacy preservation while still enabling data analysis. The proposed technique works for numerical, ordinal and nominal data. The outcome of CBTS is evaluated on standard datasets against popular data mining techniques with significant success, and information entropy is also measured.
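The abstract does not spell out the CBTS algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes NumPy and uses hypothetical helper names (`correlated_columns`, `svd_distort`, `shannon_entropy`). It also assumes a hypothetical selection rule: Pearson correlation with an arbitrary threshold of 0.5. The transformation is rank-k truncated SVD, one of the three techniques named above (PCA or NNMF could be swapped in). The information measure is Shannon entropy over discrete values. It handles numerical attributes only.

```python
import numpy as np


def correlated_columns(X, threshold=0.5):
    """Hypothetical selection rule: return the indices of attributes whose
    absolute Pearson correlation with at least one other attribute is >= threshold."""
    corr = np.corrcoef(X, rowvar=False)   # columns are attributes
    np.fill_diagonal(corr, 0.0)           # ignore each attribute's self-correlation
    return [j for j in range(X.shape[1]) if np.max(np.abs(corr[:, j])) >= threshold]


def svd_distort(X, cols, k):
    """Replace the selected columns with their rank-k truncated-SVD approximation;
    all other columns are copied unchanged."""
    Y = np.array(X, dtype=float)          # work on a float copy of the data
    U, s, Vt = np.linalg.svd(Y[:, cols], full_matrices=False)
    Y[:, cols] = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return Y


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Synthetic example: income is built to correlate with age; noise is independent.
rng = np.random.default_rng(0)
age = rng.normal(40, 10, 200)
income = 1000 * age + rng.normal(0, 5000, 200)
noise = rng.normal(0, 1, 200)
X = np.column_stack([age, income, noise])

cols = correlated_columns(X, threshold=0.5)  # expected to select age and income
X_masked = svd_distort(X, cols, k=1)
```

With `k=1` the two masked columns become exactly proportional, so their dominant linear relationship survives while the individual values are perturbed; a larger `k` keeps more detail but masks less. Whether this trade-off matches the privacy level intended by CBTS is not established by this sketch.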

Keywords

Transformation Strategy, Privacy Preserving Data Mining, Correlation Analysis, Information Entropy