A Review on Text Sanitization

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 95 - Number 25
Year of Publication: 2014
Veena Vasudevan
Ansamma John

Veena Vasudevan and Ansamma John. Article: A Review on Text Sanitization. International Journal of Computer Applications 95(25):14-17, June 2014. Full text available. BibTeX

	author = {Veena Vasudevan and Ansamma John},
	title = {Article: A Review on Text Sanitization},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {95},
	number = {25},
	pages = {14-17},
	month = {June},
	note = {Full text available}


Information is essential to activities such as research and business decision making, and in the internet age there is no scarcity of it. But information that reveals a person's identity or discloses confidential matters is a serious threat to privacy. Sensitive information should therefore be removed or masked before documents are published or shared; this is the main goal of text sanitization. Several semi-automatic and automatic methods are used to identify sensitive terms and sanitize the document by removing them. Sanitization lowers the document's classification level, which widens the set of users who may access it while preserving privacy.
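
As a rough illustration of the masking step described above, the following Python sketch replaces known sensitive terms with a placeholder. It is a minimal sketch under stated assumptions, not any of the reviewed methods: the term list `SENSITIVE_TERMS`, the function name `sanitize`, and the `[REDACTED]` mask are all hypothetical, and the sketch assumes that the sensitive terms have already been identified by some earlier detection step.

```python
import re

# Hypothetical list of sensitive terms (an assumption for illustration).
# In practice these would come from a manual, semi-automatic, or
# automatic detection step.
SENSITIVE_TERMS = ["John Smith", "HIV", "Boston"]


def sanitize(text, terms=SENSITIVE_TERMS, mask="[REDACTED]"):
    """Replace each whole-word occurrence of a sensitive term with a mask."""
    if not terms:
        # Nothing to mask; return the text unchanged.
        return text
    # Try longer terms first so multi-word terms win over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(mask, text)


if __name__ == "__main__":
    print(sanitize("John Smith was treated for HIV in Boston."))
```

Simple term masking of this kind does not address terms that can be inferred from the remaining context, which is one reason the identification step matters as much as the removal step.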

