Research Article

A Review on Text Sanitization

by Veena Vasudevan, Ansamma John
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 95 - Number 25
Year of Publication: 2014
10.5120/16749-6916

Veena Vasudevan, Ansamma John. A Review on Text Sanitization. International Journal of Computer Applications. 95, 25 (June 2014), 14-17. DOI=10.5120/16749-6916

@article{ 10.5120/16749-6916,
author = { Veena Vasudevan and Ansamma John },
title = { A Review on Text Sanitization },
journal = { International Journal of Computer Applications },
issue_date = { June 2014 },
volume = { 95 },
number = { 25 },
month = { June },
year = { 2014 },
issn = { 0975-8887 },
pages = { 14-17 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume95/number25/16749-6916/ },
doi = { 10.5120/16749-6916 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:20:22.884921+05:30
%A Veena Vasudevan
%A Ansamma John
%T A Review on Text Sanitization
%J International Journal of Computer Applications
%@ 0975-8887
%V 95
%N 25
%P 14-17
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Information is essential to activities such as research and business decision making, and in the internet age it is not scarce. But information that reveals a person's identity or discloses confidential matters is a serious threat to privacy. Sensitive information should therefore be removed or masked before a document is published or shared. This is the main goal of text sanitization. Several semi-automatic and automatic methods identify sensitive information and sanitize the document by removing such terms. Sanitization lowers the document's classification level, which widens the audience that can use it while preserving privacy.
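
The general idea of identifying sensitive terms and masking them can be illustrated with a minimal Python sketch. It is not a method taken from the paper: the term list, the e-mail pattern, the `[REDACTED]` mask, and the `sanitize` function name are all assumptions made for illustration, and a real system would obtain its terms from a detection step such as named entity recognition.

```python
import re

# Hypothetical sensitive terms; in practice these would come from a
# detection step (manual review, named entity recognition, and so on).
SENSITIVE_TERMS = ["Jane Doe", "HIV"]

# Illustrative pattern for one kind of identifier (e-mail addresses).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def sanitize(text, terms=SENSITIVE_TERMS, mask="[REDACTED]"):
    """Return text with every listed term and e-mail address masked."""
    # Process longer terms first so a phrase wins over a term it contains.
    for term in sorted(terms, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        text = pattern.sub(mask, text)
    return EMAIL_PATTERN.sub(mask, text)


print(sanitize("Jane Doe (jane.doe@example.org) tested positive for HIV."))
```

Masking with a fixed placeholder is the simplest option. Replacing a term with a more general one is an alternative that keeps more of the document's usefulness, at the cost of needing a knowledge source to pick the generalization.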

Index Terms

Computer Science
Information Sciences

Keywords

Document Declassification, Data Publishing, Term correlation, Privacy, Information Theory, Named Entity Recognition