
Automatic Declassification of Textual Documents by Generalizing Sensitive Terms

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 100 - Number 18
Year of Publication: 2014
Veena Vasudevan
Ansamma John

Veena Vasudevan and Ansamma John. Article: Automatic Declassification of Textual Documents by Generalizing Sensitive Terms. International Journal of Computer Applications 100(18):24-28, August 2014.

BibTeX

	author = {Veena Vasudevan and Ansamma John},
	title = {Article: Automatic Declassification of Textual Documents by Generalizing Sensitive Terms},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {100},
	number = {18},
	pages = {24-28},
	month = {August},
	note = {Full text available}


With the advent of the Internet, large numbers of text documents are published and shared every day, and each of them can carry a substantial amount of information. If some of that information is confidential, sharing the document publicly may compromise privacy. Documents are therefore sanitized before publication, with the goal of protecting confidential information while retaining as much of the document's utility as possible. Various schemes have been developed for this problem, but most of them are domain-specific, and most do not consider the presence of semantically correlated terms. This paper presents a general-purpose sanitization method that discovers sensitive information based on the concept of information content. The proposed method first identifies the independent sensitive terms in the text document. Using these sensitive terms, it then discovers the correlated terms that pose a disclosure threat. Finally, a generalization algorithm generalizes the sensitive and correlated terms that carry a high disclosure risk, removing the confidential information from the document.
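The three stages described above (detect independent sensitive terms by information content, find correlated terms, then generalize) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the corpus counts, the is-a taxonomy, both thresholds, and the use of pointwise mutual information (PMI) as the correlation measure are all hypothetical choices made for the example. Information content is taken in its usual form, IC(t) = -log2 p(t), with p(t) estimated from document frequencies.

```python
import math

# Hypothetical corpus statistics (made up for illustration): number of
# documents, out of TOTAL_DOCS, in which each term or term pair appears.
TOTAL_DOCS = 1_000_000
HITS = {
    "patient": 200_000, "treatment": 180_000, "aids": 2_000,
    "antiretroviral": 20_000, "disease": 100_000, "drug": 120_000,
}
JOINT_HITS = {
    frozenset(("aids", "antiretroviral")): 1_500,
    frozenset(("aids", "patient")): 600,
    frozenset(("aids", "treatment")): 500,
}
# Hypothetical is-a taxonomy used for generalization (child -> parent).
TAXONOMY = {"aids": "disease", "antiretroviral": "drug"}

IC_THRESHOLD = 8.0   # bits; assumed sensitivity threshold
PMI_THRESHOLD = 3.0  # bits; assumed correlation threshold


def ic(term: str) -> float:
    """Information content IC(t) = -log2 p(t), with p(t) = hits / TOTAL_DOCS."""
    return -math.log2(HITS[term] / TOTAL_DOCS)


def pmi(a: str, b: str) -> float:
    """PMI = log2(p(a,b) / (p(a) p(b))); -inf if the pair never co-occurs."""
    joint = JOINT_HITS.get(frozenset((a, b)), 0)
    if joint == 0:
        return float("-inf")
    p_ab = joint / TOTAL_DOCS
    return math.log2(p_ab / ((HITS[a] / TOTAL_DOCS) * (HITS[b] / TOTAL_DOCS)))


def generalize(term: str) -> str:
    """Climb the taxonomy until an ancestor's IC is at or below the threshold."""
    parent = TAXONOMY.get(term)
    while parent is not None:
        if ic(parent) <= IC_THRESHOLD:
            return parent
        parent = TAXONOMY.get(parent)
    return "[REDACTED]"  # no safe ancestor: fall back to suppression


def sanitize(tokens: list[str]) -> list[str]:
    known = {t for t in tokens if t in HITS}
    # Step 1: independent sensitive terms (information content above threshold).
    sensitive = {t for t in known if ic(t) > IC_THRESHOLD}
    # Step 2: correlated terms that co-occur unusually often with a sensitive term.
    correlated = {
        c for c in known - sensitive
        if any(pmi(s, c) > PMI_THRESHOLD for s in sensitive)
    }
    # Step 3: generalize every risky term; all other tokens pass through unchanged.
    risky = sensitive | correlated
    return [generalize(t) if t in risky else t for t in tokens]


print(sanitize(["patient", "with", "aids", "receives", "antiretroviral", "treatment"]))
# prints ['patient', 'with', 'disease', 'receives', 'drug', 'treatment']
```

With these toy numbers, "aids" (IC of about 9 bits) is flagged as sensitive on its own, while "antiretroviral" (about 5.6 bits) is flagged only because its PMI with "aids" (about 5.2 bits) exceeds the correlation threshold; both are replaced by ancestors whose IC is low enough. In a real system the counts would come from a large corpus and the taxonomy from a resource such as WordNet, but those choices are left open here.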

