Research Article

Authorship Analysis Studies: A Survey

by Sara El Manar El Bouanani, Ismail Kassou
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 86 - Number 12
Year of Publication: 2014

Sara El Manar El Bouanani, Ismail Kassou. Authorship Analysis Studies: A Survey. International Journal of Computer Applications. 86, 12 (January 2014), 22-29. DOI=10.5120/15038-3384

@article{ 10.5120/15038-3384,
author = { Sara El Manar El Bouanani and Ismail Kassou },
title = { Authorship Analysis Studies: A Survey },
journal = { International Journal of Computer Applications },
issue_date = { January 2014 },
volume = { 86 },
number = { 12 },
month = { January },
year = { 2014 },
issn = { 0975-8887 },
pages = { 22-29 },
numpages = {9},
url = { },
doi = { 10.5120/15038-3384 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:04:03.836986+05:30
%A Sara El Manar El Bouanani
%A Ismail Kassou
%T Authorship Analysis Studies: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 86
%N 12
%P 22-29
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

The objective of this paper is to review the different studies done on authorship analysis. The focus is on outlining the stylometric features that allow distinguishing between authors and on listing the diverse techniques used to classify an author's texts.

Index Terms

Computer Science
Information Sciences


Authorship characterization, authorship attribution, similarity detection, stylometric features, probabilistic models, compression models, machine learning classifiers, clustering algorithms, inter-textual distance