Research Article

Detection of Fraudulent Emails by Authorship Extraction

by A. Pandian, Mohamed Abdul Karim
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 41 - Number 7
Year of Publication: 2012
Authors: A. Pandian, Mohamed Abdul Karim
DOI: 10.5120/5551-7619

A. Pandian, Mohamed Abdul Karim. Detection of Fraudulent Emails by Authorship Extraction. International Journal of Computer Applications. 41, 7 (March 2012), 7-12. DOI=10.5120/5551-7619

@article{ 10.5120/5551-7619,
author = { A. Pandian and Mohamed Abdul Karim },
title = { Detection of Fraudulent Emails by Authorship Extraction },
journal = { International Journal of Computer Applications },
issue_date = { March 2012 },
volume = { 41 },
number = { 7 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 7-12 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume41/number7/5551-7619/ },
doi = { 10.5120/5551-7619 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:28:58.299059+05:30
%A A. Pandian
%A Mohamed Abdul Karim
%T Detection of Fraudulent Emails by Authorship Extraction
%J International Journal of Computer Applications
%@ 0975-8887
%V 41
%N 7
%P 7-12
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Fraudulent emails can be detected by extracting authorship information from the contents of emails. This paper presents an information extraction approach based on unique words in the emails. These unique words are used as representative features to train a Radial Basis Function (RBF) network. The final weights obtained from training are subsequently used for testing. The percentage of correctly identified email authors depends on the number of RBF centers and the type of function words used for training the RBF network. One hundred and fifty authors with over one hundred files from the sent folder of the Enron email dataset are considered. A total of 300 unique words, each three to seven characters long, are considered. The RBF network is trained and tested using words of different lengths. Our simulation shows the effectiveness of the proposed RBF network for email authorship identification. The accuracy of authorship identification ranges from 95% to 97%.
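
The pipeline the abstract describes (word-frequency features feeding an RBF network whose output weights are learned and then reused for testing) can be illustrated with a minimal sketch. This is not the authors' implementation: the five-word vocabulary, the toy emails, the choice of training samples as RBF centers, the Gaussian width, and the delta-rule settings are all assumptions made only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary of short function words (3 to 7 characters each).
VOCABULARY = ["the", "and", "that", "with", "from"]
WIDTH = 0.2  # assumed Gaussian width shared by all centers


def word_features(text, vocabulary):
    """Relative frequency of each vocabulary word among the text's tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[word] / total for word in vocabulary]


def rbf_activations(x, centers, width):
    """Gaussian response of every center to the feature vector x."""
    return [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2 * width ** 2))
        for c in centers
    ]


def train_rbf(samples, labels, n_authors, centers, width, epochs=300, rate=0.2):
    """Learn one output-weight row per author with the delta rule.

    The centers stay fixed; only the linear output layer is trained.
    """
    weights = [[0.0] * len(centers) for _ in range(n_authors)]
    for _ in range(epochs):
        for x, label in zip(samples, labels):
            phi = rbf_activations(x, centers, width)
            for k in range(n_authors):
                target = 1.0 if k == label else 0.0
                output = sum(w * p for w, p in zip(weights[k], phi))
                error = target - output
                for j, p in enumerate(phi):
                    weights[k][j] += rate * error * p
    return weights


def predict(x, weights, centers, width):
    """Return the index of the author with the highest output score."""
    phi = rbf_activations(x, centers, width)
    scores = [sum(w * p for w, p in zip(row, phi)) for row in weights]
    return scores.index(max(scores))


# Toy training data: (email text, author index). Purely illustrative.
training_emails = [
    ("please send the report and the budget and the plan", 0),
    ("the team and the board and the staff met", 0),
    ("talk with them that day from home", 1),
    ("notes from that call with that client", 1),
]

samples = [word_features(text, VOCABULARY) for text, _ in training_emails]
labels = [author for _, author in training_emails]
centers = samples  # assumption: every training sample acts as one RBF center
weights = train_rbf(samples, labels, 2, centers, WIDTH)

unseen = "call that vendor from work with that update"
print(predict(word_features(unseen, VOCABULARY), weights, centers, WIDTH))
```

In this sketch the number of centers equals the number of training emails. The abstract reports that accuracy depends on the number of centers, so a real experiment would vary that count, for example by selecting or clustering a subset of the training vectors.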

Index Terms

Computer Science
Information Sciences

Keywords

Email Authorship Identification, Spam, Word Frequency, Radial Basis Function