An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor

Hiral Padhiyar; Purvi Rekh

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 101 - Number 6

Year of Publication: 2014

Authors: Hiral Padhiyar, Purvi Rekh

10.5120/17689-8652

Hiral Padhiyar, Purvi Rekh. An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor. International Journal of Computer Applications. 101, 6 (September 2014), 7-11. DOI=10.5120/17689-8652

@article{10.5120/17689-8652,
  author = {Hiral Padhiyar and Purvi Rekh},
  title = {An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor},
  journal = {International Journal of Computer Applications},
  issue_date = {September 2014},
  volume = {101},
  number = {6},
  month = {September},
  year = {2014},
  issn = {0975-8887},
  pages = {7-11},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume101/number6/17689-8652/},
  doi = {10.5120/17689-8652},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:31:57.232437+05:30
%A Hiral Padhiyar
%A Purvi Rekh
%T An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor
%J International Journal of Computer Applications
%@ 0975-8887
%V 101
%N 6
%P 7-11
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

With the development of the Internet and the emergence of a large number of text resources, automatic text classification has become a research hotspot. Email is one of the fastest and cheapest means of communication, and it has become part of everyday life for millions of people, changing the way we work and collaborate. Email accounts for a large percentage of total Internet traffic, and email data is growing rapidly, creating a need for automated analysis. In many security informatics applications it is important to detect deceptive communication in email. Each iteration of standard EM-based semi-supervised learning has two steps: first, the classifier constructed in the previous iteration predicts the labels of all unlabeled samples; then, a new classifier is built from the resulting training set. In this work, an EM-based semi-supervised learning algorithm using Naïve Bayes is proposed in which unlabeled documents are divided into two parts, reliable and misclassified. An ensemble technique is used to add only the reliable unlabeled documents to the training set. In addition, the unlabeled documents are preprocessed before the Naïve Bayes and K-NN classifiers are trained in the first step of EM, which reduces preprocessing time. With the proposed work, classifier accuracy is expected to increase and execution time to decrease.
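The loop described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the reliability rule (a document counts as reliable only when Naïve Bayes and K-NN predict the same label), the hard-label simplification of EM (labels are fixed once a document is added), the whitespace tokenizer, the cosine-similarity K-NN, and every function name and toy message below are assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Toy preprocessing (assumption): lowercase and split on whitespace."""
    return text.lower().split()

def train_nb(docs, labels):
    """Collect class and word counts for a multinomial Naive Bayes model."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest add-one-smoothed log posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for c in sorted(class_counts):
        total_words = sum(word_counts[c].values())
        score = math.log(class_counts[c] / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_knn(docs, labels, tokens, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = Counter(tokens)
    ranked = sorted(zip(docs, labels), key=lambda p: cosine(query, Counter(p[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def em_self_training(labeled, unlabeled, max_iter=5, k=3):
    """Hard-label EM-style loop that only adds documents both classifiers agree on."""
    # All documents are tokenized once, before the first iteration.
    docs = [tokenize(text) for text, _ in labeled]
    labels = [label for _, label in labeled]
    pool = [(text, tokenize(text)) for text in unlabeled]
    assigned = {}
    for _ in range(max_iter):
        model = train_nb(docs, labels)  # rebuild the classifier on the current training set
        reliable, remaining = [], []
        for text, tokens in pool:  # predict labels for the still-unlabeled documents
            nb_label = predict_nb(model, tokens)
            knn_label = predict_knn(docs, labels, tokens, k)
            if nb_label == knn_label:  # assumed reliability rule: ensemble agreement
                reliable.append((text, tokens, nb_label))
            else:
                remaining.append((text, tokens))
        if not reliable:
            break  # nothing new to add, so further iterations would not change anything
        for text, tokens, label in reliable:
            docs.append(tokens)
            labels.append(label)
            assigned[text] = label
        pool = remaining
    return assigned, [text for text, _ in pool]

# Hypothetical toy data, not taken from the paper.
LABELED = [
    ("win free prize money now", "spam"),
    ("free cash offer click now", "spam"),
    ("meeting agenda for project review", "ham"),
    ("lunch with the project team tomorrow", "ham"),
]
UNLABELED = [
    "claim your free prize now",
    "project meeting moved to tomorrow",
    "free lunch with the team now",
]
```

Calling `em_self_training(LABELED, UNLABELED, max_iter=1)` on this toy data adds the first two messages to the training set and holds back the third, because the two classifiers disagree on it after a single pass. Requiring agreement trades coverage for label quality: fewer unlabeled documents are used, but a wrong label is less likely to be fed back into training.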

Index Terms

Computer Science

Information Sciences

Keywords

Email Classification, Naïve Bayes, K-NN, SSL