predSucc-Site: Lysine Succinylation Sites Prediction in Proteins by using Support Vector Machine and Resolving Data Imbalance Issue

Md. Al Mehedi Hasan; Shamim Ahmad

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 182 - Number 15

Year of Publication: 2018

Authors: Md. Al Mehedi Hasan, Shamim Ahmad

DOI: 10.5120/ijca2018917787

Md. Al Mehedi Hasan, Shamim Ahmad. predSucc-Site: Lysine Succinylation Sites Prediction in Proteins by using Support Vector Machine and Resolving Data Imbalance Issue. International Journal of Computer Applications. 182, 15 (Sep 2018), 8-13. DOI=10.5120/ijca2018917787

@article{10.5120/ijca2018917787,
author = {Md. Al Mehedi Hasan and Shamim Ahmad},
title = {predSucc-Site: Lysine Succinylation Sites Prediction in Proteins by using Support Vector Machine and Resolving Data Imbalance Issue},
journal = {International Journal of Computer Applications},
issue_date = {Sep 2018},
volume = {182},
number = {15},
month = {Sep},
year = {2018},
issn = {0975-8887},
pages = {8-13},
numpages = {6},
url = {https://ijcaonline.org/archives/volume182/number15/29937-2018917787/},
doi = {10.5120/ijca2018917787},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T01:11:28.674349+05:30
%A Md. Al Mehedi Hasan
%A Shamim Ahmad
%T predSucc-Site: Lysine Succinylation Sites Prediction in Proteins by using Support Vector Machine and Resolving Data Imbalance Issue
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 15
%P 8-13
%D 2018
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Lysine succinylation is an important post-translational modification in which a succinyl group is added to a lysine (K) residue of a protein. It plays a major role in regulating cellular processes and is also associated with some diseases, so an easy way to detect succinylation in proteins is needed. Experimental techniques, however, are costly and time-consuming, which makes it hard to detect succinylation quickly and cheaply enough to keep pace with the explosive growth of protein sequences in the post-genomic age. An accurate computational method for predicting succinylation sites is therefore urgently needed and could be useful for drug development. In this study, a novel computational tool termed predSucc-Site has been developed to predict protein succinylation sites by (1) incorporating sequence-coupled information into the general pseudo amino acid composition, (2) balancing the effect of the skewed training dataset with the Different Error Costs (DEC) method, and (3) constructing a predictor using a support vector machine as the classifier. The predSucc-Site predictor achieves an average AUC (area under the curve) of 0.97 in predicting lysine succinylation sites. All reported results, including the AUC, are averages over 5 complete runs of 5-fold cross-validation, and they indicate significantly better performance than existing predictors. A user-friendly web server for predSucc-Site is available at http://research.ru.ac.bd/predSucc-Site/
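
To make two of the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows (a) how candidate sites might be collected as peptide windows centred on each lysine (K), and (b) how Different Error Costs could be set so that errors on the rarer class are penalised more heavily. The function names, the flank length of 7, the padding character "X", and the rule that the cost ratio equals the inverse class-size ratio are all assumptions made for illustration; the abstract does not state the window size or how the costs were chosen.

```python
from collections import Counter


def lysine_windows(sequence, flank=7, pad="X"):
    """Return (1-based position, window) for every lysine (K) in sequence.

    flank and pad are illustrative assumptions, not values from the paper.
    Each window has length 2 * flank + 1 with the lysine at its centre.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the original sequence sits at index i + flank
            # in the padded string, so the window starts at index i.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def dec_costs(labels, base_cost=1.0):
    """Per-class error costs for an imbalanced two-class problem.

    Assumed heuristic: C_pos / C_neg = n_neg / n_pos, so the minority
    class receives the larger misclassification cost.
    """
    counts = Counter(labels)
    n_pos, n_neg = counts[1], counts[0]
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {1: base_cost * n_neg / n_pos, 0: base_cost}


# Toy usage on a made-up sequence and made-up labels.
print(lysine_windows("MKAK", flank=2))  # [(2, 'XMKAK'), (4, 'KAKXX')]
print(dec_costs([1, 0, 0, 0, 0]))       # {1: 4.0, 0: 1.0}
```

With an SVM library that supports per-class penalties, a dictionary like the one returned by `dec_costs` could serve as the class weights (for example, the `class_weight` argument of scikit-learn's `SVC`); whether predSucc-Site was built this way is not stated in the abstract.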

Index Terms

Computer Science

Information Sciences

Keywords

Lysine Succinylation Sites Prediction; Sequence-coupling Model; General PseAAC; Data Imbalance Issue; Support Vector Machine