predSucc-Site: Lysine Succinylation Sites Prediction in Proteins by using Support Vector Machine and Resolving Data Imbalance Issue

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
Md. Al Mehedi Hasan, Shamim Ahmad
10.5120/ijca2018917787

Md. Al Mehedi Hasan and Shamim Ahmad. predSucc-Site: Lysine Succinylation Sites Prediction in Proteins by using Support Vector Machine and Resolving Data Imbalance Issue. International Journal of Computer Applications 182(15):8-13, September 2018.

@article{10.5120/ijca2018917787,
	author = {Md. Al Mehedi Hasan and Shamim Ahmad},
	title = {predSucc-Site: Lysine Succinylation Sites Prediction in Proteins by using Support Vector Machine and Resolving Data Imbalance Issue},
	journal = {International Journal of Computer Applications},
	issue_date = {September 2018},
	volume = {182},
	number = {15},
	month = {Sep},
	year = {2018},
	issn = {0975-8887},
	pages = {8-13},
	numpages = {6},
	url = {http://www.ijcaonline.org/archives/volume182/number15/29937-2018917787},
	doi = {10.5120/ijca2018917787},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Lysine succinylation is an important post-translational modification in which a succinyl group is added to a lysine (K) residue of a protein molecule. It not only plays a major role in regulating cellular processes but is also associated with some diseases. As a result, an easy way to detect succinylation in proteins is needed. However, because experimental technologies are costly and time-consuming, it is hard to detect succinylation in a timely and low-cost manner that keeps pace with the explosive growth of protein sequences in the post-genomic age. In this context, an accurate computational method for predicting succinylation sites is urgently needed, and such a method can also be useful for drug development. In this study, a novel computational tool termed predSucc-Site has been developed to predict protein succinylation sites by (1) incorporating sequence-coupled information into the general pseudo amino acid composition, (2) balancing the effect of the skewed training dataset with the Different Error Costs (DEC) method, and (3) constructing a predictor that uses a support vector machine as the classifier. The experimental results show that predSucc-Site achieves an average AUC (area under the curve) of 0.97 in predicting lysine succinylation sites. All experimental results, including the AUC, are averaged over 5 complete runs of 5-fold cross-validation, and they indicate significantly better performance of predSucc-Site than existing predictors. A user-friendly web server for predSucc-Site is available at http://research.ru.ac.bd/predSucc-Site/
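The sequence-coupling step (1) can be illustrated with a minimal Python sketch. The abstract does not give the window size or the exact feature definition, so the code assumes the formulation commonly used with Chou's sequence-coupled model: a window of 2*xi+1 residues centred on K, where each flanking residue is scored by its conditional probability given the neighbour one step closer to the centre (or by its plain frequency when that neighbour is the central K). These probabilities are estimated separately on positive (p+) and negative (p-) windows, and each feature is the difference p+ minus p-. The helper names `fit_coupling_model` and `encode`, the toy peptides, and the choice of no smoothing are hypothetical. The resulting 2*xi-dimensional vector would form the sequence-coupled part of a general PseAAC vector, and in practice the probabilities should be estimated on the training fold only.

```python
from collections import Counter


def _estimator(peptides, i, j):
    """Return f(a, b) estimating P(R_i = a | R_j = b) from `peptides`,
    or P(R_i = a) when j is None (b is then ignored)."""
    if j is None:
        counts = Counter(p[i] for p in peptides)
        total = len(peptides)
        return lambda a, b: counts[a] / total
    pairs = Counter((p[i], p[j]) for p in peptides)
    given = Counter(p[j] for p in peptides)
    # Unseen conditioning residues get probability 0 (no smoothing; an assumption).
    return lambda a, b: pairs[(a, b)] / given[b] if given[b] else 0.0


def fit_coupling_model(positives, negatives, xi):
    """Build per-position estimators for windows of length 2*xi + 1 centred on K."""
    centre = xi
    model = []
    for i in range(2 * xi + 1):
        if i == centre:
            continue  # the central lysine is identical in every window
        if abs(i - centre) == 1:
            j = None  # the neighbour is the fixed K, so use the plain frequency
        elif i < centre:
            j = i + 1  # left flank: condition on the residue one step nearer K
        else:
            j = i - 1  # right flank: condition on the residue one step nearer K
        model.append((i, j, _estimator(positives, i, j), _estimator(negatives, i, j)))
    return model


def encode(peptide, model):
    """Map a window to a 2*xi vector of p_plus(...) - p_minus(...) values."""
    features = []
    for i, j, p_plus, p_minus in model:
        a = peptide[i]
        b = peptide[j] if j is not None else None
        features.append(p_plus(a, b) - p_minus(a, b))
    return features


if __name__ == "__main__":
    pos = ["AAKAA", "ACKCA"]  # toy succinylated windows, xi = 2 (hypothetical)
    neg = ["GGKGG", "GCKCG"]  # toy non-succinylated windows (hypothetical)
    model = fit_coupling_model(pos, neg, xi=2)
    print(encode("AAKAA", model))  # [1.0, 0.5, 0.5, 1.0]
```

Positive values mark residue patterns that are more typical of succinylated windows than of non-succinylated ones, which is what lets a linear or kernel classifier separate the two classes.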
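The imbalance handling (2), the SVM (3), and the evaluation protocol can likewise be sketched in plain Python. Different Error Costs (DEC) give misclassifications of the minority (succinylated) class a larger penalty C+ than those of the majority class C-; the sketch assumes the common choice C+/C- = n-/n+. Because the abstract does not state the kernel or the hyperparameters, a linear SVM trained by stochastic subgradient descent stands in for the paper's SVM, and the function names, learning rate, regularisation strength, and toy data are illustrative assumptions. AUC is computed with the Mann-Whitney statistic on pooled out-of-fold scores and averaged over 5 complete runs of stratified 5-fold cross-validation, mirroring the protocol described above.

```python
import random


def decision(w, b, x):
    """Score w.x + b, used both for classification and for ranking in the AUC."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def train_dec_svm(X, y, c_pos, c_neg, lam=0.001, lr=0.01, epochs=100, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on
    lam/2*||w||^2 + mean_i C_i * max(0, 1 - y_i (w.x_i + b)),
    where C_i = c_pos if y_i = +1 and c_neg if y_i = -1 (the DEC idea)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            cost = c_pos if y[i] > 0 else c_neg
            violated = y[i] * decision(w, b, X[i]) < 1
            # Gradient step on the L2 regulariser (the bias is not regularised).
            w = [(1.0 - lr * lam) * wk for wk in w]
            if violated:
                # Hinge subgradient, scaled by the class-specific error cost.
                w = [wk + lr * cost * y[i] * xk for wk, xk in zip(w, X[i])]
                b += lr * cost * y[i]
    return w, b


def auc(scores, labels):
    """Mann-Whitney estimate of ROC AUC: P(positive outscores negative), ties = 1/2."""
    pos = [s for s, t in zip(scores, labels) if t > 0]
    neg = [s for s, t in zip(scores, labels) if t <= 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, rng):
    """Deal the shuffled indices of each class (+1 / -1) round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in (1, -1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for r, i in enumerate(idx):
            folds[r % k].append(i)
    return folds


def repeated_cv_auc(X, y, k=5, runs=5, seed=0):
    """Average, over `runs` complete k-fold CVs, of the AUC of pooled out-of-fold scores."""
    rng = random.Random(seed)
    run_aucs = []
    for _ in range(runs):
        scores = [0.0] * len(y)
        for test in stratified_folds(y, k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            ytr = [y[i] for i in train]
            n_pos = sum(1 for t in ytr if t > 0)
            n_neg = len(ytr) - n_pos
            # DEC: errors on the rare positive class cost n_neg / n_pos times more.
            w, b = train_dec_svm([X[i] for i in train], ytr,
                                 c_pos=n_neg / n_pos, c_neg=1.0)
            for i in test:
                scores[i] = decision(w, b, X[i])
        run_aucs.append(auc(scores, y))
    return sum(run_aucs) / len(run_aucs)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy imbalanced data: 10 positives near (2, 2), 40 negatives near (-2, -2).
    X = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
    X += [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(40)]
    y = [1] * 10 + [-1] * 40
    print(f"mean AUC over 5 x 5-fold CV: {repeated_cv_auc(X, y):.3f}")
```

With scikit-learn, the same per-class penalty can be expressed through the `class_weight` parameter of `SVC`, which multiplies C separately for each class, so a kernel SVM with DEC would not need a hand-written trainer.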

Keywords

Lysine Succinylation Sites Prediction, Sequence-coupling Model, General PseAAC, Data Imbalance Issue, Support Vector Machine