An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences

International Journal of Computer Applications
© 2010 by IJCA Journal
Number 2 - Article 1
Year of Publication: 2010
Smitha Sunil Kumaran Nair
N. V. Subba Reddy
Hareesha K. S

Smitha Sunil Kumaran Nair, Subba N V Reddy and Hareesha K S. Article:An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences. International Journal of Computer Applications 8(2):1–6, October 2010. Published By Foundation of Computer Science.

BibTeX:

	author = {Smitha Sunil Kumaran Nair and N. V. Subba Reddy and Hareesha K. S},
	title = {Article:An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences},
	journal = {International Journal of Computer Applications},
	year = {2010},
	volume = {8},
	number = {2},
	pages = {1--6},
	month = {October},
	note = {Published By Foundation of Computer Science}


Amyloidogenic regions in polypeptide chains are associated with a number of diseases. Experimental evidence strongly supports the hypothesis that short segments of a protein are responsible for its amyloidogenic behavior. Identifying these short peptides is therefore critical both for understanding diseases associated with protein misfolding and for developing sequence-targeted anti-aggregation drugs. In silico approaches that use phenomenological models based on the biophysicochemical properties of amino acids suffer from the “curse of dimensionality”, which must be addressed before standard classification algorithms can be applied to predict such fibril-forming motifs. The present study evaluates the performance of three families of feature selection algorithms, namely filter, wrapper and embedded models, in conjunction with a Support Vector Machine (SVM) classifier. We also propose a novel integrated feature selection strategy based on a Genetic Algorithm (GA) and an SVM to obtain an optimal number of features for predicting amyloid fibril-forming short stretches of peptides. In addition, we investigate the performance of the feature selection models, which yield new and complementary sets of properties. We conclude that the proposed integrated dimensionality reduction technique outperforms all other methods evaluated, achieving the highest sensitivity and specificity, 86% and 82% respectively.
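The abstract does not specify the GA encoding, genetic operators, fitness function or SVM kernel, so the following is only a minimal sketch of how a GA wrapper around an SVM can be wired together, not the authors' method. It assumes binary feature masks, tournament selection, one-point crossover, bit-flip mutation, and a fitness equal to balanced accuracy, (sensitivity + specificity) / 2, on a validation split minus a small per-feature penalty. To stay dependency-free it swaps in a linear SVM trained with Pegasos-style stochastic subgradient descent. All function names (`train_linear_svm`, `sens_spec`, `ga_feature_selection`) and parameter values are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style SGD.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    A constant 1.0 is appended to every vector so the last weight acts as a
    (regularized) bias term. Returns the augmented weight vector.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                          # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]       # shrink: L2 term
            if margin < 1.0:                               # hinge loss active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def predict(w, x):
    """Sign of the linear score, using the same bias augmentation as training."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def sens_spec(y_true, y_pred):
    """Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def ga_feature_selection(X_tr, y_tr, X_va, y_va, pop_size=20,
                         generations=15, penalty=0.01, seed=0):
    """GA over feature masks; fitness is the SVM's validation balanced
    accuracy minus a (hypothetical) penalty per selected feature."""
    rng = random.Random(seed)
    d = len(X_tr[0])
    p_mut = 1.0 / d
    cache = {}                                             # mask -> fitness

    def fitness(mask):
        if mask not in cache:
            cols = [j for j in range(d) if mask[j]]
            if not cols:
                cache[mask] = 0.0                          # empty subset
            else:
                w = train_linear_svm([[x[j] for j in cols] for x in X_tr], y_tr)
                pred = [predict(w, [x[j] for j in cols]) for x in X_va]
                sens, spec = sens_spec(y_va, pred)
                cache[mask] = (sens + spec) / 2 - penalty * len(cols)
        return cache[mask]

    pop = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = sorted(pop, key=fitness, reverse=True)[:2]   # elitism
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)       # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, d)                      # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)
            nxt.append(child)                              # bit-flip mutation
        pop = nxt
    best = max(pop, key=fitness)
    return [j for j in range(d) if best[j]], fitness(best)

if __name__ == "__main__":
    # Synthetic data: only features 0 and 1 determine the label.
    rng = random.Random(42)
    X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(120)]
    y = [1 if x[0] + x[1] > 0 else -1 for x in X]
    selected, fit = ga_feature_selection(X[:80], y[:80], X[80:], y[80:])
    print("selected features:", selected, "fitness:", round(fit, 3))
```

In a real pipeline one would likely replace the hand-rolled SVM with a library implementation (for example scikit-learn's `SVC`) and score each subset by cross-validation rather than a single split, since a small validation set makes the fitness noisy. Caching fitness by mask matters because the GA revisits the same subsets often.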

