An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences

Smitha Sunil Kumaran Nair; N. V. Subba Reddy; Hareesha K. S

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 8 - Number 2

Year of Publication: 2010

Authors: Smitha Sunil Kumaran Nair, N. V. Subba Reddy, Hareesha K. S

10.5120/1189-1661

Smitha Sunil Kumaran Nair, N. V. Subba Reddy, Hareesha K. S. An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences. International Journal of Computer Applications. 8, 2 (October 2010), 1-6. DOI=10.5120/1189-1661

@article{ 10.5120/1189-1661,
author = { Smitha Sunil Kumaran Nair and N. V. Subba Reddy and Hareesha K. S },
title = { An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences },
journal = { International Journal of Computer Applications },
issue_date = { October 2010 },
volume = { 8 },
number = { 2 },
month = { October },
year = { 2010 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume8/number2/1189-1661/ },
doi = { 10.5120/1189-1661 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article

%1 2024-02-06T19:56:28.676450+05:30

%A Smitha Sunil Kumaran Nair

%A N. V. Subba Reddy

%A Hareesha K. S

%T An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences

%J International Journal of Computer Applications

%@ 0975-8887

%V 8

%N 2

%P 1-6

%D 2010

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Amyloidogenic regions in polypeptide chains are associated with a number of diseases. Experimental evidence strongly supports the hypothesis that small segments of a protein are responsible for its amyloidogenic behavior. Thus, identifying these short peptides is critical for understanding diseases associated with protein misfolding and for developing sequence-targeted anti-aggregation drugs. In silico approaches that use phenomenological models based on the bio-physio-chemical properties of amino acids suffer from the “curse of dimensionality”. This problem must therefore be addressed before standard classification algorithms are applied to predict such fibril motifs. The present study evaluates the performance of feature selection algorithms, namely filter, wrapper and embedded models, in conjunction with a Support Vector Machine classifier. We also propose a novel integrated feature selection strategy based on a Genetic Algorithm and a Support Vector Machine to obtain an optimal number of features for predicting amyloid fibril-forming short stretches of peptides. In addition, we investigate the performance of the feature selection models, which yielded a new and complementary set of properties, and conclude that the proposed integrated dimensionality reduction technique outperforms all other methods, achieving the highest sensitivity and specificity of 86% and 82%, respectively.
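The general idea of a genetic-algorithm wrapper for feature selection, scored by sensitivity and specificity, can be illustrated with a minimal sketch. This is not the authors' implementation. To keep the example self-contained, a nearest-centroid classifier stands in for the Support Vector Machine, and the data are synthetic. The fitness function (mean of sensitivity and specificity minus a small per-feature penalty) and all names and parameter values (`ga_select`, `pop_size`, `mutation_rate`, and so on) are hypothetical choices made only for illustration.

```python
import random

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def nearest_centroid_predict(train_X, train_y, test_X, mask):
    """Stand-in classifier (NOT an SVM): assign each row to the closest class centroid,
    using only the features whose mask bit is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return [0] * len(test_X)  # no features selected: predict negative
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    preds = []
    for x in test_X:
        dist = {label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for label, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds

def fitness(mask, train, valid):
    """Assumed fitness: balanced accuracy on a validation split, minus a size penalty."""
    (train_X, train_y), (valid_X, valid_y) = train, valid
    preds = nearest_centroid_predict(train_X, train_y, valid_X, mask)
    sens, spec = sensitivity_specificity(valid_y, preds)
    return (sens + spec) / 2 - 0.01 * sum(mask)

def ga_select(train, valid, n_features, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Evolve binary feature masks; the top half of each generation survives."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, valid), reverse=True)
        parents = ranked[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)                       # bit-flip mutation applied
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, train, valid))

# Synthetic data: feature 0 carries the class signal, features 1-4 are noise.
_data_rng = random.Random(1)

def make_rows(n):
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + _data_rng.gauss(0, 0.1)] +
                 [_data_rng.gauss(0, 1) for _ in range(4)])
        y.append(label)
    return X, y

train, valid = make_rows(40), make_rows(20)
best_mask = ga_select(train, valid, n_features=5)
print("selected feature mask:", best_mask)
```

In a real study the stand-in classifier would be replaced by an SVM, the validation split by cross-validation, and the synthetic columns by amino acid property indices. The bit-mask encoding and the select, crossover, mutate loop would stay the same.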

Index Terms

Computer Science

Information Sciences

Keywords

Amyloid fibril, physicochemical properties, Genetic Algorithm, Support Vector Machine