Implementing Protein Sequence Alignment using PAM-250 Matrices

Rajbir Singh; Sukhleen Kaur Bhasina; Dheeraj Pal Kaur

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 94 - Number 11

Year of Publication: 2014

DOI: 10.5120/16386-5943

Rajbir Singh, Sukhleen Kaur Bhasina, Dheeraj Pal Kaur. Implementing Protein Sequence Alignment using PAM-250 Matrices. International Journal of Computer Applications. 94, 11 (May 2014), 11-16. DOI=10.5120/16386-5943

@article{ 10.5120/16386-5943,
  author = { Rajbir Singh and Sukhleen Kaur Bhasina and Dheeraj Pal Kaur },
  title = { Implementing Protein Sequence Alignment using PAM-250 Matrices },
  journal = { International Journal of Computer Applications },
  issue_date = { May 2014 },
  volume = { 94 },
  number = { 11 },
  month = { May },
  year = { 2014 },
  issn = { 0975-8887 },
  pages = { 11-16 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume94/number11/16386-5943/ },
  doi = { 10.5120/16386-5943 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:17:21.511697+05:30
%A Rajbir Singh
%A Sukhleen Kaur Bhasina
%A Dheeraj Pal Kaur
%T Implementing Protein Sequence Alignment using PAM-250 Matrices
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 11
%P 11-16
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Multiple protein sequence alignment is one of the most important problems in modern computational biology. The emphasis here is on the use of computers, because most of the tasks involved in genomic data analysis are highly repetitive or mathematically complex. One of the largest application areas of bioinformatics and data mining has been the protein domain. These efforts have included protein structure prediction, folding pathway prediction, sequence alignment, substructure detection and many others. Data storage has become easier with the availability of large amounts of computing power at low cost, and as hardware technology advances, the cost of storage keeps decreasing. Research in bioinformatics has accumulated a large amount of data. Biological data is available in different formats and is comparatively more complex. In the present work, a data mining solution is provided for the problem of protein sequence alignment. Different sequence formats are studied, and plain text format is chosen for the problem under consideration. Clustering methods are based on expressing the similarity or dissimilarity of such sequences. The similarity of two protein sequences can be assessed by the score of their best alignment. A scoring matrix assesses the replacement of one amino acid by another that has been accepted by natural selection. Such a replacement is the result of two distinct processes: (i) the occurrence of a mutation in the portion of the gene template producing one amino acid of a protein, and (ii) the acceptance of the mutation by the species (similar function). PAM (accepted point mutation) is the family of scoring matrices used for the different computations; the PAM-250 matrix is used for the problem under consideration. This matrix is frequently used to score aligned peptide sequences to determine the similarity of those sequences. The entries of the matrix were derived by comparing aligned sequences of proteins with known homology and determining the "accepted point mutations" (PAM) observed. Global and local alignments are predicted along with their alignment scores.
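
The abstract does not say which algorithms produce the global and local alignments, so the following is only a minimal sketch of how such scores are commonly computed: Needleman-Wunsch for the global case and Smith-Waterman for the local case. The names `TOY_SCORES`, `GAP`, `sub_score`, `global_score` and `local_score` are hypothetical, the substitution values are illustrative placeholders covering three residues rather than the full PAM-250 table, and the linear gap penalty of -8 is an assumption.

```python
from typing import Dict, Tuple

# Illustrative placeholder scores for three residues only.
# A real run would load the full 20x20 PAM-250 table instead.
TOY_SCORES: Dict[Tuple[str, str], int] = {
    ("A", "A"): 2, ("C", "C"): 12, ("W", "W"): 17,
    ("A", "C"): -2, ("A", "W"): -6, ("C", "W"): -8,
}
GAP = -8  # assumed linear gap penalty (not taken from the paper)


def sub_score(a: str, b: str) -> int:
    """Look up a substitution score; the table is treated as symmetric."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def global_score(s: str, t: str) -> int:
    """Needleman-Wunsch: best score aligning all of s against all of t."""
    prev = [j * GAP for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * GAP]
        for j in range(1, len(t) + 1):
            curr.append(max(
                prev[j - 1] + sub_score(s[i - 1], t[j - 1]),  # substitute
                prev[j] + GAP,      # s[i-1] aligned to a gap
                curr[j - 1] + GAP,  # t[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


def local_score(s: str, t: str) -> int:
    """Smith-Waterman: best score over all pairs of substrings."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0]
        for j in range(1, len(t) + 1):
            cell = max(
                0,  # start a fresh local alignment here
                prev[j - 1] + sub_score(s[i - 1], t[j - 1]),
                prev[j] + GAP,
                curr[j - 1] + GAP,
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_score("ACW", "ACW"))  # identical sequences
    print(local_score("ACW", "CA"))    # best-matching substrings only
```

Both functions keep only two rows of the dynamic-programming table, which is enough to obtain the score; recovering the alignment itself would additionally require a traceback over the full table.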

Index Terms

Computer Science

Information Sciences

Keywords

Implementing Protein