Research Article

Implementing Protein Sequence Alignment using PAM-250 Matrices

by Rajbir Singh, Sukhleen Kaur Bhasina, Dheeraj Pal Kaur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 11
Year of Publication: 2014
10.5120/16386-5943

Rajbir Singh, Sukhleen Kaur Bhasina, Dheeraj Pal Kaur. Implementing Protein Sequence Alignment using PAM-250 Matrices. International Journal of Computer Applications. 94, 11 (May 2014), 11-16. DOI=10.5120/16386-5943

@article{ 10.5120/16386-5943,
author = { Rajbir Singh and Sukhleen Kaur Bhasina and Dheeraj Pal Kaur },
title = { Implementing Protein Sequence Alignment using PAM-250 Matrices },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 94 },
number = { 11 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 11-16 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume94/number11/16386-5943/ },
doi = { 10.5120/16386-5943 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:17:21.511697+05:30
%A Rajbir Singh
%A Sukhleen Kaur Bhasina
%A Dheeraj Pal Kaur
%T Implementing Protein Sequence Alignment using PAM-250 Matrices
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 11
%P 11-16
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Multiple protein sequence alignment is one of the most important problems in modern computational biology. The emphasis here is on the use of computers, because most of the tasks involved in genomic data analysis are highly repetitive or mathematically complex. One of the largest application areas of bioinformatics and data mining has been the protein domain. These efforts have included protein structure prediction, folding pathway prediction, sequence alignment, substructure detection and many others. Data storage has become easier with the availability of large amounts of computing power at low cost: as hardware technology advances, the cost of storage decreases, and research in bioinformatics has accumulated a large amount of data. Biological data is available in different formats and is comparatively complex. In the present work, a data mining solution is provided for the problem of protein sequence alignment. Different sequence formats are studied, and the plain text format is chosen for the problem under consideration. Clustering methods are based on expressing the similarity or dissimilarity of such sequences. The similarity of two protein sequences can be assessed by the score of their best alignment. A scoring matrix assesses the replacement of one amino acid by another that has been accepted by natural selection. Such a replacement results from two distinct processes: i) the occurrence of a mutation in the portion of the gene template producing one amino acid of a protein, and ii) the acceptance of the mutation by the species (similar function). PAM (Point Accepted Mutation) is the family of scoring matrices used for the different computations, and the PAM-250 matrix is used for the problem under consideration. This matrix is frequently used to score aligned peptide sequences to determine the similarity of those sequences. Its entries were derived by comparing aligned sequences of proteins with known homology and counting the "accepted point mutations" (PAM) observed.
Global and local alignments are produced along with their alignment scores.
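
The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard Needleman-Wunsch recurrence for the global score and the Smith-Waterman recurrence for the local score, a linear gap penalty of -8 chosen only for illustration, and a four-residue excerpt of PAM-250 (A, C, G, W) whose values should be checked against a full published table before any real use. The names PAM250_EXCERPT, GAP, score, global_score and local_score are hypothetical.

```python
# Excerpt of PAM-250 restricted to residues A, C, G, W (assumption: values as
# commonly tabulated; verify against a full PAM-250 table before real use).
PAM250_EXCERPT = {
    ("A", "A"): 2, ("A", "C"): -2, ("A", "G"): 1, ("A", "W"): -6,
    ("C", "C"): 12, ("C", "G"): -3, ("C", "W"): -8,
    ("G", "G"): 5, ("G", "W"): -7,
    ("W", "W"): 17,
}
GAP = -8  # assumed linear gap penalty, not taken from the paper


def score(a, b):
    """Symmetric substitution score; raises KeyError for residues outside the excerpt."""
    key = (a, b) if (a, b) in PAM250_EXCERPT else (b, a)
    return PAM250_EXCERPT[key]


def global_score(s, t, gap=GAP):
    """Needleman-Wunsch: best score aligning all of s against all of t."""
    prev = [j * gap for j in range(len(t) + 1)]  # row 0: t prefix against gaps
    for i in range(1, len(s) + 1):
        cur = [i * gap]  # column 0: s prefix against gaps
        for j in range(1, len(t) + 1):
            cur.append(max(
                prev[j - 1] + score(s[i - 1], t[j - 1]),  # substitute or match
                prev[j] + gap,                            # gap in t
                cur[j - 1] + gap,                         # gap in s
            ))
        prev = cur
    return prev[-1]


def local_score(s, t, gap=GAP):
    """Smith-Waterman: best score over all pairs of substrings of s and t."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0]
        for j in range(1, len(t) + 1):
            cell = max(
                0,                                        # start a new local alignment
                prev[j - 1] + score(s[i - 1], t[j - 1]),
                prev[j] + gap,
                cur[j - 1] + gap,
            )
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best
```

Under these assumptions, global_score("ACGW", "ACGW") sums the four diagonal entries, while local_score("AWA", "CWC") keeps only the W-W pair because extending it through an A-C mismatch would lower the score. Both functions return scores only; recovering the aligned strings would additionally require storing the full table and tracing back through it.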

Index Terms

Computer Science
Information Sciences

Keywords

Implementing Protein