Phylogenetic Tree Generation using Different Scoring Methods

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 100 - Number 14
Year of Publication: 2014
Rajbir Singh
Sinapreet Kaur
Dheeraj Pal Kaur

Rajbir Singh, Sinapreet Kaur and Dheeraj Pal Kaur. Article: Phylogenetic Tree Generation using Different Scoring Methods. International Journal of Computer Applications 100(14):38-45, August 2014. Full text available.

BibTeX:

	author = {Rajbir Singh and Sinapreet Kaur and Dheeraj Pal Kaur},
	title = {Article: Phylogenetic Tree Generation using Different Scoring Methods},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {100},
	number = {14},
	pages = {38-45},
	month = {August},
	note = {Full text available}


Data mining is a branch of knowledge discovery in the field of research and development. Biological data is available in many different formats and is comparatively complex. Knowledge discovery from these large and complex databases is a key problem of this era. It requires data mining and machine learning techniques that scale to the size of the problems and can be customized to biological applications. Hierarchical clustering is one of the main techniques for data mining. A phylogeny is the evolutionary history of a set of evolutionarily related species. One approach to determining the evolutionary history of a dataset is to use scoring-based methods. There are a number of distance-based methods, two of which are dealt with here: UPGMA (Unweighted Pair Group Method using Arithmetic average) and Neighbor Joining. A method for constructing distance-based phylogenetic trees using hierarchical clustering is proposed and implemented on different rice varieties. The sequences are downloaded from the NCBI databank. Evolutionary distances are calculated using the Jukes-Cantor distance method. Multiple sequence alignment is applied to the different datasets. Trees are constructed for the different datasets from the available data using both distance-based methods and a pruning technique. SNAP calculates synonymous and non-synonymous substitution rates based on a set of codon-aligned nucleotide sequences. The multiple DNA sequences are also used to calculate the GC content of eukaryotes, molecular weight, melting temperature and tree information. Closely related varieties are extracted by applying a threshold condition. The final tree is then constructed using these closely related rice varieties.
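
The distance-based pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the sequences are already aligned, uses the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), where p is the fraction of mismatched ungapped columns, and builds only the UPGMA topology (no branch lengths, no Neighbor Joining). The function names, the toy sequences and the threshold rule in closely_related are hypothetical, since the abstract does not specify the threshold condition.

```python
import math
from itertools import combinations


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Use only alignment columns where neither sequence has a gap.
    cols = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
            if a != "-" and b != "-"]
    if not cols:
        raise ValueError("no ungapped columns to compare")
    p = sum(a != b for a, b in cols) / len(cols)  # observed mismatch fraction
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_table(aligned):
    """Pairwise distances keyed by frozenset({name_a, name_b})."""
    return {frozenset((a, b)): jukes_cantor(aligned[a], aligned[b])
            for a, b in combinations(aligned, 2)}


def upgma(names, dist):
    """Return the UPGMA topology as nested tuples, e.g. ('C', ('A', 'B'))."""
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Distance to every other cluster is the size-weighted average,
        # which equals the mean over all leaf pairs.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


def closely_related(names, dist, threshold):
    """Assumed threshold rule: keep names with a neighbour within threshold."""
    return [n for n in names
            if any(dist[frozenset((n, m))] <= threshold
                   for m in names if m != n)]


# Toy aligned sequences (made up for illustration, not rice data).
toy = {
    "var1": "ACGTACGTAC",
    "var2": "ACGTACGTAT",
    "var3": "ACGAACTTAT",
    "var4": "TCGAACTTAT",
}
dist = distance_table(toy)
print(upgma(list(toy), dist))
print(closely_related(list(toy), dist, threshold=0.15))
```

A final tree restricted to the closely related varieties could then be obtained by calling upgma again on the filtered names with the same distance table.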

