Performance Comparison of Speaker Identification Using DCT, Walsh, Haar on Full and Row Mean of Spectrogram

Prachi J. Natu; Shachi J. Natu; Dr. T. K. Sarode; Dr. H. B. Kekre


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 5 - Number 6

Year of Publication: 2010

Authors: Prachi J. Natu, Shachi J. Natu, Dr. T. K. Sarode, Dr. H. B. Kekre

DOI: 10.5120/916-1294

Prachi J. Natu, Shachi J. Natu, Dr. T. K. Sarode, Dr. H. B. Kekre. Performance Comparison of Speaker Identification Using DCT, Walsh, Haar on Full and Row Mean of Spectrogram. International Journal of Computer Applications 5, 6 (August 2010), 30-37. DOI=10.5120/916-1294

@article{10.5120/916-1294,
  author = {Prachi J. Natu and Shachi J. Natu and T. K. Sarode and H. B. Kekre},
  title = {Performance Comparison of Speaker Identification Using DCT, Walsh, Haar on Full and Row Mean of Spectrogram},
  journal = {International Journal of Computer Applications},
  issue_date = {August 2010},
  volume = {5},
  number = {6},
  month = {August},
  year = {2010},
  issn = {0975-8887},
  pages = {30--37},
  numpages = {8},
  url = {https://ijcaonline.org/archives/volume5/number6/916-1294/},
  doi = {10.5120/916-1294},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%A Prachi J. Natu
%A Shachi J. Natu
%A Dr. T. K. Sarode
%A Dr. H. B. Kekre
%T Performance Comparison of Speaker Identification Using DCT, Walsh, Haar on Full and Row Mean of Spectrogram
%J International Journal of Computer Applications
%@ 0975-8887
%V 5
%N 6
%P 30-37
%D 2010
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents several approaches to text-dependent speaker identification that apply transformation techniques, namely the DCT, Walsh and Haar transforms, to spectrograms. A set of spectrograms obtained from speech samples serves as the image database for the study, and each image in this database is subjected to the transforms. Using Euclidean distance as the measure of similarity, the closest match in the database is found and declared to be the identified speaker. Each transform is applied to the spectrograms in two ways: to the full image and to the Row Mean of the image. In both cases, the effect of retaining different numbers of coefficients of the transformed image is examined. A comparison of the three transforms in both settings shows that the Walsh transform requires far fewer mathematical computations than the DCT, while the Haar transform reduces the number of computations drastically with an almost equal identification rate. Applying the transforms to the Row Mean gives a better identification rate than applying them to the full image.
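The abstract describes the Row Mean variant of the method only in prose. The following is a minimal sketch of one plausible reading, not the authors' implementation, under these assumptions: spectrograms are already available as 2-D lists of magnitudes (computing them from audio is omitted), the Row Mean is the average of each image row, the feature vector keeps the first k transform coefficients, and the row count is a power of 2 so that the Walsh and Haar transforms apply directly. All function names (`row_mean`, `dct_1d`, `walsh_1d`, `haar_1d`, `feature`, `identify`) and the toy data are hypothetical, and the coefficient ordering and normalisation the paper uses may differ.

```python
import math

# Hypothetical sketch: Row Mean -> 1-D transform -> first k coefficients ->
# nearest neighbour by Euclidean distance. Spectrograms are assumed to be
# precomputed 2-D lists (rows x columns) with a power-of-2 number of rows.

def row_mean(image):
    """Row Mean: the average of each row of a 2-D image, giving a 1-D vector."""
    return [sum(row) / len(row) for row in image]

def dct_1d(x):
    """Orthonormal DCT-II, written in its direct O(N^2) form for clarity."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out

def _bit_reverse(v, bits):
    """Reverse the lowest `bits` bits of the integer v."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (v & 1)
        v >>= 1
    return r

def walsh_1d(x):
    """Unnormalised Walsh transform in sequency order (length a power of 2).

    The fast Walsh-Hadamard butterfly uses only additions and subtractions and
    yields natural (Hadamard) order; the output is then permuted into sequency
    order (Gray code of the index, followed by bit reversal)."""
    a = list(x)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    bits = len(a).bit_length() - 1
    return [a[_bit_reverse(s ^ (s >> 1), bits)] for s in range(len(a))]

def haar_1d(x):
    """Orthonormal Haar transform, full decomposition (length a power of 2).

    Output order: overall average first, then detail coefficients from the
    coarsest scale to the finest."""
    a = list(x)
    n = len(a)
    s = math.sqrt(0.5)
    while n > 1:
        half = n // 2
        avg = [(a[2 * i] + a[2 * i + 1]) * s for i in range(half)]
        dif = [(a[2 * i] - a[2 * i + 1]) * s for i in range(half)]
        a[:n] = avg + dif  # next pass works on the averages only
        n = half
    return a

def feature(image, transform, k):
    """Row Mean of the image, transformed, truncated to the first k coefficients."""
    return transform(row_mean(image))[:k]

def identify(query, database, transform, k):
    """Label of the enrolled image whose feature is nearest in Euclidean distance."""
    q = feature(query, transform, k)
    return min(database, key=lambda item: math.dist(q, feature(item[1], transform, k)))[0]

if __name__ == "__main__":
    # Toy 8x4 "spectrograms" (hypothetical data, not from the paper).
    spk_a = [[10 - r + 0.1 * c for c in range(4)] for r in range(8)]
    spk_b = [[r + 0.1 * c for c in range(4)] for r in range(8)]
    query = [[10 - r + 0.1 * c + 0.05 for c in range(4)] for r in range(8)]
    db = [("speaker_A", spk_a), ("speaker_B", spk_b)]
    for t in (dct_1d, walsh_1d, haar_1d):
        print(t.__name__, identify(query, db, t, k=4))
```

In this toy run the query is a slightly shifted copy of `speaker_A`, so all three transforms should pick `speaker_A`. The Walsh transform is left unnormalised: because its matrix satisfies H Hᵀ = N I, every distance is scaled by the same factor √N, which does not change which match is nearest. Its butterfly uses only additions and subtractions, whereas the direct DCT needs a cosine multiplication for every term, which is one way to see why the abstract reports fewer computations for Walsh than for DCT.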


Index Terms

Computer Science

Information Sciences

Keywords

Speaker identification, Speaker Recognition, Spectrograms, DCT, Walsh, Haar, Row Mean