MEL-FREQUENCY CEPSTRAL COEFFICIENTS AND SPECTRUM BASED ADDITIONAL FEATURES IN AUTOMATIC SPEAKER RECOGNITION

Ivan Jokić, Stevan Jokić, Vlado Delić, Zoran Perić

DOI: https://doi.org/10.2298/FUEE2504663J
Pages: 663-680

Abstract


The efficiency of the proposed automatic speaker recognizer is evaluated on two speech databases. The feature vector consists of 21 mel-frequency cepstral coefficients (MFCCs) and up to three additional features derived from the amplitude spectrum: the logarithm of the energy around the appropriate local maximum of the spectrum, the frequency of that maximum, and the logarithm of the energy of the largest spectral component across all frames of the observed signal. Closed-set speaker identification is tested on the Solo section of the CHAINS database and on a speech database with expressed emotions developed within the S-ADAPT project. The maximum mean recognition accuracies achieved are 97.11% on the CHAINS database, using 21 MFCCs and two additional features, and, on the S-ADAPT database, 98.65% on neutral speech and 98.72% on the entire database, using 21 MFCCs alone.
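The feature extraction summarized above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The abstract does not specify how the "appropriate local maximum" is chosen, how wide the region around it is, the frame and filterbank settings, whether c0 is kept, or how frame-level MFCCs are combined with the signal-level features. Here we assume, hypothetically: a 16 kHz signal, 512-sample Hamming frames with a 256-sample hop, 40 mel filters, the first 21 DCT coefficients (c0 included), the strongest local maximum of the long-term average power spectrum as the "appropriate" peak, a region of ±3 bins (`half_width`) around it, the largest power-spectrum bin over all frames as the "maximum component", and an utterance-level vector formed by averaging the per-frame MFCCs. All function names (`power_spectra`, `mel_filterbank`, `mfcc`, `spectral_peak_features`, `feature_vector`) are illustrative.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectra(signal, n_fft=512, hop=256):
    """Hamming-windowed frames -> |X(k)|^2, shape (n_frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hamming(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced uniformly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge (empty if center == left)
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge (empty if right == center)
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def mfcc(power, sr, n_fft, n_filters=40, n_coeffs=21):
    """Log mel energies followed by an unnormalized DCT-II; keeps n_coeffs."""
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + EPS)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct.T  # shape (n_frames, n_coeffs)


def spectral_peak_features(power, sr, n_fft, half_width=3):
    """Hypothetical reading of the three additional spectral features."""
    avg = power.mean(axis=0)  # long-term average power spectrum
    inner = np.arange(1, len(avg) - 1)
    peaks = inner[(avg[1:-1] > avg[:-2]) & (avg[1:-1] > avg[2:])]
    # Assumption: the "appropriate" local maximum is the strongest one.
    k_pk = peaks[np.argmax(avg[peaks])] if peaks.size else int(np.argmax(avg))
    lo, hi = max(k_pk - half_width, 0), min(k_pk + half_width + 1, len(avg))
    log_energy_near_peak = np.log(avg[lo:hi].sum() + EPS)
    peak_freq_hz = k_pk * sr / n_fft
    log_max_component = np.log(power.max() + EPS)  # largest bin, all frames
    return np.array([log_energy_near_peak, peak_freq_hz, log_max_component])


def feature_vector(signal, sr, n_extra=3, n_fft=512, hop=256):
    """Mean of 21 per-frame MFCCs plus the first n_extra (0..3) extra features."""
    power = power_spectra(signal, n_fft, hop)
    mfcc_mean = mfcc(power, sr, n_fft).mean(axis=0)
    extra = spectral_peak_features(power, sr, n_fft)[:n_extra]
    return np.concatenate([mfcc_mean, extra])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 1000.0 * t)  # synthetic 1 kHz tone, 1 s
    v = feature_vector(x, sr)
    print(v.shape)  # (24,)
    print(v[22])    # 1000.0, i.e. bin 32 at 16000 / 512 = 31.25 Hz per bin
```

The synthetic tone is only a sanity check: a 1 kHz sine falls exactly on FFT bin 32, so the estimated peak frequency is 1000 Hz. Which two of the three extra features gave the 97.11% CHAINS result is not stated in the abstract; `n_extra=2` here simply keeps the first two in this sketch's ordering.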

Keywords

accuracy, audio recording, human voice, speaker recognition, spectral analysis


ISSN: 0353-3670 (Print)

ISSN: 2217-5997 (Online)

COBISS.SR-ID 12826626