SVM-BASED EMOTION RECOGNITION FROM SPEECH WITH GTCC AND FREQUENCY FEATURES

Dragan Veljković, Dejan Rančić

DOI: https://doi.org/10.22190/FUACR250210003V
Pages: 017–034

Abstract


When a person is in a certain emotional state, a large number of physiological changes occur in the body. These changes significantly affect the way words are pronounced compared to neutral speech, which means that the configuration of the vocal tract varies with the speaker's emotional state. In emotional speech, these physiological changes also influence speech properties such as speech rate, intensity, and pitch.

Successful classification of emotional speech into the appropriate emotion class requires the extraction of salient speech features and the construction of a feature vector composed of discriminative attributes that facilitate accurate classification. In this study, we use Gammatone Cepstral Coefficients (GTCC) as components of the feature vector for speech emotion recognition. GTCC are a biologically inspired modification of Mel-Frequency Cepstral Coefficients (MFCC): they are based on gammatone filters, which model the human auditory system more faithfully than the mel-frequency filters used in MFCC.

The remainder of the feature vector is composed of spectral characteristics of the speech signal. In our classification model, the components of the feature vector are extracted primarily through spectral analysis of short-time frames of the observed signal. The resulting feature vectors are discriminative representations that enable more effective classification of speech into the corresponding emotional categories. Our classifier is based on Support Vector Machines (SVM) with optimized hyperparameters.
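The pipeline outlined above (gammatone filter bank, cepstral coefficients via DCT of log band energies, short-time spectral features, and an SVM with tuned hyperparameters) can be illustrated in a few dozen lines of Python. The sketch below is a minimal, assumed realization, not the authors' implementation: the helper names (erb_space, gtcc, spectral_stats, utterance_vector, train), the filter count, frame sizes, and the search grid are all illustrative choices; only SciPy's gammatone filter design and scikit-learn's SVC/GridSearchCV are actual library APIs.

# A minimal sketch, assuming 16 kHz mono input; all parameter values are
# illustrative, not the settings reported in the paper.
import numpy as np
from scipy.signal import gammatone, lfilter
from scipy.fft import dct
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def erb_space(low, high, n):
    # Center frequencies equally spaced on the ERB-rate scale
    # (Glasberg & Moore, 1990).
    erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return inv(np.linspace(erb(low), erb(high), n))

def gtcc(x, fs=16000, n_filters=32, n_coeffs=13, frame_len=400, hop=160):
    # GTCC: gammatone filter bank -> per-frame log band energies -> DCT.
    bands = []
    for cf in erb_space(50.0, 0.9 * fs / 2.0, n_filters):
        b, a = gammatone(cf, 'iir', fs=fs)   # 4th-order gammatone (SciPy)
        bands.append(lfilter(b, a, x))
    bands = np.asarray(bands)                # (n_filters, n_samples)
    n_frames = 1 + (bands.shape[1] - frame_len) // hop
    log_e = np.empty((n_filters, n_frames))
    for t in range(n_frames):
        seg = bands[:, t * hop: t * hop + frame_len]
        log_e[:, t] = np.log(np.sum(seg ** 2, axis=1) + 1e-10)
    # DCT across the filter axis decorrelates the log energies.
    return dct(log_e, axis=0, norm='ortho')[:n_coeffs]

def spectral_stats(x, fs=16000, frame_len=400, hop=160):
    # Per-frame spectral centroid and bandwidth as simple frequency features.
    n_frames = 1 + (len(x) - frame_len) // hop
    freqs = np.fft.rfftfreq(frame_len, 1.0 / fs)
    out = np.empty((2, n_frames))
    for t in range(n_frames):
        mag = np.abs(np.fft.rfft(x[t * hop: t * hop + frame_len]))
        p = mag / (mag.sum() + 1e-10)
        c = np.sum(freqs * p)                               # centroid
        out[0, t] = c
        out[1, t] = np.sqrt(np.sum((freqs - c) ** 2 * p))   # bandwidth
    return out

def utterance_vector(x, fs=16000):
    # Utterance-level feature vector: mean and std of the frame features.
    f = np.vstack([gtcc(x, fs), spectral_stats(x, fs)])
    return np.concatenate([f.mean(axis=1), f.std(axis=1)])

def train(X, y):
    # RBF-kernel SVM with a small hyper-parameter grid, standing in for the
    # "optimized hyperparameters" mentioned in the abstract.
    grid = {'C': [1, 10, 100], 'gamma': ['scale', 1e-2, 1e-3]}
    return make_pipeline(StandardScaler(),
                         GridSearchCV(SVC(kernel='rbf'), grid, cv=5)).fit(X, y)

# Usage: model = train(np.stack([utterance_vector(s) for s in signals]), labels)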

Keywords

Speech emotion classification, gammatone filter bank, GTCC, SVM


Print ISSN: 1820-6417
Online ISSN: 1820-6425