Advanced Deep Learning Models for Emotion Detection in Speech: Applying the RAVDESS Dataset

Gagah Dwiki Putra Aryono(1*), Dede Ferawati(2), Sigit Auliana(3)

(1) Universitas Bina Bangsa, Indonesia
(2) Universitas Bina Bangsa, Indonesia
(3) Universitas Bina Bangsa, Indonesia
(*) Corresponding Author

Abstract


This study presents a comprehensive approach to emotion recognition in speech using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The method integrates several state-of-the-art deep learning models known for their proficiency in pattern recognition and audio processing. The RAVDESS dataset comprises diverse audio files of emotional expressions performed by professional actors, meticulously categorized by modality, emotion, intensity, and other attributes. These data are used to train and evaluate various deep learning architectures, including the convolutional networks AlexNet, ResNet, InceptionNet, VGG16, and VGG19; recurrent neural network (RNN) models such as LSTM; and the latest Transformer models. The results indicate that the Transformer model achieves the highest accuracy, precision, recall, and F1 score in emotion classification compared with the other models. This study not only deepens understanding of subtle emotional nuances in spoken language but also establishes new benchmarks for applying diverse neural network types to emotion recognition from audio. By providing detailed comparisons among models, this research advances emotion recognition technology, enhances its applications in human-computer interaction, psychotherapy, and the entertainment industry, and paves the way for further development of multimodal emotion recognition systems.
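For readers working with the dataset, the categorization by modality, emotion, and intensity mentioned above is encoded directly in each RAVDESS filename as seven two-digit, hyphen-separated fields. A minimal parsing sketch follows; the function name `parse_ravdess_filename` is illustrative and not taken from the paper:

```python
# Sketch: decoding the RAVDESS filename convention into attribute labels.
# Each file is named with seven two-digit fields separated by hyphens,
# e.g. "03-01-06-01-02-01-12.wav" = audio-only, speech, fearful,
# normal intensity, statement 2, repetition 1, actor 12 (female).

EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(name: str) -> dict:
    """Map a RAVDESS filename to its attribute labels."""
    stem = name.rsplit(".", 1)[0]  # drop the extension
    modality, channel, emotion, intensity, statement, repetition, actor = stem.split("-")
    return {
        "modality": {"01": "full-AV", "02": "video-only", "03": "audio-only"}[modality],
        "vocal_channel": {"01": "speech", "02": "song"}[channel],
        "emotion": EMOTIONS[emotion],
        "intensity": {"01": "normal", "02": "strong"}[intensity],
        "statement": int(statement),
        "repetition": int(repetition),
        "actor": int(actor),
        # Odd-numbered actors are male, even-numbered are female.
        "actor_sex": "male" if int(actor) % 2 == 1 else "female",
    }
```

Labels recovered this way can serve as classification targets when training any of the architectures compared in the study.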







DOI: http://dx.doi.org/10.30645/jurasik.v9i2.815

DOI (PDF): http://dx.doi.org/10.30645/jurasik.v9i2.815.g789




JURASIK (Jurnal Riset Sistem Informasi dan Teknik Informatika)