Application of Pre-trained Model-based Speech Analysis in Depression Detection
DOI:
https://doi.org/10.54691/j0nkkm64
Keywords:
Pre-trained Models; Depression Detection; Speech Analysis; Emotional State Analysis; MODMA Dataset.
Abstract
Detecting depression at an early stage is critical for both public health and patient well-being. Although automatic depression assessment based on machine learning has improved considerably, several problems remain, including demographic confounding factors, complicated feature engineering, data privacy concerns, and limited sample sizes. To address these challenges, this study proposes a depression detection method based on pre-trained models, using speech data from the MODMA dataset. The innovation of this study lies in using pre-trained models to mitigate the small-sample problem while adapting the model to the Chinese-language context and enhancing its generalization capability. In addition, the study systematically examines how effective verbal tasks associated with different emotions and categories are for detecting depression. The findings not only improve the accuracy, practicality, and privacy protection of depression diagnosis but also offer new insights for more personalized and precise mental health management. Future research will further explore the model's generalization ability across diverse datasets and work toward practical applications.
License
Copyright (c) 2024 Scientific Journal of Intelligent Systems Research

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




