Elder Depression Detection by Multimodal Means


  • Tianruo Xu


Multimodal; Depression Detection; Elderly People; Transformer Model.


Depression among the elderly is widespread but often goes unnoticed. This article presents a multimodal approach to detecting depression in elderly people. The conceptual framework is demonstrated by the Multimodal Transformer Elder Depression Detection model, which takes several forms of input data, such as text, audio, and video, extracts information from each, and combines it through data fusion. After fusion, the model produces a score that reflects the patient's mental state. The model is trained and tested on the DAIC and extended DAIC databases. We also carried out additional testing on data gathered from elderly patients at a local hospital. The test results indicate that the model can successfully detect depression in the elderly.
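The fuse-then-score pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature extractors, embedding sizes, fusion mechanism, or scoring head, so everything below is an assumption made for illustration: each modality is reduced to one fixed-length vector, fusion is a single round of unparameterised scaled dot-product self-attention across the three modality vectors (the core operation of a Transformer), and the score is a sigmoid over a linear head. The names `self_attention`, `depression_score`, and `head_weights` are hypothetical, and the random vectors merely stand in for real text, audio, and video embeddings.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Scaled dot-product self-attention with no learned projections.

    Each token (one per modality) is replaced by a weighted average of all
    tokens, with weights given by softmax(q . k / sqrt(d)).
    """
    d = len(tokens[0])
    attended = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)]
        )
    return attended


def depression_score(text_vec, audio_vec, video_vec, head_weights, bias=0.0):
    """Fuse three modality vectors and map the result to a score in (0, 1).

    Assumption: mean pooling plus a linear head with a sigmoid; the paper's
    actual scoring head is not described in the abstract.
    """
    attended = self_attention([text_vec, audio_vec, video_vec])
    d = len(text_vec)
    pooled = [sum(tok[i] for tok in attended) / len(attended) for i in range(d)]
    logit = dot(pooled, head_weights) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder embeddings and untrained weights (illustrative only).
rng = random.Random(0)
dim = 8
text_vec = [rng.gauss(0, 1) for _ in range(dim)]
audio_vec = [rng.gauss(0, 1) for _ in range(dim)]
video_vec = [rng.gauss(0, 1) for _ in range(dim)]
head_weights = [rng.gauss(0, 0.1) for _ in range(dim)]

score = depression_score(text_vec, audio_vec, video_vec, head_weights)
print(f"score = {score:.3f}")
```

Because the weights here are random rather than trained, the printed score carries no clinical meaning. The sketch only shows how per-modality features might be mixed by attention and reduced to a single value.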



How to Cite

Xu, T. (2022). Elder Depression Detection by Multimodal Means. BCP Education & Psychology, 7, 64–73. Retrieved from http://bcpublication.org/index.php/EP/article/view/2609