Multimodal Sentiment Analysis and Improved Transformer-Based Emotional Music Generation Method for Elderly People in Nursing Homes

Authors

  • Shijia Fan

DOI:

https://doi.org/10.54691/a44vmp53

Keywords:

Multimodal sentiment analysis, emotional music generation, improved Transformer.

Abstract

Elderly people in nursing homes commonly face loneliness, depression, and cognitive decline, and limited nursing resources mean that existing manual emotional support struggles to respond accurately. This study therefore investigates multimodal sentiment analysis (integrating facial images, EEG signals, and EOG signals) and improved Transformer-based emotional music generation for this group. The research holds both theoretical and practical value: it provides a new multimodal-fusion paradigm for elderly affective computing, and it offers a precise emotional and cognitive intervention solution for nursing homes that alleviates residents' negative emotions and reduces the burden on caregivers.

The core innovations are twofold. First, a multimodal sentiment analysis model is constructed with an optimized feature extraction stage. An improved PFLD model incorporating a Feature Module improves the accuracy of facial landmark detection through multi-scale feature fusion; a Linear Dynamic System (LDS) smooths EEG features to reduce noise interference; continuous wavelet transform combined with peak detection extracts key features (e.g., blinking and fixation) from EOG signals; and a one-dimensional convolutional neural network (1D-CNN) is designed to match the temporal characteristics of the physiological signals. The multimodal features are then fused by a convolutional encoder and fed into a recurrent neural network (RNN) for sentiment classification, overcoming the limitations of single-modal recognition.

Second, the Discrete-Emo Transformer emotional music generation model is proposed: the Flash attention mechanism alleviates memory bottlenecks and speeds up inference; the segment recurrence mechanism of Transformer-XL strengthens the model's ability to learn long-range dependencies in long music sequences; and the REMI-EMO representation models note features and emotional information jointly, enabling high-quality symbolic music generation under emotional conditions.

Experimental results show that the multimodal sentiment analysis model achieves an accuracy of 97.5%, outperforming mainstream models such as FDMER and MISA. The Discrete-Emo Transformer reaches an emotional accuracy of 78.4% and attains the best perplexity (PPL) of 1.71, generating music that better matches the target emotions and human compositional characteristics. The study provides key technical support for "technology-empowered elderly care" and effectively improves the accuracy and efficiency of emotional intervention in nursing homes.
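To make the EEG smoothing step concrete, the sketch below runs a scalar linear dynamic system, implemented as a forward Kalman filtering pass with identity dynamics, over one feature trajectory. The noise variances q and r and the differential-entropy interpretation of the input are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def lds_smooth(x, q=1e-3, r=1e-1):
    """Smooth one feature trajectory with a scalar LDS (forward Kalman pass).

    x : (T,) noisy feature sequence, e.g. per-window differential entropy
    q : assumed process-noise variance (how fast the latent state drifts)
    r : assumed observation-noise variance (how noisy the measurements are)
    """
    x = np.asarray(x, dtype=float)
    m = np.empty_like(x)   # posterior means (the smoothed feature)
    p = np.empty_like(x)   # posterior variances
    m[0], p[0] = x[0], 1.0
    for t in range(1, len(x)):
        # Predict: identity dynamics, so the state just accumulates noise q.
        m_pred, p_pred = m[t - 1], p[t - 1] + q
        # Update: blend the prediction with the new observation.
        k = p_pred / (p_pred + r)              # Kalman gain
        m[t] = m_pred + k * (x[t] - m_pred)
        p[t] = (1.0 - k) * p_pred
    return m
```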
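The EOG step can be pictured in the same spirit: a continuous wavelet transform concentrates blink energy at blink-scale widths, and peak picking then recovers candidate blink instants. The sampling rate, scale range, and threshold below are assumptions for illustration, and PyWavelets' Mexican-hat wavelet stands in for whatever mother wavelet the paper actually uses.

```python
import numpy as np
import pywt
from scipy.signal import find_peaks

def detect_blinks(eog, fs=200.0):
    """Return sample indices of blink candidates in a vertical EOG trace."""
    # Scales spanning roughly 50-300 ms events (assumed blink durations).
    scales = np.arange(int(0.05 * fs), int(0.30 * fs))
    coeffs, _ = pywt.cwt(eog, scales, "mexh")     # (n_scales, T)
    energy = np.abs(coeffs).mean(axis=0)          # scale-averaged response
    z = (energy - energy.mean()) / energy.std()   # normalize for thresholding
    # Keep peaks at least 2 SD above the mean and >= 200 ms apart.
    peaks, _ = find_peaks(z, height=2.0, distance=int(0.2 * fs))
    return peaks
```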
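The overall recognition pipeline (per-modality encoders, a convolutional fusion encoder, then an RNN classifier) can be sketched as one small PyTorch module. The feature dimensions, the choice of a GRU for the unspecified RNN, and the three-class output are all assumptions made for the sake of a runnable example.

```python
import torch
from torch import nn

class MultimodalSentimentNet(nn.Module):
    """Sketch: per-modality 1D-CNN encoders -> conv fusion encoder -> RNN."""

    def __init__(self, d_face=136, d_eeg=62, d_eog=8, d_hid=128, n_classes=3):
        super().__init__()
        # 1D convolutions over time for each feature stream.
        self.face_enc = nn.Conv1d(d_face, d_hid, kernel_size=5, padding=2)
        self.eeg_enc = nn.Conv1d(d_eeg, d_hid, kernel_size=5, padding=2)
        self.eog_enc = nn.Conv1d(d_eog, d_hid, kernel_size=5, padding=2)
        # Convolutional encoder over the concatenated modality features.
        self.fuse = nn.Conv1d(3 * d_hid, d_hid, kernel_size=3, padding=1)
        self.rnn = nn.GRU(d_hid, d_hid, batch_first=True)
        self.cls = nn.Linear(d_hid, n_classes)

    def forward(self, face, eeg, eog):
        # Each input: (B, T, d_modality); Conv1d expects (B, C, T).
        feats = [enc(x.transpose(1, 2)).relu()
                 for enc, x in ((self.face_enc, face),
                                (self.eeg_enc, eeg),
                                (self.eog_enc, eog))]
        fused = self.fuse(torch.cat(feats, dim=1)).transpose(1, 2)  # (B, T, d_hid)
        _, h = self.rnn(fused)      # final hidden state of the GRU
        return self.cls(h[-1])      # (B, n_classes) sentiment logits
```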
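On the generation side, a minimal sketch of Transformer-XL-style segment recurrence is shown below: hidden states cached from the previous segment are prepended to the keys and values, and attention runs through PyTorch's scaled_dot_product_attention, which dispatches to fused FlashAttention-style kernels when the inputs permit. The dimensions, memory length, and plain (non-relative) positional treatment are simplifications; Transformer-XL itself uses relative positional encodings.

```python
import torch
import torch.nn.functional as F

class RecurrentAttention(torch.nn.Module):
    """One attention layer with Transformer-XL-style segment memory (sketch)."""

    def __init__(self, d_model=512, n_heads=8, mem_len=512):
        super().__init__()
        self.h, self.dh = n_heads, d_model // n_heads
        self.q_proj = torch.nn.Linear(d_model, d_model)
        self.kv_proj = torch.nn.Linear(d_model, 2 * d_model)
        self.out = torch.nn.Linear(d_model, d_model)
        self.mem_len = mem_len

    def forward(self, x, mem=None):
        # x: (B, T, d_model); mem: (B, M, d_model) hidden states cached from
        # the previous segment (no gradient flows back into them).
        ctx = x if mem is None else torch.cat([mem.detach(), x], dim=1)
        B, T, S = x.size(0), x.size(1), ctx.size(1)
        q = self.q_proj(x).view(B, T, self.h, self.dh).transpose(1, 2)
        k, v = self.kv_proj(ctx).chunk(2, dim=-1)
        k = k.view(B, S, self.h, self.dh).transpose(1, 2)
        v = v.view(B, S, self.h, self.dh).transpose(1, 2)
        # Causal mask shifted by the memory length: query i attends to all
        # memory positions and to current-segment positions <= i.
        i = torch.arange(T, device=x.device).unsqueeze(1)
        j = torch.arange(S, device=x.device).unsqueeze(0)
        mask = j <= i + (S - T)   # True = may attend
        y = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        y = y.transpose(1, 2).reshape(B, T, -1)
        # New memory: the last mem_len context states, detached.
        return self.out(y), ctx[:, -self.mem_len:].detach()
```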
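Finally, the REMI-EMO idea of modeling notes and emotion together can be pictured as an emotion control token attached to a REMI event stream. The four-quadrant labels and token spellings below are illustrative assumptions, not the paper's exact vocabulary.

```python
# Sketch: attach an emotion condition to a REMI-style event sequence.
EMOTIONS = ("Q1_happy", "Q2_tense", "Q3_sad", "Q4_calm")  # assumed quadrant labels

def to_remi_emo(events, emotion):
    """Prepend an emotion control token so the model can condition on it."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion label: {emotion}")
    return [f"Emotion_{emotion}"] + list(events)

# A calm-conditioned bar: position and tempo events, then one note
# described by pitch, velocity, and duration tokens.
seq = to_remi_emo(
    ["Bar", "Position_1/16", "Tempo_90", "Pitch_60", "Velocity_18", "Duration_8"],
    "Q4_calm",
)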

Published

2025-10-30

Section

Articles

How to Cite

Fan, Shijia. 2025. “Multimodal Sentiment Analysis and Improved Transformer-Based Emotional Music Generation Method for Elderly People in Nursing Homes”. Scientific Journal Of Humanities and Social Sciences 7 (11): 95-106. https://doi.org/10.54691/a44vmp53.