A Hybrid Framework Integrating Speech Recognition, Lexical Frequency Analysis, and BERT-BiLSTM for Computational Modeling of Age-Based Emotional Expression Preferences in Nanjing Wu Dialect

Zuming Wang

doi:10.54691/npye3p87

Keywords:

Nanjing Wu dialect, emotional expression preferences, age stratification, BERT-BiLSTM, hybrid framework.

Abstract

Computational modeling of emotional expression in dialects is valuable for dialect digitalization and for the development of intelligent speech services, yet existing studies generally overlook how age moderates emotional expression. This study constructs a hybrid computational framework that integrates speech recognition, lexical frequency analysis, and BERT-BiLSTM to quantitatively model age-based emotional expression preferences in the Nanjing Wu dialect. The framework transcribes dialect speech through Wav2Vec 2.0 transfer learning, extracts age-group vocabulary preference vectors with TF-IDF, and captures deep semantic features with a cascaded BERT-BiLSTM architecture. Experiments on a Nanjing Wu dialect corpus show that the hybrid framework outperforms all baseline models on sentiment classification. Age-stratified analysis further reveals a significant pattern: the proportion of positive sentiment increases progressively with age, while the proportion of negative sentiment decreases. The results can provide a reusable technical solution for dialect emotion computing and offer empirical evidence for the design of age-friendly intelligent speech services.
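The lexical frequency step can be illustrated with a minimal sketch. It assumes that the transcribed tokens of each age group are pooled into a single document and weighted with a common smoothed TF-IDF variant; the abstract does not specify the exact formulation, so the function names (`tfidf_by_group`, `top_terms`), the group labels, and the toy tokens below are hypothetical placeholders, not corpus data or the author's implementation.

```python
import math
from collections import Counter


def tfidf_by_group(group_tokens):
    """Return {group: {term: weight}}, treating each age group's pooled
    tokens as one document (an assumption made for this sketch)."""
    n_groups = len(group_tokens)
    counts = {g: Counter(tokens) for g, tokens in group_tokens.items()}

    # Document frequency: the number of age groups in which a term occurs.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    vectors = {}
    for g, c in counts.items():
        total = sum(c.values())
        # Relative term frequency times a smoothed inverse document frequency.
        vectors[g] = {
            term: (n / total) * (math.log((1 + n_groups) / (1 + df[term])) + 1.0)
            for term, n in c.items()
        }
    return vectors


def top_terms(vector, k=3):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    return sorted(vector, key=lambda t: (-vector[t], t))[:k]


# Made-up placeholder tokens, for illustration only.
sample = {
    "young": ["happy", "annoyed", "annoyed", "tired"],
    "middle": ["happy", "busy", "tired"],
    "older": ["happy", "content", "content", "grateful"],
}

vectors = tfidf_by_group(sample)
for group, vector in vectors.items():
    print(group, top_terms(vector, k=2))
```

Terms shared by every group (here "happy") receive the lowest IDF, so the highest-weighted terms in each vector are those that distinguish one age group from the others, which is the sense in which the vectors encode vocabulary preference.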
