MACA-Net: A Multi-head Attention and Clustering-guided Network for Facial Expression Recognition

Authors

  • Dongmei Ma
  • Zhitao Zheng

DOI:

https://doi.org/10.6911/WSRJ.202505_11(5).0014

Keywords:

Facial Expression Recognition; Multi-head Cross Attention; Adaptive Feature Clustering Loss.

Abstract

Facial expression recognition (FER) in uncontrolled environments is significantly more challenging than in controlled settings. On the one hand, real-world interference such as illumination variation, pose deviation, and occlusion severely hinders the extraction of robust and discriminative features. On the other hand, in-the-wild expression datasets often exhibit high intra-class variability and low inter-class separability, further complicating accurate classification and degrading overall recognition performance. To address these challenges, this paper proposes a novel FER framework, MACA-Net, which integrates a multi-head attention-based feature extraction strategy with an enhanced loss function. Specifically, a Multi-head Cross Attention (MCHA) module captures diverse and complementary local features, enriching the representation of expressive facial regions. In addition, an Adaptive Feature Clustering Loss (AFC-Loss) promotes intra-class compactness and inter-class dispersion in the learned feature space, effectively improving the model's discriminative power. Extensive experiments on two challenging FER benchmarks, RAF-DB and FERPlus, show that MACA-Net achieves recognition accuracies of 89.01% and 89.85%, respectively, outperforming several state-of-the-art methods and validating the effectiveness of the proposed approach.
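
The abstract describes the two components only at a high level, so the sketch below is an illustrative reading rather than the authors' published implementation. Every concrete choice (PyTorch, the head count, the feature dimension, the sigmoid spatial masks, and the hinge-style center-separation term) is an assumption made for exposition: MCHA is rendered as several independent spatial-attention heads over a shared backbone feature map, and AFC-Loss as a center-loss-style term that pulls features toward their class center while pushing distinct class centers apart.

```python
# Illustrative sketch only. The paper's code is not shown on this page, so
# every detail below (framework, dimensions, head count, the exact form of
# AFC-Loss) is an assumption made for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadCrossAttention(nn.Module):
    """One plausible reading of MCHA: several independent spatial-attention
    heads, each free to focus on a different local facial region, whose
    masked features are then fused."""

    def __init__(self, in_dim: int = 512, num_heads: int = 4):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(                      # one attention branch per head
                nn.Conv2d(in_dim, in_dim // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_dim // 4, 1, kernel_size=1),
            )
            for _ in range(num_heads)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a CNN backbone
        pooled = []
        for head in self.heads:
            mask = torch.sigmoid(head(x))                     # (B, 1, H, W)
            pooled.append((x * mask).flatten(2).mean(dim=2))  # (B, C)
        return torch.stack(pooled, dim=1).mean(dim=1)         # fused (B, C)


class AdaptiveFeatureClusteringLoss(nn.Module):
    """Hedged sketch of a clustering-guided loss: pull each feature toward
    its class center (intra-class compactness) and push distinct class
    centers at least `margin` apart (inter-class dispersion)."""

    def __init__(self, num_classes: int, feat_dim: int, margin: float = 1.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Compactness: squared distance to each sample's own class center.
        intra = (feats - self.centers[labels]).pow(2).sum(dim=1).mean()
        # Dispersion: hinge penalty when two centers fall closer than margin.
        d = torch.cdist(self.centers, self.centers)           # (K, K)
        off_diag = d[~torch.eye(d.size(0), dtype=torch.bool, device=d.device)]
        inter = F.relu(self.margin - off_diag).mean()
        return intra + inter
```

Under these assumptions, training would minimize cross-entropy plus a weighted clustering term, e.g. `loss = F.cross_entropy(logits, labels) + lam * afc_loss(feats, labels)`, where the weight `lam` is a hypothetical hyperparameter tuned on validation data.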

Published

2025-04-26

Section

Articles

How to Cite

Ma, Dongmei, and Zhitao Zheng. 2025. “MACA-Net: A Multi-Head Attention and Clustering-Guided Network for Facial Expression Recognition”. World Scientific Research Journal 11 (5): 108-23. https://doi.org/10.6911/WSRJ.202505_11(5).0014.