Multi-knowledge Collaborative Distillation Framework based on an Encoder-decoder Feature Projector
DOI: https://doi.org/10.6919/ICJE.202503_11(3).0014

Keywords: Multi-Knowledge Distillation; Encoder-Decoder Mechanisms; Feature Projector; Model Compression.

Abstract
Knowledge distillation, as a flexible model compression technique, is widely applied in various computer vision tasks to transfer knowledge from large-scale models to lightweight, small-scale models. However, existing knowledge distillation methods, particularly feature-based approaches, often require the alignment of heterogeneous features, and misalignment can degrade the student model's performance. To address this, we propose a multi-knowledge collaborative distillation framework based on an encoder-decoder feature projector. To avoid the computational overhead introduced by complex feature alignment mechanisms, we reuse the teacher classifier and design an encoder-decoder-based feature projector that aligns the deep features of the student and teacher models. Furthermore, to account for the progressive learning process of the student model and to reduce the extra effort of tuning distillation temperature parameters, we introduce a progressive distillation temperature adjustment mechanism. Extensive experiments on the benchmark dataset CIFAR-100 validate the effectiveness of our distillation method, which achieves strong performance across various teacher-student architecture combinations.
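To make the two mechanisms summarized above concrete, the following PyTorch sketch illustrates one plausible realization: an encoder-decoder projector that maps student features into the teacher's feature space so a frozen, reused teacher classifier can score them, plus a progressive temperature schedule. The layer sizes, the linear annealing schedule, and names such as EncoderDecoderProjector and progressive_temperature are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (assumed shapes and hyperparameters; not the paper's exact design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderDecoderProjector(nn.Module):
    """Encoder-decoder projector: compresses student deep features, then expands
    them to the teacher's feature dimension so the reused teacher classifier
    can consume them (bottleneck size is a hypothetical choice)."""
    def __init__(self, student_dim: int, teacher_dim: int, bottleneck: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(student_dim, bottleneck), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, teacher_dim), nn.ReLU(inplace=True))

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(student_feat))

def progressive_temperature(epoch: int, total_epochs: int,
                            t_start: float = 8.0, t_end: float = 2.0) -> float:
    """One plausible progressive schedule: linearly anneal the distillation
    temperature as the student matures over training."""
    ratio = min(epoch / max(total_epochs - 1, 1), 1.0)
    return t_start + (t_end - t_start) * ratio

def distillation_loss(student_feat: torch.Tensor, teacher_logits: torch.Tensor,
                      projector: nn.Module, teacher_classifier: nn.Module,
                      temperature: float) -> torch.Tensor:
    """KL divergence between the teacher's logits and the logits produced by
    passing projected student features through the frozen, reused teacher head."""
    projected = projector(student_feat)             # align student features to teacher space
    student_logits = teacher_classifier(projected)  # reuse the frozen teacher classifier
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean") * temperature ** 2
```

In such a setup the teacher classifier's parameters would be kept frozen, and the temperature returned by progressive_temperature would be recomputed once per epoch before evaluating the distillation loss.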
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



