ResEff-YOLO: Accuracy Enhancement of YOLOv8 through Integration of ResNet, SPPF, and EfficientHead Modules

Yihe Jin; Haiyun Gan; Jun Li; Tao Zhang

doi:10.6919/ICJE.202502_11(2).0004

Keywords:

Object Detection; YOLOv8; Multi-Scale Feature Extraction.

Abstract

Computer vision is now widely applied in daily life, for example in autonomous vehicles, portable applications, augmented reality systems, and medical image analysis, so architectures that combine lower complexity with higher accuracy have become a priority. Various object detection methods, such as the R-CNN and YOLO model families, have been developed to improve accuracy and reduce complexity. In this study, we propose an enhanced version of the YOLOv8 model that further improves detection accuracy and efficiency. Specifically, we adopt EfficientHead as the detection head, which optimizes computational resource utilization and improves inference speed while maintaining detection accuracy. For the backbone network, we incorporate the ResNet18d module together with the SPPF_LSKA module, which strengthens the network's ability to learn multi-scale features compared with traditional convolutional layers. The deep stem structure of ResNet18d helps retain more spatial information, while SPPF_LSKA adds Large Separable Kernel Attention (LSKA) to the SPPF feature extractor, improving multi-scale feature extraction and the handling of complex scenes. Experiments on the VOC dataset show that the ResEff-YOLO model outperforms the YOLOv8 series, improving mean average precision (mAP) by approximately 4% and mAP50-95 by 4.2%.
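The abstract does not spell out how the two reported metrics are computed. The sketch below assumes the common COCO-style convention, in which mAP50-95 is the mean of the average precision (AP) values obtained at ten intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05. The function names `iou` and `map50_95` and the example boxes are hypothetical and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def map50_95(ap_per_threshold):
    """Mean of AP values, one per IoU threshold (dict keyed by threshold)."""
    return sum(ap_per_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)   # hypothetical ground-truth box
    prediction = (1, 1, 10, 10)     # hypothetical predicted box
    overlap = iou(ground_truth, prediction)
    # Thresholds at which this prediction would count as a correct detection.
    matched = [t for t in IOU_THRESHOLDS if overlap >= t]
    print(f"IoU = {overlap:.2f}, matched at {len(matched)} of 10 thresholds")
```

Under this convention a detection that is only loosely aligned with its ground-truth box still counts at the 0.50 threshold but is rejected at the stricter ones, so mAP50-95 rewards tighter localization than a metric evaluated at a single threshold.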
