Multiscale Fusion Convolutional Network in Real-time Semantic Segmentation

Authors

  • Jiali Xing
  • Yongsheng Dong

DOI:

https://doi.org/10.54691/7cxc1n66

Keywords:

Real-time Semantic Segmentation; Multiscale; Dual Attention; Single-branch.

Abstract

Semantic segmentation in practical applications such as autonomous driving requires networks that process high-resolution images efficiently while maintaining high accuracy. This in turn requires methods that effectively fuse the spatial information in high-resolution images with the semantic information in low-resolution images. To address this, this paper proposes a Multi-scale Fusion Convolutional Network (MFCNet) built on a single-branch network structure. To handle information at different scales simultaneously and help the network capture a wide range of contextual information, separable Multi-Scale Convolution Modules (MSCM) are introduced, giving the network richer and more comprehensive feature representations. In addition, because it is difficult to restore resolution directly from shallow-level information, a Dual-Attention Fusion Module (DAFM) is designed, which uses two attention mechanisms to weight the feature maps at different resolutions respectively. Experimental results demonstrate that MFCNet achieves outstanding performance in real-time semantic segmentation tasks.
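
The two ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the layer layout of MSCM or DAFM, so the channel grouping, the kernel sizes, the nearest-neighbour upsampling, and the pooling-based gates below are all assumptions chosen for illustration, and every function name is hypothetical. The first part applies depthwise kernels of different sizes to different channel groups and mixes them with a 1x1 convolution (one common reading of "separable multi-scale convolution"). The second part weights a high-resolution map with a spatial gate and a low-resolution map with a channel gate before summing them.

```python
import numpy as np

def depthwise_conv2d(x, kernel):
    """Depthwise 'same' cross-correlation. x: (C, H, W); kernel: (C, k, k), k odd."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # Each channel is filtered only by its own kernel slice.
            out += kernel[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def pointwise_conv(x, weight):
    """1x1 convolution. x: (C_in, H, W); weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def multiscale_separable_block(x, kernels, pw_weight):
    """Assumed MSCM-like block: one depthwise kernel size per channel group,
    followed by a 1x1 convolution that mixes the groups."""
    groups = np.array_split(x, len(kernels), axis=0)
    outs = [depthwise_conv2d(g, k) for g, k in zip(groups, kernels)]
    return pointwise_conv(np.concatenate(outs, axis=0), pw_weight)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def dual_attention_fuse(high, low, factor):
    """Assumed DAFM-like fusion. high: (C, H, W); low: (C, H/factor, W/factor).
    The gates come from plain pooled statistics; a trained module would use
    learned layers here instead."""
    low_up = upsample_nearest(low, factor)
    spatial_gate = sigmoid(high.mean(axis=0, keepdims=True))         # (1, H, W)
    channel_gate = sigmoid(low_up.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    return spatial_gate * high + channel_gate * low_up

# Usage with random weights (illustrative shapes only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
kernels = [rng.standard_normal((2, 3, 3)), rng.standard_normal((2, 5, 5))]
features = multiscale_separable_block(x, kernels, rng.standard_normal((6, 4)))
fused = dual_attention_fuse(features, rng.standard_normal((6, 4, 4)), factor=2)
print(fused.shape)  # (6, 8, 8)
```

Gating each resolution separately keeps the fusion cheap: both gates are broadcast multiplications, so the added cost stays small next to the convolutions, which matters in a single-branch real-time design.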

Published

2024-05-22

Section

Articles

How to Cite

Xing, J., & Dong, Y. (2024). Multiscale Fusion Convolutional Network in Real-time Semantic Segmentation. Frontiers in Science and Engineering, 4(5), 58-70. https://doi.org/10.54691/7cxc1n66