Multiscale Fusion Convolutional Network in Real-time Semantic Segmentation
DOI: https://doi.org/10.54691/7cxc1n66
Keywords: Real-time Semantic Segmentation; Multiscale; Dual Attention; Single-branch
Abstract
To perform semantic segmentation in practical applications such as autonomous driving, a network must process high-resolution images efficiently while maintaining high accuracy. This requires methods that effectively fuse the spatial information in high-resolution images with the semantic information in low-resolution images. To address this, this paper proposes a Multi-scale Fusion Convolutional Network (MFCNet) built on a single-branch architecture. To handle information at different scales simultaneously and help the network capture a wide range of contextual information, separable Multi-Scale Convolution Modules (MSCM) are introduced, allowing the network to obtain richer and more comprehensive feature representations. In addition, because shallow-level information is difficult to use directly for restoring resolution, a Dual-Attention Fusion Module (DAFM) is designed; it introduces two attention mechanisms that weight the feature maps at the two resolutions respectively. Experimental results demonstrate that MFCNet achieves outstanding performance in real-time semantic segmentation tasks.
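The abstract does not specify the internals of MSCM or DAFM, so the following is only a minimal NumPy sketch of the two ideas it names, under stated assumptions. The function names (`depthwise_conv2d`, `multiscale_separable_conv`, `dual_attention_fusion`) are hypothetical. Assumed for MSCM: parallel depthwise convolutions with different odd kernel sizes, summed, then a 1x1 pointwise convolution and ReLU. Assumed for DAFM: a parameter-free channel attention on the low-resolution (semantic) map, a parameter-free spatial attention on the high-resolution (spatial) map, nearest-neighbour upsampling, and additive fusion. None of these choices is confirmed to be MFCNet's actual design.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depthwise 2D cross-correlation, stride 1, zero 'same' padding.
    x: (C, H, W); kernels: (C, k, k) with odd k, one kernel per channel."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    assert kernels.shape == (c, k, k) and k % 2 == 1
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H, W
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            # each channel is filtered only by its own kernel (depthwise)
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def multiscale_separable_conv(x, dw_kernels, pw_weight):
    """Hypothetical MSCM-style block (assumption): parallel depthwise
    branches with different kernel sizes, summed, then a 1x1 pointwise
    convolution (pw_weight: (C_out, C_in)) and ReLU."""
    y = sum(depthwise_conv2d(x, k) for k in dw_kernels)
    z = np.einsum("oc,chw->ohw", pw_weight, y)
    return np.maximum(z, 0.0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x):
    # parameter-free simplification: global average pool per channel -> (C, 1, 1)
    return sigmoid(x.mean(axis=(1, 2)))[:, None, None]

def spatial_attention(x):
    # parameter-free simplification: mean over channels -> (1, H, W)
    return sigmoid(x.mean(axis=0, keepdims=True))

def dual_attention_fusion(high, low):
    """Hypothetical DAFM-style fusion (assumption).
    high: (C, H, W) shallow, high-resolution features;
    low: (C, H/s, W/s) deep, low-resolution features."""
    s = high.shape[1] // low.shape[1]
    assert high.shape[1] == low.shape[1] * s and high.shape[2] == low.shape[2] * s
    low_up = low.repeat(s, axis=1).repeat(s, axis=2)  # nearest-neighbour upsampling
    low_w = low_up * channel_attention(low)    # weight semantic channels
    high_w = high * spatial_attention(high)    # weight spatial locations
    return high_w + low_w
```

In a trained network the attention weights would come from learned layers (for example small fully connected or convolutional layers) rather than raw averages; the parameter-free versions above only keep the sketch short while preserving the shape logic, where each resolution is reweighted by its own attention before the two maps are fused.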
License
Copyright (c) 2024 Frontiers in Science and Engineering

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




