Review on Image Processing Method based on AI Large Models

Hanwen Liu; Jordon Li; Qipei Tan; Jin Xiao; Hao-Hsiang Hsu; Andy Zhu; Anze Zhuge; Xingjian Zhang

doi:10.54691/mh0tqs13

Keywords:

AI, AIGC image processing, machine learning.

Abstract

Large AI models are being applied ever more widely and deeply in image processing. Using deep learning, they automatically extract features from raw image data and then analyze and process those data efficiently. This article reviews the current state of image processing technology, focusing on techniques based on machine learning and large AI models. It finds that the introduction of large AI models has made the development of image processing technology faster and more intelligent.
