A Combined Approach of Program Analysis and Deep Learning for Code Completion

Yi Liu

doi:10.54691/hkyc3a89

Keywords:

Code Completion; Program Analysis; Gated Graph Neural Networks; Transformer.

Abstract

Code completion, a core feature of integrated development environments, substantially reduces the coding workload for developers. Traditional code completion techniques often focus on the natural-language properties of code and overlook the structural characteristics of programming languages. To address this issue, this paper proposes an approach that combines program analysis with deep learning to improve the accuracy and efficiency of code completion. First, program analysis techniques extract the structural and semantic information of code snippets, which is used to construct a program graph. Gated Graph Neural Networks (GGNN) and a Transformer are then used to encode the program graph, capturing both the local features and the long-range dependencies of the code. The model thus integrates the semantic and structural information of code and provides more accurate completion suggestions. Experiments were conducted on public datasets for two widely used programming languages, Python and JavaScript. The results show that the method significantly outperforms existing approaches in both code completion accuracy and Mean Reciprocal Rank (MRR).
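The two evaluation metrics named above can be illustrated with a minimal sketch. It uses the standard definitions and is not the paper's evaluation code. It assumes that "accuracy" means top-1 accuracy and that MRR is averaged over the full ranked suggestion list, since the abstract states neither the ranking depth nor any cutoff. The function names `reciprocal_rank` and `evaluate` are hypothetical.

```python
from typing import Sequence, Tuple


def reciprocal_rank(ranked: Sequence[str], target: str) -> float:
    """Return 1/rank of `target` in `ranked` (1-based), or 0.0 if it is absent."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return 1.0 / rank
    return 0.0


def evaluate(
    predictions: Sequence[Sequence[str]], targets: Sequence[str]
) -> Tuple[float, float]:
    """Return (top-1 accuracy, MRR) over a set of completion queries.

    Assumption: predictions[i] is the ranked suggestion list for query i,
    best suggestion first, and targets[i] is the ground-truth token.
    """
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    n = len(targets)
    # Top-1 accuracy: fraction of queries whose first suggestion is correct.
    hits = sum(1 for ranked, t in zip(predictions, targets) if ranked and ranked[0] == t)
    # MRR: mean of the reciprocal rank of the correct token (0 when it is missing).
    rr_sum = sum(reciprocal_rank(ranked, t) for ranked, t in zip(predictions, targets))
    return hits / n, rr_sum / n


# Illustrative data only; these are not results from the paper.
predictions = [["self", "cls", "x"], ["len", "print", "range"], ["foo", "bar"]]
targets = ["self", "print", "baz"]
accuracy, mrr = evaluate(predictions, targets)
```

In this toy example the correct token is ranked first, second, and not at all, so the reciprocal ranks are 1, 1/2, and 0. MRR rewards a model for placing the correct token near the top even when it is not the first suggestion, which is why it is often reported alongside accuracy.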
