Syntax and Relation Enhanced Query Generation for Text to SQL Parsing

Authors

  • Kun Han
  • Xi Xiong

DOI:

https://doi.org/10.54691/ww5m2092

Keywords:

Text-to-SQL; Syntactic Dependencies; Transformer; Abstract Syntax Trees.

Abstract

Text-to-SQL parsing, a significant branch of semantic parsing, converts natural language questions into executable SQL queries and has gained increasing attention in recent years. The technology lowers the barrier to database access, making data more convenient to use and more widely available. Its primary challenge, however, is domain adaptation: whether a model can be applied to new databases and can correctly align natural language questions with the corresponding tables and columns in the database. To address this challenge, this paper introduces SRSQL (Syntax and Relation-Augmented Query Generation), which incorporates syntactic information and predefined relations into the model, using syntactic dependencies and schema linking to improve performance. With a Transformer-based decoder, SRSQL generates SQL queries in the form of Abstract Syntax Trees (ASTs), significantly boosting prediction accuracy. Experimental results show that SRSQL outperforms the compared models, particularly on challenging benchmarks such as Spider and Spider-SYN.
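To make the alignment between question words and database columns described in the abstract (commonly called schema linking) more concrete, the following is a minimal sketch. It is not the SRSQL implementation: it assumes plain name-based matching, and the function name `link_schema`, the relation labels "exact" and "partial", and the example question and columns are all hypothetical.

```python
from typing import Dict, List, Tuple


def link_schema(question: str, columns: List[str]) -> Dict[Tuple[int, int], str]:
    """Tag (token index, column index) pairs with a name-match relation.

    Illustrative assumption, not the paper's method:
      "exact"   - the column name is a single word equal to the token
      "partial" - the token is one word of a multi-word column name
    """
    # Naive whitespace tokenization; a real system would use a proper tokenizer.
    tokens = question.lower().rstrip("?.").split()
    relations: Dict[Tuple[int, int], str] = {}
    for j, column in enumerate(columns):
        words = column.lower().replace("_", " ").split()
        for i, token in enumerate(tokens):
            if words == [token]:
                relations[(i, j)] = "exact"
            elif token in words:
                relations[(i, j)] = "partial"
    return relations


# Hypothetical example: token 3 ("name") matches column 0 exactly,
# token 7 ("singer") partially matches column 1 ("singer_id").
links = link_schema(
    "What is the name of the oldest singer?",
    ["name", "singer_id", "age"],
)
print(links)
```

Relations of this kind are one example of the predefined relations that a relation-aware encoder could consume alongside syntactic dependencies; how SRSQL actually defines and encodes them is specified in the paper itself, not here.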

Published

2024-04-30

Section

Articles

How to Cite

Han, K., & Xiong, X. (2024). Syntax and Relation Enhanced Query Generation for Text to SQL Parsing. Frontiers in Science and Engineering, 4(4), 82-93. https://doi.org/10.54691/ww5m2092