---
license: mit
---

# Fine-Tuned Google T5 Model for Text to SQL Translation

A fine-tuned version of the Google T5 model, trained to translate natural language queries into SQL statements.

## Model Details

- **Architecture**: Google T5 (Text-to-Text Transfer Transformer)
- **Task**: Text to SQL Translation
- **Fine-Tuning Datasets**:
  - [sql-create-context Dataset](https://huggingface.co/datasets/b-mc2/sql-create-context)
  - [Synthetic-Text-To-SQL Dataset](https://huggingface.co/datasets/gretelai/synthetic-text-to-sql)

## Fine-Tuning Datasets

1. **sql-create-context Dataset**:
   - This dataset was created by modifying data from the following sources:
     - Zhong, Victor, et al. "Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning." (2017).
     - Yu, Tao, et al. "Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task." (2018).
   - Citation:
     ```bibtex
     @misc{b-mc2_2023_sql-create-context,
       title = {sql-create-context Dataset},
       author = {b-mc2},
       year = {2023},
       url = {https://huggingface.co/datasets/b-mc2/sql-create-context},
       note = {This dataset was created by modifying data from the following sources: \cite{zhongSeq2SQL2017, yu2018spider}.},
     }
     ```

2. **Synthetic-Text-To-SQL Dataset**:
   - A synthetic dataset for training language models to generate SQL queries from natural language prompts.
   - Citation:
     ```bibtex
     @software{gretel-synthetic-text-to-sql-2024,
       author = {Meyer, Yev and Emadi, Marjan and Nathawani, Dhruv and Ramaswamy, Lipika and Boyd, Kendrick and Van Segbroeck, Maarten and Grossman, Matthew and Mlocek, Piotr and Newberry, Drew},
       title = {{Synthetic-Text-To-SQL}: A synthetic dataset for training language models to generate SQL queries from natural language prompts},
       month = {April},
       year = {2024},
       url = {https://huggingface.co/datasets/gretelai/synthetic-text-to-sql}
     }
     ```

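Because the two datasets use different column names, records have to be mapped to a common source/target shape before seq2seq fine-tuning. The snippet below is a minimal sketch of one way to do that, not the preprocessing actually used for this model. The column names (`question`/`context`/`answer` and `sql_prompt`/`sql_context`/`sql`) reflect my reading of the two dataset cards and should be verified; the prompt template, `FIELD_MAP`, and `to_seq2seq_pair` are hypothetical names introduced only for illustration.

```python
from typing import Dict, Tuple

# (question field, schema field, SQL field) for each dataset.
# Assumption: these match the dataset cards; verify before relying on them.
FIELD_MAP = {
    "sql-create-context": ("question", "context", "answer"),
    "synthetic-text-to-sql": ("sql_prompt", "sql_context", "sql"),
}


def to_seq2seq_pair(record: Dict[str, str], dataset: str) -> Tuple[str, str]:
    """Turn one dataset record into a (source, target) string pair."""
    q_key, ctx_key, sql_key = FIELD_MAP[dataset]
    # Collapse runs of whitespace so multi-line schemas fit on one input line.
    question = " ".join(record[q_key].split())
    schema = " ".join(record[ctx_key].split())
    # Hypothetical prompt template; this model card does not state the real one.
    source = f"translate to SQL: {question} | schema: {schema}"
    target = record[sql_key].strip()
    return source, target


# Illustrative record in the sql-create-context layout (made up for this example).
example = {
    "question": "How many employees are older than 56?",
    "context": "CREATE TABLE employee (age INTEGER)",
    "answer": "SELECT COUNT(*) FROM employee WHERE age > 56",
}
source, target = to_seq2seq_pair(example, "sql-create-context")
print(source)
print(target)
```

Keeping the schema in the input string lets the model condition on table and column names, which is the reason both datasets ship `CREATE TABLE` context alongside each question.
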
## Ongoing Work

I am currently working to implement PICARD (Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models) to improve the results of this model. More details can be found in the original PICARD paper:

- Citation:
  ```bibtex
  @misc{scholak2021picardparsingincrementallyconstrained,
    title={PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models},
    author={Torsten Scholak and Nathan Schucher and Dzmitry Bahdanau},
    year={2021},
    eprint={2109.05093},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2109.05093},
  }
  ```
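
Until PICARD is in place, a much cruder stand-in can illustrate the general idea of rejecting invalid SQL. PICARD checks partial outputs incrementally during decoding; the sketch below does not do that. It only filters finished candidates (for example, beam search outputs ordered best first) after generation, using SQLite as the validity checker. The function name `first_valid_sql` and the sample schema and candidates are hypothetical, and SQLite's dialect may not match the target database.

```python
import sqlite3
from typing import Iterable, Optional


def first_valid_sql(candidates: Iterable[str], schema: str) -> Optional[str]:
    """Return the first candidate that SQLite can compile against the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        # Build empty tables from the CREATE TABLE statements.
        conn.executescript(schema)
        for sql in candidates:
            try:
                # EXPLAIN compiles the statement without running it, so syntax
                # errors and unknown tables or columns raise here.
                conn.execute(f"EXPLAIN {sql}")
                return sql
            except (sqlite3.Error, sqlite3.Warning):
                continue  # reject and try the next-ranked candidate
        return None
    finally:
        conn.close()


schema = "CREATE TABLE head (age INTEGER, name TEXT);"
beams = [
    "SELECT COUNT(*) FROM head WHERE age >",      # truncated: syntax error
    "SELECT COUNT(*) FROM heads WHERE age > 56",  # unknown table
    "SELECT COUNT(*) FROM head WHERE age > 56",   # compiles
]
best = first_valid_sql(beams, schema)
print(best)
```

A post-hoc filter like this can only pick among candidates the model already produced, whereas incremental constraints steer decoding away from invalid prefixes before the beam is spent on them.
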