# NeurIPS11092: Text-to-CadQuery Repo

This repository contains all resources used to train and evaluate large language models for generating CadQuery code from natural language descriptions.

## Contents

- `data/`
  Prompt-completion pairs used to finetune six open-source LLMs, split into `data_train.jsonl`, `data_val.jsonl`, and `data_test.jsonl` at a 90/5/5 train/validation/test ratio.
- `CadQuery.zip`
  All **170,000 CadQuery programs** we generated from the [Text2CAD](https://github.com/SadilKhan/Text2CAD) dataset using Gemini 2.0 Flash.
- `text2cad_v1.1.csv`
  Original source data provided by the Text2CAD authors, in minimal JSON format.
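
The split files in `data/` are JSON Lines, so they can be read one record per line. The following is a minimal sketch, not the authors' loading code: the field names `prompt` and `completion` are an assumption based on the "prompt-completion pairs" wording above, the helpers `load_jsonl` and `split_fractions` are hypothetical, and the demo row is a made-up stand-in rather than data from this repository.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def split_fractions(sizes):
    """Return each split's share of the total, to compare against 90/5/5."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


# Self-contained demo on a tiny stand-in file. The "prompt"/"completion"
# keys are assumed, and this row is illustrative, not taken from the dataset.
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / "data_demo.jsonl"
demo_rows = [
    {
        "prompt": "Create a 10 mm cube.",
        "completion": "import cadquery as cq\nresult = cq.Workplane('XY').box(10, 10, 10)",
    },
]
demo_file.write_text(
    "\n".join(json.dumps(row) for row in demo_rows) + "\n", encoding="utf-8"
)

demo_records = load_jsonl(demo_file)
# Illustrative counts only; replace with len(load_jsonl(...)) for each split.
demo_fractions = split_fractions({"train": 90, "val": 5, "test": 5})
```

With the real files, calling `load_jsonl("data/data_train.jsonl")` (and likewise for the validation and test files) and passing the three lengths to `split_fractions` gives a quick check that the split is close to 90/5/5. Inspect the keys of the first record before relying on the assumed field names.
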

## Finetuned Models

We finetuned the following models on this dataset:

- [CodeGPT-small](https://huggingface.co/ricemonster/codegpt-small-sft)
- [GPT-2 Medium](https://huggingface.co/ricemonster/gpt2-medium-sft)
- [GPT-2 Large](https://huggingface.co/ricemonster/gpt2-large-sft)
- [Gemma-1B](https://huggingface.co/ricemonster/gemma-1B-SFT)
- [Qwen2.5-3B](https://huggingface.co/ricemonster/qwen2.5-3B-SFT)
- [Mistral-7B (LoRA)](https://huggingface.co/ricemonster/Mistral-7B-lora)
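
As a rough sketch of how one of the linked checkpoints might be queried, the snippet below uses the Hugging Face `transformers` library. Several things here are assumptions rather than facts from this repository: that the checkpoints load as standard causal language models, that a plain natural language description is a suitable prompt (the prompt template used during finetuning is not documented here), and the helper name `generate_cadquery` with its `max_new_tokens` default. The Mistral-7B entry is labeled LoRA and may need adapter loading instead, so it is left out of the mapping.

```python
# Hugging Face repo ids copied from the links in the list above.
# The Mistral-7B LoRA entry is omitted: it may require adapter loading.
MODEL_IDS = {
    "CodeGPT-small": "ricemonster/codegpt-small-sft",
    "GPT-2 Medium": "ricemonster/gpt2-medium-sft",
    "GPT-2 Large": "ricemonster/gpt2-large-sft",
    "Gemma-1B": "ricemonster/gemma-1B-SFT",
    "Qwen2.5-3B": "ricemonster/qwen2.5-3B-SFT",
}


def generate_cadquery(description, model_name="GPT-2 Medium", max_new_tokens=512):
    """Generate text from a description with one of the listed checkpoints.

    Hypothetical helper: assumes the checkpoint is a standard causal LM and
    that the raw description works as a prompt, which is not confirmed here.
    """
    # Imported lazily so MODEL_IDS is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = MODEL_IDS[model_name]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(description, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # For causal LMs the decoded string starts with the prompt itself.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate_cadquery("Create a 10 mm cube.")` downloads the checkpoint on first use. If the output looks malformed, the prompt format is the first thing to check against the records in `data/`.
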

## Acknowledgements

We gratefully acknowledge the authors of [Text2CAD](https://github.com/SadilKhan/Text2CAD) and [DeepCAD](https://github.com/ChrisWu1997/DeepCAD) for their foundational datasets and inspiration.