NeurIPS11092: Text-to-CadQuery Repo

This repository contains all resources used to train and evaluate large language models for generating CadQuery code from natural language descriptions.

Contents

  • data/
    Contains the prompt-completion pairs used to finetune six open-source LLMs.
    The pairs are split into data_train.jsonl, data_val.jsonl, and data_test.jsonl in a 90/5/5 ratio.

  • CadQuery.zip
    Includes all 170,000 CadQuery programs we generated from the Text2CAD dataset using Gemini 2.0 Flash.

  • text2cad_v1.1.csv
    Original source data provided by the Text2CAD authors, in minimal JSON format.
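
As a quick sanity check of the data/ splits described above, the following minimal sketch reads each JSONL file and reports its share of the total. It assumes only that each file holds one JSON object per line; the helper names `load_jsonl` and `split_fractions` are hypothetical and not part of this repository, and the record field names are not assumed.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line (hypothetical helper)."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def split_fractions(counts):
    """Return each split's share of the total record count (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Assumption: the script is run from the repository root, next to data/.
    data_dir = Path("data")
    counts = {}
    for name in ("train", "val", "test"):
        path = data_dir / f"data_{name}.jsonl"
        if path.exists():
            counts[name] = len(load_jsonl(path))
    for name, frac in split_fractions(counts).items():
        print(f"{name}: {counts[name]} records ({frac:.1%})")
```

If the files follow the stated 90/5/5 ratio, the printed shares should be close to 90%, 5%, and 5%.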

Finetuned Models

We trained the following models on this dataset:

Acknowledgements

We gratefully acknowledge the authors of Text2CAD and DeepCAD for their foundational datasets and inspiration.