Anonymous Author committed on
Commit ·
4091ce3
1
Parent(s): 3ade5c4
update README
README.md
CHANGED
|
@@ -1,3 +1,30 @@
|
|
| 1 |
-
--
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
# NeurIPS11092: Text-to-CadQuery Repo
|
| 2 |
+
|
| 3 |
+
This repository contains all resources used to train and evaluate large language models for generating CadQuery code from natural language descriptions.
|
| 4 |
+
|
| 5 |
+
## Contents
|
| 6 |
+
|
| 7 |
+
- `data/`
|
| 8 |
+
Contains prompt-completion pairs used to finetune six open-source LLMs.
|
| 9 |
+
The pairs are split across `data_train.jsonl`, `data_val.jsonl`, and `data_test.jsonl` in a 90/5/5 train/validation/test ratio.
|
| 10 |
+
|
| 11 |
+
- `CadQuery.zip`
|
| 12 |
+
Includes all **170,000 CadQuery programs** we generated from the [Text2CAD](https://github.com/SadilKhan/Text2CAD) dataset using Gemini 2.0 Flash.
|
| 13 |
+
|
| 14 |
+
- `text2cad_v1.1.csv`
|
| 15 |
+
Original source data provided by the Text2CAD authors, in minimal JSON format.
|
| 16 |
+
|
| 17 |
+
## Finetuned Models
|
| 18 |
+
|
| 19 |
+
We trained the following models on this dataset:
|
| 20 |
+
|
| 21 |
+
- [CodeGPT-small](https://huggingface.co/ricemonster/codegpt-small-sft)
|
| 22 |
+
- [GPT-2 Medium](https://huggingface.co/ricemonster/gpt2-medium-sft)
|
| 23 |
+
- [GPT-2 Large](https://huggingface.co/ricemonster/gpt2-large-sft)
|
| 24 |
+
- [Gemma-1B](https://huggingface.co/ricemonster/gemma-1B-SFT)
|
| 25 |
+
- [Qwen2.5-3B](https://huggingface.co/ricemonster/qwen2.5-3B-SFT)
|
| 26 |
+
- [Mistral-7B (LoRA)](https://huggingface.co/ricemonster/Mistral-7B-lora)
|
| 27 |
+
|
| 28 |
+
## Acknowledgements
|
| 29 |
+
|
| 30 |
+
We gratefully acknowledge the authors of [Text2CAD](https://github.com/SadilKhan/Text2CAD) and [DeepCAD](https://github.com/ChrisWu1997/DeepCAD) for their foundational datasets and inspiration.
|
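As a rough illustration of how the JSONL files under `data/` might be read, the sketch below parses JSON Lines text and checks split proportions. The field names `prompt` and `completion`, the sample records, and the helper names `parse_jsonl` and `split_fractions` are assumptions made for this example; the README does not specify the record schema.

```python
import json
from typing import Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total record count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


# Hypothetical records: the "prompt"/"completion" keys are assumed, not
# confirmed by the README.
sample_lines = [
    json.dumps({
        "prompt": "Create a 10 mm cube.",
        "completion": "import cadquery as cq\n"
                      "result = cq.Workplane('XY').box(10, 10, 10)",
    }),
    "",  # blank lines are skipped
    json.dumps({
        "prompt": "Create a 20 mm by 10 mm by 5 mm block.",
        "completion": "import cadquery as cq\n"
                      "result = cq.Workplane('XY').box(20, 10, 5)",
    }),
]

records = parse_jsonl(sample_lines)

# Illustrative counts only; they mirror the stated 90/5/5 ratio.
fractions = split_fractions({"train": 90, "val": 5, "test": 5})
```

To read an actual file, pass an open file handle instead of the sample list, for example `parse_jsonl(open("data/data_train.jsonl", encoding="utf-8"))`, and adjust the key names to whatever the records actually contain.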