Anonymous Author committed on
Commit 4091ce3 · 1 Parent(s): 3ade5c4

update README

Files changed (1)
  1. README.md +30 -3
README.md CHANGED
@@ -1,3 +1,30 @@
- ---
- license: apache-2.0
- ---
+ # NeurIPS11092: Text-to-CadQuery Repo
+
+ This repository contains all resources used to train and evaluate large language models for generating CadQuery code from natural language descriptions.
+
+ ## Contents
+
+ - `data/`
+ Contains the prompt-completion pairs used to finetune six open-source LLMs.
+ The pairs are split into `data_train.jsonl`, `data_val.jsonl`, and `data_test.jsonl` in a 90/5/5 train/validation/test ratio.
+
+ - `CadQuery.zip`
+ Includes all **170,000 CadQuery programs** we generated from the [Text2CAD](https://github.com/SadilKhan/Text2CAD) dataset using Gemini 2.0 Flash.
+
+ - `text2cad_v1.1.csv`
+ Original source data provided by the Text2CAD authors, in minimal JSON format.
+
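As a rough illustration of how the split files in `data/` might be read, the sketch below parses a JSON Lines file into a list of dictionaries. The field names `prompt` and `completion` are an assumption based on the "prompt-completion pairs" description above; check the actual files before relying on them.

```python
import json
from pathlib import Path


def load_pairs(path):
    """Read a JSON Lines file: one JSON object per non-blank line."""
    pairs = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate trailing blank lines
                pairs.append(json.loads(line))
    return pairs


# Hypothetical usage; the keys "prompt" and "completion" are assumed, not confirmed:
#   val_pairs = load_pairs("data/data_val.jsonl")
#   print(val_pairs[0]["prompt"])
#   print(val_pairs[0]["completion"])
```

Reading line by line keeps memory use proportional to the parsed records rather than to the raw file plus the records, which matters more for `data_train.jsonl` than for the two smaller splits.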
+ ## Finetuned Models
+
+ We trained the following models on this dataset:
+
+ - [CodeGPT-small](https://huggingface.co/ricemonster/codegpt-small-sft)
+ - [GPT-2 Medium](https://huggingface.co/ricemonster/gpt2-medium-sft)
+ - [GPT-2 Large](https://huggingface.co/ricemonster/gpt2-large-sft)
+ - [Gemma-1B](https://huggingface.co/ricemonster/gemma-1B-SFT)
+ - [Qwen2.5-3B](https://huggingface.co/ricemonster/qwen2.5-3B-SFT)
+ - [Mistral-7B (LoRA)](https://huggingface.co/ricemonster/Mistral-7B-lora)
+
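A minimal sketch of how one of the checkpoints listed above could be loaded with the Hugging Face `transformers` library follows. It assumes the `ricemonster/gpt2-medium-sft` repository ships both model weights and tokenizer files, and that the raw description can be passed as the prompt; the prompt template actually used during finetuning is not stated here, so treat the input format as a guess. The LoRA checkpoint would likely need an adapter-aware loader instead.

```python
MODEL_ID = "ricemonster/gpt2-medium-sft"  # taken from the model list above


def generate_cadquery(description, model_id=MODEL_ID, max_new_tokens=256):
    """Generate text from a natural language description (illustrative helper)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Assumption: the plain description is a usable prompt for the finetuned model.
    inputs = tokenizer(description, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Hypothetical usage (downloads the checkpoint on first call):
#   code = generate_cadquery("A rectangular plate with a centered circular hole.")
#   print(code)
```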
+ ## Acknowledgements
+
+ We gratefully acknowledge the authors of [Text2CAD](https://github.com/SadilKhan/Text2CAD) and [DeepCAD](https://github.com/ChrisWu1997/DeepCAD) for their foundational datasets and inspiration.