Zheyuan Zhao committed
Update model card: add GitHub link, design docs, and benchmark setup guide

README.md
A fine-tuned [Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) model for generating **Pipe SQL** through multi-turn tool-calling conversations.

**GitHub**: [nittygritty-zzy/sqlglot](https://github.com/nittygritty-zzy/sqlglot)

## What is Pipe SQL?

Pipe SQL is a more readable SQL syntax that uses the `|>` (pipe) operator to chain operations in a linear, top-to-bottom flow.
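For instance, a small illustrative query in the pipe style (the table and column names here are invented for illustration, not taken from the card's own example):

```sql
FROM singer
|> WHERE country = 'USA'
|> AGGREGATE COUNT(*) AS n GROUP BY age
|> ORDER BY n DESC
```

Each `|>` stage consumes the rows produced by the previous stage, so the query reads in execution order rather than the inside-out order of nested standard SQL.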
| **Attention Heads** | 12 (2 KV heads) |
| **Context Length** | 2048 tokens (training) |

## Design Documents

The full design and methodology behind this project are documented in the following design docs (also available in [docs/design/](https://github.com/nittygritty-zzy/sqlglot/tree/main/docs/design) on GitHub):

| Document | Description |
|----------|-------------|
| [Fine-Tuning Design Doc](docs/pipe-sql-fine-tuning-design-doc.md) | End-to-end system design for incremental Pipe SQL synthesis and specialized fine-tuning of 1.5B-7B models |
| [Decompiler Design Doc](docs/pipe-sql-decompiler-design-doc.md) | Standard-SQL-to-Pipe-SQL decompiler, the deterministic data-generation component |
| [Validation Loop Design Doc](docs/pipe-sql-validation-loop-design-doc.md) | SQLite round-trip validation and feedback loop to ensure semantic correctness |
| [Training Reproduction Guide](docs/pipe-sql-training-reproduction-guide.md) | Step-by-step guide to reproducing the full training pipeline from scratch |
## Training

The model was fine-tuned using **QLoRA** on multi-turn tool-calling conversations for text-to-SQL generation.
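The exact training hyperparameters are not stated in this excerpt. As a minimal sketch of a QLoRA setup with `peft` and `bitsandbytes`, where every value shown (rank, alpha, dropout, target modules) is an illustrative assumption rather than this model's actual configuration:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters trained on top of the quantized weights.
# All values below are illustrative assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The quantized base model would then be loaded with `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` and wrapped with `get_peft_model(model, lora_config)` before training.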
### Inference

For inference with the correct chat template, see the [evaluation server code](https://github.com/nittygritty-zzy/sqlglot/tree/main/evaluation/server) on GitHub.
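As a rough sketch of what one multi-turn tool-calling exchange looks like (the role layout follows the common chat-message convention; the tool name `execute_sql`, the prompt wording, the schema, and the sample result are all invented for illustration, not taken from the evaluation server):

```python
def make_conversation(question: str, schema: str) -> list[dict]:
    """Initial messages for one text-to-SQL request (contents are illustrative)."""
    return [
        {"role": "system",
         "content": "Translate the question into Pipe SQL. "
                    "Use the execute_sql tool to run a query and inspect the result."},
        {"role": "user", "content": f"Schema:\n{schema}\n\nQuestion: {question}"},
    ]

def append_tool_round(messages: list[dict], sql: str, result: str) -> list[dict]:
    """One round: the model emits a tool call, the server appends the execution result."""
    messages.append({
        "role": "assistant",
        "tool_calls": [{"type": "function",
                        "function": {"name": "execute_sql",
                                     "arguments": {"query": sql}}}],
    })
    messages.append({"role": "tool", "content": result})
    return messages

msgs = make_conversation("How many singers are there?",
                         "singer(singer_id, name, country)")
msgs = append_tool_round(msgs, "FROM singer |> AGGREGATE COUNT(*) AS n", "[(6,)]")
print(len(msgs))  # 4 messages: system, user, assistant tool call, tool result
```

The point for inference is that the tokenizer's chat template must render these roles, including the `tool` turns, exactly as they appeared during fine-tuning, which is why the card points at the server code rather than a bare `generate` call.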
## Reproducing the Benchmark