Update model card: add GitHub link, design docs, and benchmark setup guide
README.md CHANGED
```diff
@@ -111,6 +111,20 @@ Evaluated on the **Spider 1.0 dev set** (1,034 questions) using an agentic bench
 | **Total Questions** | 1,034 |
 | **Gold Errors Excluded** | 2 |
 
+### Context: Spider 1.0 Dev Set SOTA
+
+| Model | Size | EX (Dev) | Method |
+|-------|-----:|----------|--------|
+| MiniSeek | — | 91.2% | Proprietary |
+| DAIL-SQL + GPT-4 + SC | — | 86.6% | In-context learning |
+| DIN-SQL + GPT-4 | — | 85.3% | In-context learning |
+| SFT CodeS-7B | 7B | 85.4% | Fine-tuned |
+| SFT CodeS-3B | 3B | 83.3% | Fine-tuned |
+| SFT CodeS-1B | 1B | 77.9% | Fine-tuned |
+| **Pipe SQL 1.5B (ours)** | **1.5B** | **60.7%** | **Fine-tuned, agentic tool-calling** |
+
+Our model trails SFT CodeS-1B by ~17 points. Key differences: (1) Pipe SQL generates a novel SQL dialect (pipe syntax) rather than standard SQL, adding a transpilation step; (2) the agentic tool-calling interface adds overhead vs. direct SQL generation; (3) our focus is on demonstrating the pipe SQL paradigm, not maximizing Spider accuracy. Sources: [Spider leaderboard](https://yale-lily.github.io/spider), [CodeS (Li et al., 2024)](https://arxiv.org/abs/2402.16347).
+
 ### Detailed Breakdown
 
 | Status | Count | % of Total | Description |
```
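The transpilation step mentioned in the comparison above can be illustrated with a toy sketch. This is not the model's actual dialect or transpiler (neither appears in the card); the `transpile_pipe` helper and its two-clause grammar are illustrative assumptions only, showing why emitting pipe syntax adds an extra rewrite stage before a query can be executed against the database.

```python
# Toy sketch: rewrite a pipe-syntax query into standard SQL.
# Assumed grammar: a FROM stage, then optional WHERE / SELECT stages
# separated by "|>". The real Pipe SQL dialect is richer than this.

def transpile_pipe(query: str) -> str:
    """Rewrite 'FROM t |> WHERE ... |> SELECT ...' into standard SQL."""
    stages = [s.strip() for s in query.split("|>")]
    from_clause = stages[0]            # first stage names the source table
    where_clause = None
    select_clause = "SELECT *"         # default projection if none given
    for stage in stages[1:]:
        keyword = stage.split(None, 1)[0].upper()
        if keyword == "WHERE":
            where_clause = stage
        elif keyword == "SELECT":
            select_clause = stage
    parts = [select_clause, from_clause]
    if where_clause:
        parts.append(where_clause)
    return " ".join(parts)

print(transpile_pipe("FROM users |> WHERE age > 21 |> SELECT name"))
# SELECT name FROM users WHERE age > 21
```

Every generated query must survive this rewrite before execution, so transpiler gaps count against execution accuracy in a way that direct-SQL baselines like CodeS never pay.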