nittygritty-zzy committed
Commit 7e7485e · verified · Parent(s): 0e554cb

Update model card: add GitHub link, design docs, and benchmark setup guide

Files changed (1): README.md (+14 −0)
README.md CHANGED
@@ -111,6 +111,20 @@ Evaluated on the **Spider 1.0 dev set** (1,034 questions) using an agentic bench
  | **Total Questions** | 1,034 |
  | **Gold Errors Excluded** | 2 |
 
+ ### Context: Spider 1.0 Dev Set SOTA
+
+ | Model | Size | EX (Dev) | Method |
+ |-------|-----:|----------|--------|
+ | MiniSeek | — | 91.2% | Proprietary |
+ | DAIL-SQL + GPT-4 + SC | — | 86.6% | In-context learning |
+ | DIN-SQL + GPT-4 | — | 85.3% | In-context learning |
+ | SFT CodeS-7B | 7B | 85.4% | Fine-tuned |
+ | SFT CodeS-3B | 3B | 83.3% | Fine-tuned |
+ | SFT CodeS-1B | 1B | 77.9% | Fine-tuned |
+ | **Pipe SQL 1.5B (ours)** | **1.5B** | **60.7%** | **Fine-tuned, agentic tool-calling** |
+
+ Our model trails CodeS-1B by ~17 points. Key differences: (1) Pipe SQL generates a novel SQL dialect (pipe syntax) rather than standard SQL, adding a transpilation step; (2) the agentic tool-calling interface adds overhead vs. direct SQL generation; (3) our focus is on demonstrating the pipe SQL paradigm, not maximizing Spider accuracy. Sources: [Spider leaderboard](https://yale-lily.github.io/spider), [CodeS (Li et al., 2024)](https://arxiv.org/abs/2402.16347).
+
  ### Detailed Breakdown
 
  | Status | Count | % of Total | Description |
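The added text attributes part of the accuracy gap to a transpilation step from pipe syntax to standard SQL. A minimal sketch of what such a step could look like is below; the `|>`-separated FROM/WHERE/SELECT grammar and the `transpile_pipe_sql` helper are illustrative assumptions for this note, not the dialect or code the model actually uses.

```python
# Illustrative sketch only: converts a toy pipe-syntax query of the form
# "FROM t |> WHERE cond |> SELECT cols" into standard SQL. The real Pipe SQL
# dialect and transpiler are not shown in this card and may differ.

def transpile_pipe_sql(query: str) -> str:
    """Transpile a simple pipe-syntax query into a standard SQL statement."""
    stages = [s.strip() for s in query.split("|>")]
    from_clause, where_clause, select_clause = "", "", "*"
    for stage in stages:
        keyword, _, rest = stage.partition(" ")
        kw = keyword.upper()
        if kw == "FROM":
            from_clause = rest
        elif kw == "WHERE":
            where_clause = rest
        elif kw == "SELECT":
            select_clause = rest
    sql = f"SELECT {select_clause} FROM {from_clause}"
    if where_clause:
        sql += f" WHERE {where_clause}"
    return sql

# Spider-style example (table name chosen for illustration):
print(transpile_pipe_sql("FROM singer |> WHERE age > 30 |> SELECT name"))
# → SELECT name FROM singer WHERE age > 30
```

Even in this toy form, the extra stage is a place where an otherwise semantically correct generation can fail, which is consistent with the gap the paragraph above describes.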