Update model card: add GitHub link, design docs, and benchmark setup guide
README.md (CHANGED)
````diff
@@ -173,7 +173,7 @@ Tables in database 'concert_singer':
 
 ### Inference
 
-For inference with the correct chat template, see the [evaluation server code](https://github.com/nittygritty-zzy/sqlglot/tree/main/evaluation/server) on GitHub.
+For inference with the correct chat template, see the [evaluation server code](https://github.com/nittygritty-zzy/sqlglot/tree/main/pipe_sql/evaluation/server) on GitHub.
 
 ## Reproducing the Benchmark
 
@@ -243,20 +243,20 @@ ls data/spider/database/ | wc -l # ~166 databases (20 used by dev set)
 pip install huggingface_hub
 python -c "
 from huggingface_hub import snapshot_download
-snapshot_download('nittygritty-zzy/pipe-sql-1.5b', local_dir='finetuning_output/merged')
+snapshot_download('nittygritty-zzy/pipe-sql-1.5b', local_dir='pipe_sql/finetuning_output/merged')
 "
 
 # Option B: Use git-lfs
 git lfs install
-git clone https://huggingface.co/nittygritty-zzy/pipe-sql-1.5b finetuning_output/merged
+git clone https://huggingface.co/nittygritty-zzy/pipe-sql-1.5b pipe_sql/finetuning_output/merged
 ```
 
 ### Step 5: Install Node.js Agent Dependencies
 
 ```bash
-cd evaluation/agent
+cd pipe_sql/evaluation/agent
 npm install
-cd ../..
+cd ../../..
 ```
 
 ### Step 6: Run the Benchmark
@@ -265,10 +265,10 @@ cd ../..
 
 ```bash
 # Run all 1,034 questions (takes ~2 hours on RTX 4080)
-bash evaluation/run_all.sh
+bash pipe_sql/evaluation/run_all.sh
 
 # Smoke test with 5 questions first
-bash evaluation/run_all.sh --limit 5
+bash pipe_sql/evaluation/run_all.sh --limit 5
 ```
 
 This script:
@@ -281,36 +281,36 @@ This script:
 
 **Start the evaluation server:**
 ```bash
-# Default: loads model from finetuning_output/merged/
-python -m evaluation.server.app
+# Default: loads model from pipe_sql/finetuning_output/merged/
+python -m pipe_sql.evaluation.server.app
 
 # Custom model path:
-MODEL_PATH=path/to/model python -m evaluation.server.app
+MODEL_PATH=path/to/model python -m pipe_sql.evaluation.server.app
 ```
 
 Wait for `Server ready` in the logs, then in a separate terminal:
 
 **Run the agent benchmark:**
 ```bash
-cd evaluation/agent
+cd pipe_sql/evaluation/agent
 npx tsx src/main.ts --benchmark # All 1,034 questions
 npx tsx src/main.ts --benchmark --limit 5 # Smoke test
 ```
 
 **Run single question interactively:**
 ```bash
-cd evaluation/agent
+cd pipe_sql/evaluation/agent
 npx tsx src/main.ts "How many singers do we have?" concert_singer
 ```
 
 **Evaluate results:**
 ```bash
-python evaluation/evaluate.py --results
+python pipe_sql/evaluation/evaluate.py --results pipe_sql/output/results.json
 ```
 
 ### Step 7: Review Results
 
-Results are saved to `evaluation_output/`:
+Results are saved to `pipe_sql/output/`:
 
 | File | Description |
 |------|-------------|
@@ -322,16 +322,16 @@ Results are saved to `evaluation_output/`:
 
 | Environment Variable | Default | Description |
 |---------------------|---------|-------------|
-| `MODEL_PATH` | `finetuning_output/merged` | Path to merged model directory |
+| `MODEL_PATH` | `pipe_sql/finetuning_output/merged` | Path to merged model directory |
 | `SPIDER_DB_DIR` | `data/spider/database` | Spider database directory |
 | `SPIDER_DIR` | `data/spider` | Spider data directory (contains dev.json) |
 | `PORT` | `8000` | Evaluation server port |
 | `SERVER_URL` | `http://localhost:8000` | Agent to server connection URL |
-| `OUTPUT_DIR` | `
+| `OUTPUT_DIR` | `pipe_sql/output` | Agent output directory |
 
 ### Troubleshooting
 
-**Server fails to load model**: Ensure `finetuning_output/merged/` contains `config.json`, `model.safetensors`, and `tokenizer.json`. If using a different path, set `MODEL_PATH`.
+**Server fails to load model**: Ensure `pipe_sql/finetuning_output/merged/` contains `config.json`, `model.safetensors`, and `tokenizer.json`. If using a different path, set `MODEL_PATH`.
 
 **CUDA out of memory**: The 1.5B model needs ~3 GB VRAM in float16. Close other GPU processes or use `CUDA_VISIBLE_DEVICES=0` to select a specific GPU.
 
````
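The troubleshooting section lists the files the server expects in the merged model directory. A preflight check along those lines might look like this (a minimal sketch; the `check_model_dir` helper is an assumption for illustration, not part of the repository):

```python
from pathlib import Path

# Required files per the troubleshooting note in the model card
REQUIRED = ["config.json", "model.safetensors", "tokenizer.json"]

def check_model_dir(model_dir: str) -> list[str]:
    """Return the required files missing from model_dir (empty list = OK)."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]

# Default path from the environment-variable table; override as needed
missing = check_model_dir("pipe_sql/finetuning_output/merged")
if missing:
    print(f"Missing files: {missing} -- set MODEL_PATH or re-run the download step")
else:
    print("Model directory looks complete")
```

If the check fails, re-run the model download step or point `MODEL_PATH` at the correct directory.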
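The guide says to wait for `Server ready` in the logs before starting the agent in a second terminal. When scripting that workflow, polling the server port can stand in for the manual wait (a sketch under the defaults from the environment-variable table; `wait_for_server` is a hypothetical helper, not part of the repo):

```python
import socket
import time

def wait_for_server(host: str = "localhost", port: int = 8000,
                    timeout: float = 120.0) -> bool:
    """Poll until the evaluation server accepts TCP connections, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server socket is up; the model
            # may still be loading, so keep watching the logs for "Server ready".
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)
    return False
```

A wrapper script could call `wait_for_server()` after launching the server, then start the agent benchmark only on success.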