nittygritty-zzy committed (verified)
Commit 977cbcc · Parent(s): a391fc6

Update model card: add GitHub link, design docs, and benchmark setup guide

Files changed (1): README.md (+17 −17)

README.md CHANGED
````diff
@@ -173,7 +173,7 @@ Tables in database 'concert_singer':
 
 ### Inference
 
-For inference with the correct chat template, see the [evaluation server code](https://github.com/nittygritty-zzy/sqlglot/tree/main/evaluation/server) on GitHub.
+For inference with the correct chat template, see the [evaluation server code](https://github.com/nittygritty-zzy/sqlglot/tree/main/pipe_sql/evaluation/server) on GitHub.
 
 ## Reproducing the Benchmark
 
@@ -243,20 +243,20 @@ ls data/spider/database/ | wc -l # ~166 databases (20 used by dev set)
 pip install huggingface_hub
 python -c "
 from huggingface_hub import snapshot_download
-snapshot_download('nittygritty-zzy/pipe-sql-1.5b', local_dir='finetuning_output/merged')
+snapshot_download('nittygritty-zzy/pipe-sql-1.5b', local_dir='pipe_sql/finetuning_output/merged')
 "
 
 # Option B: Use git-lfs
 git lfs install
-git clone https://huggingface.co/nittygritty-zzy/pipe-sql-1.5b finetuning_output/merged
+git clone https://huggingface.co/nittygritty-zzy/pipe-sql-1.5b pipe_sql/finetuning_output/merged
 ```
 
 ### Step 5: Install Node.js Agent Dependencies
 
 ```bash
-cd evaluation/agent
+cd pipe_sql/evaluation/agent
 npm install
-cd ../..
+cd ../../..
 ```
 
 ### Step 6: Run the Benchmark
@@ -265,10 +265,10 @@ cd ../..
 
 ```bash
 # Run all 1,034 questions (takes ~2 hours on RTX 4080)
-bash evaluation/run_all.sh
+bash pipe_sql/evaluation/run_all.sh
 
 # Smoke test with 5 questions first
-bash evaluation/run_all.sh --limit 5
+bash pipe_sql/evaluation/run_all.sh --limit 5
 ```
 
 This script:
@@ -281,36 +281,36 @@ This script:
 
 **Start the evaluation server:**
 ```bash
-# Default: loads model from finetuning_output/merged/
-python -m evaluation.server.app
+# Default: loads model from pipe_sql/finetuning_output/merged/
+python -m pipe_sql.evaluation.server.app
 
 # Custom model path:
-MODEL_PATH=path/to/model python -m evaluation.server.app
+MODEL_PATH=path/to/model python -m pipe_sql.evaluation.server.app
 ```
 
 Wait for `Server ready` in the logs, then in a separate terminal:
 
 **Run the agent benchmark:**
 ```bash
-cd evaluation/agent
+cd pipe_sql/evaluation/agent
 npx tsx src/main.ts --benchmark # All 1,034 questions
 npx tsx src/main.ts --benchmark --limit 5 # Smoke test
 ```
 
 **Run single question interactively:**
 ```bash
-cd evaluation/agent
+cd pipe_sql/evaluation/agent
 npx tsx src/main.ts "How many singers do we have?" concert_singer
 ```
 
 **Evaluate results:**
 ```bash
-python evaluation/evaluate.py --results evaluation_output/results.json
+python pipe_sql/evaluation/evaluate.py --results pipe_sql/output/results.json
 ```
 
 ### Step 7: Review Results
 
-Results are saved to `evaluation_output/`:
+Results are saved to `pipe_sql/output/`:
 
 | File | Description |
 |------|-------------|
@@ -322,16 +322,16 @@ Results are saved to `evaluation_output/`:
 
 | Environment Variable | Default | Description |
 |---------------------|---------|-------------|
-| `MODEL_PATH` | `finetuning_output/merged` | Path to merged model directory |
+| `MODEL_PATH` | `pipe_sql/finetuning_output/merged` | Path to merged model directory |
 | `SPIDER_DB_DIR` | `data/spider/database` | Spider database directory |
 | `SPIDER_DIR` | `data/spider` | Spider data directory (contains dev.json) |
 | `PORT` | `8000` | Evaluation server port |
 | `SERVER_URL` | `http://localhost:8000` | Agent to server connection URL |
-| `OUTPUT_DIR` | `evaluation_output` | Agent output directory |
+| `OUTPUT_DIR` | `pipe_sql/output` | Agent output directory |
 
 ### Troubleshooting
 
-**Server fails to load model**: Ensure `finetuning_output/merged/` contains `config.json`, `model.safetensors`, and `tokenizer.json`. If using a different path, set `MODEL_PATH`.
+**Server fails to load model**: Ensure `pipe_sql/finetuning_output/merged/` contains `config.json`, `model.safetensors`, and `tokenizer.json`. If using a different path, set `MODEL_PATH`.
 
 **CUDA out of memory**: The 1.5B model needs ~3 GB VRAM in float16. Close other GPU processes or use `CUDA_VISIBLE_DEVICES=0` to select a specific GPU.
````
 
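The "~3 GB VRAM in float16" figure in the CUDA troubleshooting note is just the weight footprint of a 1.5B-parameter model at 2 bytes per parameter; a quick sanity check (weights only, so activations, KV cache, and framework overhead come on top):

```python
# Weight-only VRAM estimate for a 1.5B-parameter model in float16.
params = 1.5e9       # parameter count
bytes_per_param = 2  # float16 stores each weight in 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights alone")  # ~3 GB
```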