# LLM-SQL: Column Reordering for Prefix Caching
When rows of a table are serialized into LLM prompts sequentially, consecutive rows that share leading column values can reuse cached prefixes. This task evolves a column-reordering strategy that maximizes prefix-cache hit rates across multiple real-world datasets without altering the underlying data.
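To make the idea concrete, here is a minimal toy sketch of why column order matters. It is not the metric implemented in `utils.py`: the `serialize` format, the `prefix_hit_fraction` helper, and the sample rows are all hypothetical, and it counts shared characters, whereas a real prefix cache works on tokens.

```python
from os.path import commonprefix


def serialize(row, order):
    """Render one row as a prompt fragment, emitting columns in the given order."""
    return ", ".join(f"{col}: {row[col]}" for col in order)


def prefix_hit_fraction(rows, order):
    """Toy metric: share of characters each row has in common, as a prefix,
    with the row serialized immediately before it."""
    texts = [serialize(r, order) for r in rows]
    total = sum(len(t) for t in texts)
    shared = sum(len(commonprefix([a, b])) for a, b in zip(texts, texts[1:]))
    return shared / total if total else 0.0


# Hypothetical sample rows, for illustration only.
rows = [
    {"review": "Great hops", "brewery": "Acme", "style": "IPA"},
    {"review": "Too bitter", "brewery": "Acme", "style": "IPA"},
    {"review": "Smooth finish", "brewery": "Acme", "style": "Stout"},
]

# Leading with the mostly unique column leaves little to reuse...
unique_first = prefix_hit_fraction(rows, ["review", "brewery", "style"])
# ...while leading with the repeated columns lengthens the shared prefix.
shared_first = prefix_hit_fraction(rows, ["brewery", "style", "review"])

print(f"review first:  {unique_first:.2f}")
print(f"brewery first: {shared_first:.2f}")
```

The data is identical in both cases; only the serialization order changes, which is the degree of freedom this task evolves.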
## Setup
1. **Download the datasets** (~69 MB total):
```bash
cd benchmarks/ADRS/llm_sql
bash download_dataset.sh
```
This downloads 5 CSV datasets into `datasets/`:
- `movies.csv`: Rotten Tomatoes movie reviews (~9 MB)
- `beer.csv`: Beer review dataset (~2.5 MB)
- `BIRD.csv`: BIRD text-to-SQL dataset (~34 MB)
- `PDMX.csv`: PDMX metadata dataset (~7.4 MB)
- `products.csv`: Amazon product catalog (~16 MB)
2. **Set your API key:**
```bash
export OPENAI_API_KEY=...
```
## Run
From the repo root:
```bash
uv run skydiscover-run \
benchmarks/ADRS/llm_sql/initial_program.py \
benchmarks/ADRS/llm_sql/evaluator.py \
-c benchmarks/ADRS/llm_sql/config.yaml \
-s [your_algorithm] \
-i 100
```
## Scoring
Combined score: `0.95 * average_hit_rate + 0.05 * (12 - min(12, avg_runtime)) / 12`
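- **Hit rate** (95% weight): normalized prefix-cache hit count, averaged across the 5 datasets
- **Runtime** (5% weight): average wall-clock seconds for the reordering algorithm, capped at 12 s; faster runs score higher, and runs of 12 s or more earn no runtime credit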
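The formula above can be sketched in a few lines of Python. This is an illustration, not the code in `evaluator.py`: the function name `combined_score` and its arguments are hypothetical, and it assumes `average_hit_rate` and `avg_runtime` are plain means over the per-dataset values.

```python
def combined_score(hit_rates, runtimes, runtime_cap=12.0):
    """Combine per-dataset hit rates (0..1) and runtimes (seconds) into one score.

    Assumption: both averages are simple arithmetic means across datasets.
    """
    average_hit_rate = sum(hit_rates) / len(hit_rates)
    avg_runtime = sum(runtimes) / len(runtimes)
    # Runtime credit falls linearly from 1 (instant) to 0 (at or above the cap).
    runtime_term = (runtime_cap - min(runtime_cap, avg_runtime)) / runtime_cap
    return 0.95 * average_hit_rate + 0.05 * runtime_term


# Made-up numbers: a 50% hit rate and 6 s runtime on each of 5 datasets.
example = combined_score([0.5] * 5, [6.0] * 5)
print(f"{example:.3f}")
```

Because runtime carries only 5% of the weight, a slower algorithm is worthwhile whenever it raises the average hit rate by more than the small runtime credit it gives up.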
## Files
| File | Description |
|------|-------------|
| `initial_program.py` | Baseline `Evolved` class with `reorder()` method to evolve |
| `evaluator.py` | Scores programs on prefix hit rate and runtime across 5 datasets |
| `config.yaml` | Task-specific config (LLM, evaluator timeout, system prompt) |
| `solver.py` | Base `Algorithm` class and greedy baseline |
| `utils.py` | Prefix hit count evaluation utilities |
| `download_dataset.sh` | Script to download required CSV datasets |