# LLM-SQL: Column Reordering for Prefix Caching

When rows of a table are serialized into LLM prompts sequentially, consecutive rows that share leading column values can reuse cached prefixes. This task evolves a column-reordering strategy that maximizes prefix-cache hit rates across multiple real-world datasets without altering the underlying data.

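
To make the idea concrete, here is a minimal sketch of how column order changes the prefix shared by consecutive serialized rows. The `col: value` serialization, the character-level prefix measure, the helper names, and the toy rows are all assumptions for illustration; the task's own evaluator may serialize and count hits differently.

```python
def serialize(row, column_order):
    """Serialize one row as 'col: value' fields in the given column order."""
    return ", ".join(f"{col}: {row[col]}" for col in column_order)


def shared_prefix_len(a, b):
    """Number of leading characters that two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def total_shared_prefix(rows, column_order):
    """Sum of shared-prefix lengths over each pair of consecutive rows."""
    prompts = [serialize(r, column_order) for r in rows]
    return sum(shared_prefix_len(p, q) for p, q in zip(prompts, prompts[1:]))


# Toy rows (not taken from the benchmark datasets).
rows = [
    {"review_id": "101", "genre": "drama", "studio": "A24"},
    {"review_id": "102", "genre": "drama", "studio": "A24"},
    {"review_id": "103", "genre": "drama", "studio": "A24"},
]

# Unique column first: the shared prefix ends inside the first field.
unique_first = total_shared_prefix(rows, ["review_id", "genre", "studio"])
# Repeated columns first: the shared prefix spans both repeated fields.
shared_first = total_shared_prefix(rows, ["genre", "studio", "review_id"])
print(unique_first, shared_first)
```

Moving the repeated columns to the front lengthens the reusable prefix without changing any cell value, which is the effect the evolved strategy tries to maximize.
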
## Setup

1. **Download the datasets** (~69 MB total):

```bash
cd benchmarks/ADRS/llm_sql
bash download_dataset.sh
```

This downloads 5 CSV datasets into `datasets/`:
- `movies.csv`: Rotten Tomatoes movie reviews (~9 MB)
- `beer.csv`: Beer review dataset (~2.5 MB)
- `BIRD.csv`: BIRD text-to-SQL dataset (~34 MB)
- `PDMX.csv`: PDMX metadata dataset (~7.4 MB)
- `products.csv`: Amazon product catalog (~16 MB)

2. **Set your API key:**

```bash
export OPENAI_API_KEY=...
```

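
As an optional sanity check after the download step, a small script along these lines can confirm that all five files are present. The file names come from the list above; the helper name `missing_datasets` and the directory path passed to it are assumptions (the path assumes `datasets/` is created inside `benchmarks/ADRS/llm_sql` and that you run the script from the repo root).

```python
from pathlib import Path

# File names are taken from the Setup list; the directory below is an assumption.
EXPECTED = ["movies.csv", "beer.csv", "BIRD.csv", "PDMX.csv", "products.csv"]


def missing_datasets(dataset_dir):
    """Return the expected CSV names that are not present in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets("benchmarks/ADRS/llm_sql/datasets")
    print("missing:", missing if missing else "none")
```
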
## Run

From the repo root:

```bash
uv run skydiscover-run \
  benchmarks/ADRS/llm_sql/initial_program.py \
  benchmarks/ADRS/llm_sql/evaluator.py \
  -c benchmarks/ADRS/llm_sql/config.yaml \
  -s [your_algorithm] \
  -i 100
```

## Scoring

Combined score: `0.95 * average_hit_rate + 0.05 * (12 - min(12, avg_runtime)) / 12`

- **Hit rate** (95% weight): normalized prefix-cache hit count, averaged over the 5 datasets
- **Runtime** (5% weight): average wall-clock seconds for the reordering algorithm, capped at 12; a run that takes 12 s or longer earns no runtime credit

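
The formula can be checked with a few lines of Python. This is a direct transcription of the expression above, not the code in `evaluator.py`; the function name, the `runtime_cap` parameter, and the sample inputs are assumptions for illustration.

```python
def combined_score(average_hit_rate, avg_runtime, runtime_cap=12.0):
    """Combined score: 95% hit rate plus 5% runtime credit, capped at runtime_cap seconds."""
    # Runtime credit falls linearly from 1 (instant) to 0 (runtime_cap or slower).
    runtime_term = (runtime_cap - min(runtime_cap, avg_runtime)) / runtime_cap
    return 0.95 * average_hit_rate + 0.05 * runtime_term


# Hypothetical inputs: 60% average hit rate, 3 s average runtime.
print(round(combined_score(0.60, 3.0), 4))
```

With these inputs the hit-rate term contributes 0.57 and the runtime term 0.0375. Because runtime carries only 5% of the weight, a slower algorithm is worth it whenever it raises the hit rate by more than the runtime credit it gives up.
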
## Files

| File | Description |
|------|-------------|
| `initial_program.py` | Baseline `Evolved` class with `reorder()` method to evolve |
| `evaluator.py` | Scores programs on prefix hit rate and runtime across 5 datasets |
| `config.yaml` | Task-specific config (LLM, evaluator timeout, system prompt) |
| `solver.py` | Base `Algorithm` class and greedy baseline |
| `utils.py` | Prefix hit count evaluation utilities |
| `download_dataset.sh` | Script to download required CSV datasets |

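
For orientation, a column-reordering heuristic of the kind this task evolves might look like the sketch below, which places columns with the fewest distinct values first. It is a standalone illustration under stated assumptions: it is not the `Evolved.reorder()` method from `initial_program.py` nor the greedy baseline in `solver.py`, and the function names, the list-of-dicts input, and the toy rows are hypothetical.

```python
def order_by_cardinality(rows):
    """Return column names sorted from fewest to most distinct values."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    distinct = {col: len({row[col] for row in rows}) for col in columns}
    # sorted() is stable, so ties keep their original left-to-right order.
    return sorted(columns, key=lambda col: distinct[col])


def reorder_rows(rows):
    """Rebuild each row dict with columns in the cardinality-based order."""
    order = order_by_cardinality(rows)
    return [{col: row[col] for col in order} for row in rows]


# Toy rows (hypothetical, not from the benchmark datasets).
rows = [
    {"id": 1, "brewery": "X", "style": "IPA"},
    {"id": 2, "brewery": "X", "style": "Stout"},
    {"id": 3, "brewery": "X", "style": "IPA"},
]
print(order_by_cardinality(rows))
```

Distinct-value count is only a cheap proxy: it ignores value length and the order in which rows are serialized, both of which affect how much of each prompt can actually be reused.
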