LLM-SQL — Column Reordering for Prefix Caching

When rows of a table are serialized into LLM prompts sequentially, consecutive rows that share leading column values can reuse cached prefixes. This task evolves a column-reordering strategy that maximizes prefix-cache hit rates across multiple real-world datasets without altering the underlying data.
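To make the idea concrete, here is a minimal sketch of how column order changes the amount of shared prefix between consecutive serialized rows. It assumes a simple `col: value` serialization and counts shared characters rather than tokens; the helper names (`serialize`, `prefix_hit_chars`) and the toy rows are hypothetical and are not taken from `utils.py`, whose actual metric may differ.

```python
from typing import Sequence


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def serialize(row: dict, order: Sequence[str]) -> str:
    """Serialize one row as 'col: value' fields in the given column order.

    Assumption: the real prompt format may differ; this is only illustrative.
    """
    return ", ".join(f"{col}: {row[col]}" for col in order)


def prefix_hit_chars(rows: list[dict], order: Sequence[str]) -> int:
    """Total characters each row shares with the row serialized just before it."""
    texts = [serialize(r, order) for r in rows]
    return sum(common_prefix_len(p, c) for p, c in zip(texts, texts[1:]))


# Toy data (hypothetical): 'id' is unique, 'genre' and 'year' repeat.
rows = [
    {"id": 1, "genre": "drama", "year": 1999},
    {"id": 2, "genre": "drama", "year": 1999},
    {"id": 3, "genre": "drama", "year": 2001},
]

original = prefix_hit_chars(rows, ["id", "genre", "year"])
reordered = prefix_hit_chars(rows, ["genre", "year", "id"])
print(original, reordered)
```

Placing the low-cardinality columns (`genre`, `year`) first lets consecutive rows share a much longer prefix than leading with the unique `id` column, while the row contents themselves are unchanged.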

Setup

  1. Download the datasets (~69 MB total):

    cd benchmarks/ADRS/llm_sql
    bash download_dataset.sh
    

    This downloads 5 CSV datasets into datasets/:

    • movies.csv — Rotten Tomatoes movie reviews (~9 MB)
    • beer.csv — Beer review dataset (~2.5 MB)
    • BIRD.csv — BIRD text-to-SQL dataset (~34 MB)
    • PDMX.csv — PDMX metadata dataset (~7.4 MB)
    • products.csv — Amazon product catalog (~16 MB)
  2. Set your API key:

    export OPENAI_API_KEY=...
    

Run

From the repo root:

uv run skydiscover-run \
  benchmarks/ADRS/llm_sql/initial_program.py \
  benchmarks/ADRS/llm_sql/evaluator.py \
  -c benchmarks/ADRS/llm_sql/config.yaml \
  -s [your_algorithm] \
  -i 100

Scoring

Combined score: 0.95 * average_hit_rate + 0.05 * (12 - min(12, avg_runtime)) / 12

  • Hit rate (95% weight): normalized prefix-cache hit count, averaged across the 5 datasets
  • Runtime (5% weight): wall-clock seconds for the reordering algorithm, capped at 12 (runs of 12 s or longer earn no runtime credit)
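The formula can be sketched directly in Python. The function name `combined_score` is hypothetical, and the sketch assumes `average_hit_rate` lies in [0, 1] and `avg_runtime` is in seconds; the evaluator's actual implementation may differ in detail.

```python
def combined_score(average_hit_rate: float, avg_runtime: float) -> float:
    """Combine hit rate and runtime as in the scoring formula above.

    Assumptions: average_hit_rate is in [0, 1]; avg_runtime is in seconds.
    """
    # Runtime credit falls linearly from 1 (0 s) to 0 (12 s or more).
    runtime_term = (12 - min(12, avg_runtime)) / 12
    return 0.95 * average_hit_rate + 0.05 * runtime_term


# Illustrative inputs only, not measured results.
print(combined_score(0.8, 3.0))
print(combined_score(0.8, 20.0))
```

Because runtime carries only 5% of the weight, a slower algorithm is worth choosing whenever it raises the average hit rate by more than about 0.05 / 0.95 ≈ 0.053 in absolute terms, which is the most the runtime term can ever contribute.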

Files

File                 Description
initial_program.py   Baseline Evolved class with reorder() method to evolve
evaluator.py         Scores programs on prefix hit rate and runtime across 5 datasets
config.yaml          Task-specific config (LLM, evaluator timeout, system prompt)
solver.py            Base Algorithm class and greedy baseline
utils.py             Prefix hit count evaluation utilities
download_dataset.sh  Script to download required CSV datasets