# LLM-SQL — Column Reordering for Prefix Caching

When rows of a table are serialized into LLM prompts sequentially, consecutive rows that share leading column values can reuse cached prefixes. This task evolves a column-reordering strategy that maximizes prefix-cache hit rates across multiple real-world datasets without altering the underlying data.

## Setup

1. **Download the datasets** (~69 MB total):

   ```bash
   cd benchmarks/ADRS/llm_sql
   bash download_dataset.sh
   ```

   This downloads 5 CSV datasets into `datasets/`:

   - `movies.csv` — Rotten Tomatoes movie reviews (~9 MB)
   - `beer.csv` — Beer review dataset (~2.5 MB)
   - `BIRD.csv` — BIRD text-to-SQL dataset (~34 MB)
   - `PDMX.csv` — PDMX metadata dataset (~7.4 MB)
   - `products.csv` — Amazon product catalog (~16 MB)

2. **Set your API key:**

   ```bash
   export OPENAI_API_KEY=...
   ```

## Run

From the repo root:

```bash
uv run skydiscover-run \
  benchmarks/ADRS/llm_sql/initial_program.py \
  benchmarks/ADRS/llm_sql/evaluator.py \
  -c benchmarks/ADRS/llm_sql/config.yaml \
  -s [your_algorithm] \
  -i 100
```

## Scoring

Combined score: `0.95 * average_hit_rate + 0.05 * (12 - min(12, avg_runtime)) / 12`

- **Hit rate** (95% weight): prefix-cache hit count, normalized across the 5 datasets
- **Runtime** (5% weight): wall-clock seconds taken by the reordering algorithm

## Files

| File | Description |
|------|-------------|
| `initial_program.py` | Baseline `Evolved` class with a `reorder()` method to evolve |
| `evaluator.py` | Scores programs on prefix hit rate and runtime across the 5 datasets |
| `config.yaml` | Task-specific config (LLM, evaluator timeout, system prompt) |
| `solver.py` | Base `Algorithm` class and greedy baseline |
| `utils.py` | Prefix hit count evaluation utilities |
| `download_dataset.sh` | Script to download the required CSV datasets |
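
## How reordering helps (illustrative sketch)

The sketch below makes the reordering idea from the introduction concrete. It is not the repository's `Evolved.reorder()` interface or the hit-count metric in `utils.py`; the helper names, the row serialization format, and the character-level prefix measure are assumptions made for illustration only. It serializes rows in a chosen column order, sorts rows so that shared leading values sit next to each other, and compares the prefix reuse of the original column order against a naive lowest-cardinality-first ordering.

```python
# Illustrative only: a toy version of the column-reordering idea.
# The real serialization format and hit-count metric live in utils.py;
# the names below (serialize_row, shared_prefix_len, reorder_by_cardinality)
# are made up for this sketch.
import pandas as pd


def serialize_row(row: pd.Series, columns: list[str]) -> str:
    """Serialize one row as a flat string in the given column order."""
    return " | ".join(f"{c}={row[c]}" for c in columns)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two serialized rows have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_reuse(df: pd.DataFrame, columns: list[str]) -> int:
    """Total characters reusable from the previous row's serialized prefix."""
    # Sorting rows by the chosen column order groups rows with shared
    # leading values next to each other, which is what makes prefixes reusable.
    ordered = df.sort_values(by=columns)
    texts = [serialize_row(r, columns) for _, r in ordered.iterrows()]
    return sum(shared_prefix_len(texts[i - 1], texts[i]) for i in range(1, len(texts)))


def reorder_by_cardinality(df: pd.DataFrame) -> list[str]:
    """Naive heuristic: put low-cardinality columns first so frequently
    repeated values land in the shared prefix."""
    return sorted(df.columns, key=lambda c: df[c].nunique())


if __name__ == "__main__":
    df = pd.read_csv("datasets/beer.csv")
    baseline = list(df.columns)
    reordered = reorder_by_cardinality(df)
    print("baseline order reuse :", prefix_reuse(df, baseline))
    print("reordered reuse      :", prefix_reuse(df, reordered))
```

The actual evaluator measures token-level prefix-cache hits via `utils.py` and combines the normalized hit rate with runtime according to the scoring formula above, so evolved strategies are rewarded for orderings that are both cache-friendly and cheap to compute.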