File size: 2,599 Bytes
406cec4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
---

library_name: sklearn
tags:
  - postgresql
  - sql
  - query-cache
  - plan-cache
  - redis
  - database
  - tabular-classification
  - tabular-regression
pipeline_tag: tabular-classification
license: mit
---


# pg_plan_cache Models

Three machine learning models for the **pg_plan_cache** PostgreSQL extension — a query
execution plan cache backed by Redis.

## Models

### 1. SQL Cache Advisor
- **Task:** Classification (high / medium / low)
- **Algorithm:** Random Forest (200 trees)
- **Purpose:** Predicts whether caching a query's execution plan will be beneficial

### 2. Cache TTL Recommender
- **Task:** Regression (seconds)
- **Algorithm:** Gradient Boosting
- **Purpose:** Recommends optimal cache TTL based on query characteristics

### 3. Query Complexity Estimator
- **Task:** Regression (1-100 score)
- **Algorithm:** Gradient Boosting
- **Purpose:** Estimates query complexity to prioritize caching resources

## Features

All models use 28 structural features extracted from raw SQL text:

| Feature | Description |
|---------|------------|
| `query_length` | Character count |
| `query_type` | SELECT=0, INSERT=1, UPDATE=2, DELETE=3 |
| `num_tables` | Tables referenced |
| `num_joins` | JOIN clause count |
| `num_conditions` | AND/OR conditions |
| `num_aggregates` | Aggregate function count |
| `num_subqueries` | Subquery count |
| `has_window_func` | Window functions present |
| `has_cte` | Common Table Expressions |
| `nesting_depth` | Max parenthesis depth |
| ... | 18 more features |

## Usage

```python

from predict import predict, format_prediction



result = predict("SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name")

print(format_prediction(result))

# Cache Benefit: HIGH

# Recommended TTL: 4200s (1h 10m)

# Complexity: 62/100 (complex)

```

## Training

Trained on 8,000 synthetic SQL queries across 18 complexity tiers:
- Simple SELECTs, filtered queries, ORDER BY
- Single and multi-table JOINs
- Aggregations with GROUP BY / HAVING
- Subqueries, correlated subqueries, EXISTS
- CTEs, window functions, UNION
- Complex analytics queries
- INSERT / UPDATE / DELETE (non-cacheable)

```bash

pip install -r requirements.txt

python train.py

```

## About pg_plan_cache

pg_plan_cache is a PostgreSQL extension that caches query execution plans in Redis.
It hooks into the PostgreSQL planner, normalizes queries, computes SHA-256 hashes,
and stores serialized plans with configurable TTL and automatic schema-change invalidation.