nilenpatel's picture
Upload pg_plan_cache models
406cec4 verified
---
library_name: sklearn
tags:
- postgresql
- sql
- query-cache
- plan-cache
- redis
- database
- tabular-classification
- tabular-regression
pipeline_tag: tabular-classification
license: mit
---
# pg_plan_cache Models
Three machine learning models for the **pg_plan_cache** PostgreSQL extension — a query
execution plan cache backed by Redis.
## Models
### 1. SQL Cache Advisor
- **Task:** Classification (high / medium / low)
- **Algorithm:** Random Forest (200 trees)
- **Purpose:** Predicts whether caching a query's execution plan will be beneficial
### 2. Cache TTL Recommender
- **Task:** Regression (seconds)
- **Algorithm:** Gradient Boosting
- **Purpose:** Recommends optimal cache TTL based on query characteristics
### 3. Query Complexity Estimator
- **Task:** Regression (1-100 score)
- **Algorithm:** Gradient Boosting
- **Purpose:** Estimates query complexity to prioritize caching resources
## Features
All models use 28 structural features extracted from raw SQL text:
| Feature | Description |
|---------|------------|
| `query_length` | Character count |
| `query_type` | SELECT=0, INSERT=1, UPDATE=2, DELETE=3 |
| `num_tables` | Tables referenced |
| `num_joins` | JOIN clause count |
| `num_conditions` | AND/OR conditions |
| `num_aggregates` | Aggregate function count |
| `num_subqueries` | Subquery count |
| `has_window_func` | Window functions present |
| `has_cte` | Common Table Expressions |
| `nesting_depth` | Max parenthesis depth |
| ... | 18 more features |
## Usage
```python
from predict import predict, format_prediction
result = predict("SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name")
print(format_prediction(result))
# Cache Benefit: HIGH
# Recommended TTL: 4200s (1h 10m)
# Complexity: 62/100 (complex)
```
## Training
Trained on 8,000 synthetic SQL queries across 18 complexity tiers:
- Simple SELECTs, filtered queries, ORDER BY
- Single and multi-table JOINs
- Aggregations with GROUP BY / HAVING
- Subqueries, correlated subqueries, EXISTS
- CTEs, window functions, UNION
- Complex analytics queries
- INSERT / UPDATE / DELETE (non-cacheable)
```bash
pip install -r requirements.txt
python train.py
```
## About pg_plan_cache
pg_plan_cache is a PostgreSQL extension that caches query execution plans in Redis.
It hooks into the PostgreSQL planner, normalizes queries, computes SHA-256 hashes,
and stores serialized plans with configurable TTL and automatic schema-change invalidation.