nilenpatel
/

pg-plan-cache-models

Tabular Classification

tabular-regression

Model card Files Files and versions

pg-plan-cache-models / README.md

nilenpatel's picture

Upload pg_plan_cache models

406cec4 verified 2 months ago

|

history blame contribute delete

2.6 kB

	---
	library_name: sklearn
	tags:
	- postgresql
	- sql
	- query-cache
	- plan-cache
	- redis
	- database
	- tabular-classification
	- tabular-regression
	pipeline_tag: tabular-classification
	license: mit
	---

	# pg_plan_cache Models

	Three machine learning models for the pg_plan_cache PostgreSQL extension — a query
	execution plan cache backed by Redis.

	## Models

	### 1. SQL Cache Advisor
	- Task: Classification (high / medium / low)
	- Algorithm: Random Forest (200 trees)
	- Purpose: Predicts whether caching a query's execution plan will be beneficial

	### 2. Cache TTL Recommender
	- Task: Regression (seconds)
	- Algorithm: Gradient Boosting
	- Purpose: Recommends optimal cache TTL based on query characteristics

	### 3. Query Complexity Estimator
	- Task: Regression (1-100 score)
	- Algorithm: Gradient Boosting
	- Purpose: Estimates query complexity to prioritize caching resources

	## Features

	All models use 28 structural features extracted from raw SQL text:

	\| Feature \| Description \|
	\|---------\|------------\|
	\| `query_length` \| Character count \|
	\| `query_type` \| SELECT=0, INSERT=1, UPDATE=2, DELETE=3 \|
	\| `num_tables` \| Tables referenced \|
	\| `num_joins` \| JOIN clause count \|
	\| `num_conditions` \| AND/OR conditions \|
	\| `num_aggregates` \| Aggregate function count \|
	\| `num_subqueries` \| Subquery count \|
	\| `has_window_func` \| Window functions present \|
	\| `has_cte` \| Common Table Expressions \|
	\| `nesting_depth` \| Max parenthesis depth \|
	\| ... \| 18 more features \|

	## Usage

	```python
	from predict import predict, format_prediction

	result = predict("SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name")
	print(format_prediction(result))
	# Cache Benefit: HIGH
	# Recommended TTL: 4200s (1h 10m)
	# Complexity: 62/100 (complex)
	```

	## Training

	Trained on 8,000 synthetic SQL queries across 18 complexity tiers:
	- Simple SELECTs, filtered queries, ORDER BY
	- Single and multi-table JOINs
	- Aggregations with GROUP BY / HAVING
	- Subqueries, correlated subqueries, EXISTS
	- CTEs, window functions, UNION
	- Complex analytics queries
	- INSERT / UPDATE / DELETE (non-cacheable)

	```bash
	pip install -r requirements.txt
	python train.py
	```

	## About pg_plan_cache

	pg_plan_cache is a PostgreSQL extension that caches query execution plans in Redis.
	It hooks into the PostgreSQL planner, normalizes queries, computes SHA-256 hashes,
	and stores serialized plans with configurable TTL and automatic schema-change invalidation.