Tabular Classification
Scikit-learn
Joblib
postgresql
sql
query-cache
plan-cache
redis
database
tabular-regression
Instructions to use nilenpatel/pg-plan-cache-models with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Scikit-learn
How to use nilenpatel/pg-plan-cache-models with Scikit-learn:
from huggingface_hub import hf_hub_download import joblib model = joblib.load( hf_hub_download("nilenpatel/pg-plan-cache-models", "sklearn_model.joblib") ) # only load pickle files from sources you trust # read more about it here https://skops.readthedocs.io/en/stable/persistence.html - Notebooks
- Google Colab
- Kaggle
| library_name: sklearn | |
| tags: | |
| - postgresql | |
| - sql | |
| - query-cache | |
| - plan-cache | |
| - redis | |
| - database | |
| - tabular-classification | |
| - tabular-regression | |
| pipeline_tag: tabular-classification | |
| license: mit | |
| # pg_plan_cache Models | |
| Three machine learning models for the **pg_plan_cache** PostgreSQL extension — a query | |
| execution plan cache backed by Redis. | |
| ## Models | |
| ### 1. SQL Cache Advisor | |
| - **Task:** Classification (high / medium / low) | |
| - **Algorithm:** Random Forest (200 trees) | |
| - **Purpose:** Predicts whether caching a query's execution plan will be beneficial | |
| ### 2. Cache TTL Recommender | |
| - **Task:** Regression (seconds) | |
| - **Algorithm:** Gradient Boosting | |
| - **Purpose:** Recommends optimal cache TTL based on query characteristics | |
| ### 3. Query Complexity Estimator | |
| - **Task:** Regression (1-100 score) | |
| - **Algorithm:** Gradient Boosting | |
| - **Purpose:** Estimates query complexity to prioritize caching resources | |
| ## Features | |
| All models use 28 structural features extracted from raw SQL text: | |
| | Feature | Description | | |
| |---------|------------| | |
| | `query_length` | Character count | | |
| | `query_type` | SELECT=0, INSERT=1, UPDATE=2, DELETE=3 | | |
| | `num_tables` | Tables referenced | | |
| | `num_joins` | JOIN clause count | | |
| | `num_conditions` | AND/OR conditions | | |
| | `num_aggregates` | Aggregate function count | | |
| | `num_subqueries` | Subquery count | | |
| | `has_window_func` | Window functions present | | |
| | `has_cte` | Common Table Expressions | | |
| | `nesting_depth` | Max parenthesis depth | | |
| | ... | 18 more features | | |
| ## Usage | |
| ```python | |
| from predict import predict, format_prediction | |
| result = predict("SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name") | |
| print(format_prediction(result)) | |
| # Cache Benefit: HIGH | |
| # Recommended TTL: 4200s (1h 10m) | |
| # Complexity: 62/100 (complex) | |
| ``` | |
| ## Training | |
| Trained on 8,000 synthetic SQL queries across 18 complexity tiers: | |
| - Simple SELECTs, filtered queries, ORDER BY | |
| - Single and multi-table JOINs | |
| - Aggregations with GROUP BY / HAVING | |
| - Subqueries, correlated subqueries, EXISTS | |
| - CTEs, window functions, UNION | |
| - Complex analytics queries | |
| - INSERT / UPDATE / DELETE (non-cacheable) | |
| ```bash | |
| pip install -r requirements.txt | |
| python train.py | |
| ``` | |
| ## About pg_plan_cache | |
| pg_plan_cache is a PostgreSQL extension that caches query execution plans in Redis. | |
| It hooks into the PostgreSQL planner, normalizes queries, computes SHA-256 hashes, | |
| and stores serialized plans with configurable TTL and automatic schema-change invalidation. | |