# Cross‑Engine SQL Query Router (SQLite vs DuckDB)
This model predicts which engine (SQLite or DuckDB) will run a SQL query faster, using a learned cost model trained on TPC‑H query variants.
Input: SQL query → Output: predicted runtime for each engine + recommended engine
## Files in this repository
- `model_sqlite.joblib` — regression model that predicts SQLite runtime (seconds)
- `model_duckdb.joblib` — regression model that predicts DuckDB runtime (seconds)
- `model_metadata.json` — metadata (feature list, training size, evaluation score, etc.)
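Before predicting, it is worth inspecting the metadata file to see which features the models expect. The key names below are hypothetical examples, not the actual schema of `model_metadata.json`; open the real file to check.

```python
import json

# Hypothetical example of the metadata shape; the real
# model_metadata.json may use different key names.
raw = '{"features": ["num_joins", "has_group_by"], "n_features": 25}'
metadata = json.loads(raw)

print(sorted(metadata.keys()))
print(metadata["n_features"])
```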
## How it works (high level)
1. Extract 25 structural features from the query (joins, GROUP BY, subqueries, nesting depth, etc.)
2. Predict runtime on SQLite
3. Predict runtime on DuckDB
4. Recommend the engine with the lower predicted runtime
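The pipeline above can be sketched as follows. This is a toy illustration: the feature names and regex-style extraction here are simplified stand-ins, and the real extractor in `models/predict.py` produces all 25 features.

```python
def extract_features(sql: str) -> dict:
    """Toy structural feature extractor (illustrative only; the
    shipped model uses 25 features from models/predict.py)."""
    s = sql.upper()
    return {
        "num_joins": s.count(" JOIN "),
        "has_group_by": int("GROUP BY" in s),
        "num_subqueries": s.count("(SELECT"),
        "query_length": len(sql),
    }

def route(pred_sqlite: float, pred_duckdb: float) -> str:
    """Recommend the engine with the lower predicted runtime."""
    return "sqlite" if pred_sqlite <= pred_duckdb else "duckdb"

q = ("SELECT c.name, SUM(o.total) FROM customers c "
     "JOIN orders o ON c.id = o.cust_id GROUP BY c.name")
print(extract_features(q))
print(route(0.8, 0.3))  # -> duckdb
```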
## Recommended usage

Use the full project repository, which includes:

- the same feature extractor used during training (`models/predict.py`)
- a Streamlit UI (`app.py`)
- an optional Live Test mode (runs the query on actual local SQLite/DuckDB databases)
- Project repo: `Rinil-Parmar/cross-engine-learned-cost-model`
- Dataset repo: `Rinil-Parmar/tpch-query-routing-dataset`
## Minimal example (load models)

```python
import joblib

sqlite_model = joblib.load("model_sqlite.joblib")
duckdb_model = joblib.load("model_duckdb.joblib")
print(type(sqlite_model), type(duckdb_model))
```
To make predictions correctly, you must extract the same 25 features, in the same order, as during training (use `models/predict.py` from the project repo); feeding the models a differently ordered feature vector will silently produce wrong predictions.
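The routing step itself is a simple comparison of the two predicted runtimes. The predictor functions below are stand-ins with made-up coefficients, not the shipped joblib models; in real use, each would be a loaded regressor's `predict` call on a `(1, 25)` feature array in the training order.

```python
# Stand-ins for the joblib-loaded regressors (hypothetical
# coefficients); both expect 25 features in the training order.
def predict_sqlite(features):
    return 0.02 + 0.10 * sum(features)

def predict_duckdb(features):
    return 0.05 + 0.03 * sum(features)

features = [0.0] * 25   # placeholder 25-feature vector
features[0] = 3.0       # e.g. a join count, if that is feature 0

t_s = predict_sqlite(features)
t_d = predict_duckdb(features)
print("duckdb" if t_d < t_s else "sqlite")  # -> duckdb (0.14 vs 0.32)
```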
## Limitations
- Trained on TPC‑H templates with randomized parameters (TPC‑H‑like analytic queries).
- Uses query structure only (no table statistics, indexes, cache state, hardware differences).
- May not generalize well to OLTP workloads or very different schemas.
## License
MIT