Cross‑Engine SQL Query Router (SQLite vs DuckDB)

This model predicts which engine (SQLite or DuckDB) will run a SQL query faster, using a learned cost model trained on TPC‑H query variants.

Input: SQL query → Output: predicted runtime for each engine + recommended engine


Files in this repository

  • model_sqlite.joblib — regression model that predicts SQLite runtime (seconds)
  • model_duckdb.joblib — regression model that predicts DuckDB runtime (seconds)
  • model_metadata.json — metadata (feature list, training size, evaluation score, etc.)

How it works (high level)

  1. Extract 25 structural features from the query (joins, GROUP BY, subqueries, nesting depth, etc.)
  2. Predict runtime on SQLite
  3. Predict runtime on DuckDB
  4. Recommend the engine with the lower predicted runtime
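The four steps above can be sketched as a single routing function. The function name and return shape below are illustrative, not part of this repo's API, and `features` stands in for the 25-element vector produced by the extractor in models/predict.py:

```python
def route_query(features, sqlite_model, duckdb_model):
    """Predict runtime on both engines and recommend the faster one.

    `features` must be the 25 structural query features in the same
    order used during training (see models/predict.py in the project repo).
    """
    # Each model is a scikit-learn-style regressor: predict() takes a 2-D
    # input (list of feature vectors) and returns one runtime per row.
    t_sqlite = float(sqlite_model.predict([features])[0])
    t_duckdb = float(duckdb_model.predict([features])[0])
    engine = "SQLite" if t_sqlite <= t_duckdb else "DuckDB"
    return {"sqlite_s": t_sqlite, "duckdb_s": t_duckdb, "recommended": engine}
```

The recommendation is simply an argmin over the two predicted runtimes, so any error in either regression translates directly into routing mistakes near the crossover point.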

Recommended usage

Use the full project repository, which includes:

  • the same feature extractor used during training (models/predict.py)
  • a Streamlit UI (app.py)
  • optional Live Test (runs the query on actual local SQLite/DuckDB databases)

Project repo: Rinil-Parmar/cross-engine-learned-cost-model

Dataset repo: Rinil-Parmar/tpch-query-routing-dataset


Minimal example (load models)

import joblib

# Load the per-engine runtime regressors.
sqlite_model = joblib.load("model_sqlite.joblib")
duckdb_model = joblib.load("model_duckdb.joblib")

print(type(sqlite_model), type(duckdb_model))

To make predictions correctly, you must use the same 25-feature extraction and feature ordering as in training (use models/predict.py from the project repo).
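One way to guard the feature ordering is to build the vector from the feature list stored in model_metadata.json rather than hard-coding an order. A minimal sketch, assuming the metadata exposes an ordered list of feature names (the example feature names below are hypothetical):

```python
def to_vector(feature_dict, feature_names):
    """Arrange extracted features into the exact order used during training.

    `feature_names` should come from model_metadata.json so the vector
    lines up with what the regressors saw at fit time.
    """
    missing = [n for n in feature_names if n not in feature_dict]
    if missing:
        raise ValueError(f"query features missing: {missing}")
    return [feature_dict[n] for n in feature_names]

# Hypothetical usage with two of the 25 structural features:
names = ["num_joins", "has_group_by"]
vec = to_vector({"has_group_by": 1, "num_joins": 3}, names)
# vec is ordered to match training and ready for model.predict([vec])
```

Failing loudly on a missing feature is safer than silently padding with zeros, since a misaligned vector would still produce a prediction, just a meaningless one.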


Limitations

  • Trained on TPC‑H templates with randomized parameters (TPC‑H‑like analytic queries).
  • Uses query structure only (no table statistics, indexes, cache state, hardware differences).
  • May not generalize well to OLTP workloads or very different schemas.

License

MIT
