Commit 2c5da9b · Rinil-Parmar · verified · 1 parent: f9c6d71

Create README.md

Files changed (1)
  1. README.md +77 -0
README.md ADDED
@@ -0,0 +1,77 @@
---
library_name: scikit-learn
tags:
- sql
- query-routing
- query-optimization
- learned-cost-model
- sqlite
- duckdb
- tpch
license: mit
---

# Cross‑Engine SQL Query Router (SQLite vs DuckDB)

This model predicts which engine (**SQLite** or **DuckDB**) will run a SQL query faster, using a **learned cost model** trained on TPC‑H query variants.

**Input:** SQL query → **Output:** predicted runtime for each engine + recommended engine

---

22
+ ## Files in this repository
23
+
24
+ - `model_sqlite.joblib` — regression model that predicts SQLite runtime (seconds)
25
+ - `model_duckdb.joblib` — regression model that predicts DuckDB runtime (seconds)
26
+ - `model_metadata.json` — metadata (feature list, training size, evaluation score, etc.)
27
+
28
+ ---

## How it works (high level)

1. Extract **25 structural features** from the query (joins, GROUP BY, subqueries, nesting depth, etc.)
2. Predict runtime on **SQLite**
3. Predict runtime on **DuckDB**
4. Recommend the engine with the lower predicted runtime
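
The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: `extract_toy_features` is a hypothetical four-feature stand-in for the project's 25-feature extractor, and `LinearStub` is a hypothetical model with invented weights that mimics the `predict()` interface of a scikit-learn regressor. Neither reflects the trained models in this repository.

```python
import re


def extract_toy_features(sql):
    """Hypothetical stand-in for the real 25-feature extractor (4 features only)."""
    s = sql.upper()
    num_joins = len(re.findall(r"\bJOIN\b", s))
    has_group_by = 1.0 if re.search(r"\bGROUP\s+BY\b", s) else 0.0
    # Every SELECT beyond the first one is counted as a subquery.
    num_subqueries = max(len(re.findall(r"\bSELECT\b", s)) - 1, 0)
    # Deepest parenthesis level, used as a rough proxy for nesting depth.
    depth = max_depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth = max(depth - 1, 0)
    return [float(num_joins), has_group_by, float(num_subqueries), float(max_depth)]


class LinearStub:
    """Hypothetical regressor with a scikit-learn-style predict(); weights are invented."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        return [sum(w * x for w, x in zip(self.weights, row)) + self.bias for row in rows]


def recommend_engine(sql, sqlite_model, duckdb_model):
    """Predict a runtime per engine and pick the lower one (ties go to DuckDB)."""
    features = extract_toy_features(sql)
    sqlite_s = float(sqlite_model.predict([features])[0])
    duckdb_s = float(duckdb_model.predict([features])[0])
    engine = "sqlite" if sqlite_s < duckdb_s else "duckdb"
    return {"sqlite_seconds": sqlite_s, "duckdb_seconds": duckdb_s, "engine": engine}


# Invented weights: SQLite cheap to start but costly per join, DuckDB the reverse.
sqlite_stub = LinearStub([0.5, 0.2, 0.3, 0.1], bias=0.01)
duckdb_stub = LinearStub([0.05, 0.02, 0.03, 0.01], bias=0.05)

query = (
    "SELECT c_name, COUNT(*) FROM customer "
    "JOIN orders ON c_custkey = o_custkey GROUP BY c_name"
)
print(recommend_engine(query, sqlite_stub, duckdb_stub))
print(recommend_engine("SELECT 1", sqlite_stub, duckdb_stub))
```

With the real artifacts, the stubs would be replaced by the objects returned from `joblib.load`, and the toy extractor by the one in `models/predict.py`.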

---

## Recommended usage (best way)

Use the full project repository, which includes:
- the **same feature extractor** used during training (`models/predict.py`)
- a **Streamlit UI** (`app.py`)
- an optional **Live Test** (runs the query on actual local SQLite/DuckDB databases)

Project repo: `Rinil-Parmar/cross-engine-learned-cost-model`

Dataset repo: `Rinil-Parmar/tpch-query-routing-dataset`
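
To give a feel for what a live test involves, the sketch below times a query against an in-memory SQLite database using only the Python standard library. It is not the project's Live Test implementation: the `time_sqlite_query` helper, the two-column `orders` table, and the best-of-three timing policy are all assumptions made for illustration.

```python
import sqlite3
import time


def time_sqlite_query(conn, sql, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetchall() forces the query to run to completion
        best = min(best, time.perf_counter() - start)
    return best


# Illustrative in-memory table; a real test would point at a local TPC-H database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (o_orderkey INTEGER, o_totalprice REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1000)],
)

elapsed = time_sqlite_query(conn, "SELECT COUNT(*), SUM(o_totalprice) FROM orders")
print(f"SQLite: {elapsed:.6f} s")
```

The same pattern should carry over to DuckDB through its Python package (`duckdb.connect()` followed by `execute(sql).fetchall()`), after which the two measured times can be compared with the predicted ones.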

---

## Minimal example (load models)

```python
import joblib

# Each file holds one fitted regression model (one per engine).
sqlite_model = joblib.load("model_sqlite.joblib")
duckdb_model = joblib.load("model_duckdb.joblib")

# Confirm that both models loaded and inspect their types.
print(type(sqlite_model), type(duckdb_model))
```

The predictions are only meaningful if you use the **same 25-feature extraction** and the same feature ordering as in training (use `models/predict.py` from the project repo).
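
Feature ordering matters because the models receive a plain numeric vector, so a value in the wrong position is silently treated as a different feature. The sketch below shows one way to pin the order to a stored feature list. The `feature_names` key and the three feature names are assumptions; check `model_metadata.json` for the actual key and the full list of 25 names.

```python
import json


def to_feature_vector(features, feature_names):
    """Arrange a {name: value} dict into the order given by feature_names."""
    missing = [name for name in feature_names if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(features[name]) for name in feature_names]


# Hypothetical metadata content; the real file may use a different key and names.
metadata = json.loads(
    '{"feature_names": ["num_joins", "has_group_by", "num_subqueries"]}'
)

# Extracted features can arrive in any order; the stored list fixes the layout.
extracted = {"has_group_by": 1, "num_subqueries": 0, "num_joins": 3}
vector = to_feature_vector(extracted, metadata["feature_names"])
print(vector)  # [3.0, 1.0, 0.0]
```

Raising on a missing feature, rather than filling in a default, makes a mismatch between the extractor and the stored list visible immediately.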

---

## Limitations

- Trained on **TPC‑H** templates with randomized parameters (TPC‑H‑like analytic queries).
- Uses **query structure only** (no table statistics, indexes, cache state, or hardware differences).
- May not generalize well to OLTP workloads or to schemas that differ substantially from TPC‑H.

---

## License

MIT