Spaces: laterabhi/grpo-sql-optimizer

laterabhi committed on Apr 26

Commit 979f139 (verified) · 1 parent: afa8b1d

Upload index.html with huggingface_hub

Files changed (1)

index.html +76 -18

@@ -1,19 +1,77 @@
-<!doctype html>
 <html>
-	<head>
-		<meta charset="utf-8" />
-		<meta name="viewport" content="width=device-width" />
-		<title>My static Space</title>
-		<link rel="stylesheet" href="style.css" />
-	</head>
-	<body>
-		<div class="card">
-			<h1>Welcome to your static Space!</h1>
-			<p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
-			<p>
-				Also don't forget to check the
-				<a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
-			</p>
-		</div>
-	</body>
-</html>

+<!DOCTYPE html>
 <html>
+<head>
+<meta charset="utf-8">
+<title>GRPO SQL Optimizer</title>
+<style>
+  body { font-family: sans-serif; max-width: 900px; margin: 40px auto; padding: 0 20px; }
+  img  { max-width: 100%; }
+  table { border-collapse: collapse; width: 100%; }
+  th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
+  th { background: #f4f4f4; }
+  code { background: #f4f4f4; padding: 2px 6px; border-radius: 3px; }
+</style>
+</head>
+<body>
+<h1>GRPO Training for SQL Query Optimization</h1>
+<h2>Overview</h2>
+<p>Fine-tuned <code>Qwen/Qwen2.5-0.5B-Instruct</code> with GRPO (Group Relative Policy Optimization),
+a reinforcement learning method, to optimize SQL queries against a DuckDB execution environment.</p>
+<h2>Results</h2>
+<img src="grpo_results.png" alt="Training Curve"/>
+<h3>Training Progress</h3>
+<table>
+<tr><th>Metric</th><th>Value</th></tr>
+<tr><td>Start average reward (episodes 1-10)</td><td>0.3090</td></tr>
+<tr><td>End average reward (episodes 91-100)</td><td>0.5962</td></tr>
+<tr><td>Improvement</td><td>+93%</td></tr>
+</table>
+<h3>Final Evaluation</h3>
+<table>
+<tr><th>Task</th><th>Difficulty</th><th>Score</th></tr>
+<tr><td>task_1_basic_antipatterns</td><td>easy</td><td>0.7500 ✅</td></tr>
+<tr><td>task_2_correlated_subqueries</td><td>medium</td><td>0.8313 ✅</td></tr>
+<tr><td>task_3_wildcard_scan</td><td>medium-hard</td><td>0.9250 ✅</td></tr>
+<tr><td>task_4_implicit_join</td><td>hard</td><td>0.6438 ✅</td></tr>
+<tr><td>task_5_window_functions</td><td>expert</td><td>0.6250 ⚠️</td></tr>
+<tr><td><strong>Average</strong></td><td></td><td><strong>0.7550</strong></td></tr>
+</table>
+<p><strong>Baseline: 0.63 &nbsp;|&nbsp; Improvement: +0.1250 absolute (about +19.8% relative)</strong></p>
+<h2>Approach</h2>
+<h3>GRPO Training</h3>
+<ul>
+<li><strong>Algorithm:</strong> GRPO (Group Relative Policy Optimization)</li>
+<li><strong>Base Model:</strong> Qwen/Qwen2.5-0.5B-Instruct</li>
+<li><strong>Episodes:</strong> 100, with 4 completions sampled per prompt</li>
+<li><strong>Hardware:</strong> 2 × NVIDIA T4 GPUs on Kaggle</li>
+</ul>
+<h3>Reward Function</h3>
+<ul>
+<li><code>execution_speedup</code>: How much faster the optimized query runs than the original</li>
+<li><code>result_correctness</code>: Whether the optimized query returns the same results as the original</li>
+<li><code>issue_detection</code>: Whether SQL anti-patterns were identified</li>
+<li><code>approval_correctness</code>: Whether the approval flag is correct</li>
+<li><code>summary_quality</code>: Quality of the explanation</li>
+</ul>
+<h2>Key Findings</h2>
+<ol>
+<li><strong>Reward variance is critical.</strong> Early runs had rewards stuck flat at 0.08. GRPO scores each completion relative to the others in its group, so identical rewards yield zero advantage and no learning signal. Adding schema information to the prompt created the reward variance GRPO needs to learn.</li>
+<li><strong>Prompt engineering matters.</strong> Instructing the model to use only columns that appear in the schema was the single most impactful fix.</li>
+<li><strong>Partial credit helps.</strong> Adding an issue-detection bonus gave the model a learning signal even when SQL execution failed.</li>
+</ol>
+<h2>Links</h2>
+<ul>
+<li><a href="https://huggingface.co/laterabhi/grpo-sql-optimizer">Model on Hugging Face</a></li>
+<li><a href="https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-">SQL Environment</a></li>
+<li><a href="https://arxiv.org/abs/2402.03300">GRPO Paper</a></li>
+</ul>
+</body>
+</html>
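
The first key finding in the page above (flat 0.08 rewards gave GRPO nothing to learn from) can be illustrated with a small sketch of the group-relative advantage step. This is not the training code from this Space. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the varied reward values are all assumptions made for illustration. Only the group size of 4 and the flat reward of 0.08 come from the page.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and spread.

    Hypothetical helper: GRPO-style advantage (r - group mean) / (group std + eps).
    Population std and eps=1e-6 are illustrative choices, not taken from the source.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of 4 completions for the same prompt, all scored 0.08 (the flat case).
flat = group_relative_advantages([0.08, 0.08, 0.08, 0.08])

# Illustrative (made-up) rewards once completions start to differ in quality.
varied = group_relative_advantages([0.08, 0.30, 0.55, 0.75])

print("flat:  ", flat)    # every advantage is 0.0, so there is no gradient signal
print("varied:", varied)  # below-average completions go negative, above-average positive
```

With identical rewards every advantage is exactly zero, so the policy update has nothing to push on. Once the rewards in a group differ, the same normalization separates better completions from worse ones, which is consistent with the observation that the schema-aware prompt made learning possible.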