File size: 8,935 Bytes
979f139
4585c0d
979f139
4585c0d
 
979f139
 
4585c0d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
979f139
 
4585c0d
979f139
4585c0d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
979f139
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>GRPO SQL Optimizer</title>
<style>
  :root{
    --bg:#0d1117; --fg:#e6edf3; --muted:#8b949e; --acc:#58a6ff;
    --card:#161b22; --bd:#30363d; --ok:#3fb950; --warn:#d29922;
  }
  *{box-sizing:border-box}
  body{
    margin:0;
    font-family: ui-sans-serif, system-ui, -apple-system, Segoe UI, Roboto, Arial, sans-serif;
    background:var(--bg);
    color:var(--fg);
  }
  a{color:var(--acc); text-decoration:none}
  a:hover{text-decoration:underline}
  code{background:#1f2630; padding:2px 6px; border-radius:6px; border:1px solid var(--bd)}
  .wrap{max-width:980px; margin:0 auto; padding:18px 18px 40px}

  /* Top bar */
  .hfTopbar{
    position: sticky; top: 0; z-index: 9999;
    display:flex; gap:10px; align-items:center; justify-content:space-between;
    padding:10px 14px;
    background: rgba(13,17,23,.92);
    backdrop-filter: blur(8px);
    border-bottom: 1px solid rgba(48,54,61,.9);
  }
  .hfTopbar .left{display:flex; gap:10px; align-items:center; flex-wrap:wrap}
  .hfTopbar .brand{font-weight:800; color:var(--fg)}
  .hfTopbar .pill{
    color:var(--fg); text-decoration:none;
    border:1px solid rgba(48,54,61,.9);
    background:var(--card);
    padding:6px 10px;
    border-radius:999px;
    font-size:12px;
    display:inline-flex; align-items:center; gap:6px;
  }
  .hfTopbar .pill:hover{background:#1f2630; text-decoration:none}
  .hfTopbar .muted{color:var(--muted); font-size:12px}

  /* Content */
  h1{font-size:28px; margin:18px 0 8px}
  h2{font-size:18px; margin:22px 0 10px; color:var(--acc)}
  p{color:var(--fg); line-height:1.6}
  .sub{color:var(--muted); margin-top:0}

  .grid{display:grid; grid-template-columns: 1fr 1fr; gap:14px}
  @media (max-width: 860px){ .grid{grid-template-columns:1fr} }

  .card{
    background:var(--card);
    border:1px solid var(--bd);
    border-radius:12px;
    padding:14px;
  }
  .card h3{margin:0 0 10px; font-size:14px; color:var(--muted); font-weight:700}

  img{max-width:100%; border-radius:12px; border:1px solid var(--bd); background:#000}

  table{border-collapse:collapse; width:100%; overflow:hidden; border-radius:12px; border:1px solid var(--bd)}
  th,td{border-bottom:1px solid var(--bd); padding:10px; text-align:left; font-size:13px}
  th{background:#0b1320; color:var(--muted); font-weight:700}
  tr:last-child td{border-bottom:none}
  .ok{color:var(--ok); font-weight:800}
  .note{color:var(--muted); font-size:13px}

  .kpiRow{display:flex; flex-wrap:wrap; gap:10px; margin-top:10px}
  .kpi{
    background:#0b1320;
    border:1px solid var(--bd);
    border-radius:12px;
    padding:10px 12px;
    min-width: 170px;
  }
  .kpi .v{font-weight:900; font-size:16px}
  .kpi .l{color:var(--muted); font-size:12px}

  footer{margin-top:26px; color:var(--muted); font-size:12px}
</style>
</head>

<body>
  <div class="hfTopbar">
    <div class="left">
      <span class="brand">🗄️ GRPO SQL Optimizer</span>
      <a class="pill" href="https://huggingface.co/spaces/laterabhi/grpo-sql-optimizer/blob/main/README.md" target="_blank" rel="noopener">📝 Writeup</a>
      <a class="pill" href="https://huggingface.co/spaces/laterabhi/grpo-sql-optimizer/blob/main/Blog.md" target="_blank" rel="noopener">📄 Blog.md</a>
      <a class="pill" href="https://huggingface.co/laterabhi/grpo-sql-optimizer" target="_blank" rel="noopener">🤗 Model</a>
      <a class="pill" href="https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-" target="_blank" rel="noopener">💻 GitHub</a>
    </div>
    <div class="muted">DuckDB-verifiable rewards · OpenEnv</div>
  </div>

  <div class="wrap">
    <h1>GRPO Training for SQL Query Optimization</h1>
    <p class="sub">
      Fine-tuned <code>Qwen/Qwen2.5-0.5B-Instruct</code> using GRPO (Group Relative Policy Optimization)
      to optimize SQL queries using a DuckDB execution environment.
    </p>

    <div class="grid">
      <div class="card">
        <h3>Overview</h3>
        <p>
          This project trains/evaluates SQL optimization with <strong>execution-grounded</strong> rewards:
          the environment executes both original and rewritten SQL on real DuckDB data, and scores
          speedup + correctness + structured diagnostics.
        </p>
        <div class="kpiRow">
          <div class="kpi">
            <div class="v">5</div>
            <div class="l">tasks (easy → expert)</div>
          </div>
          <div class="kpi">
            <div class="v">DuckDB</div>
            <div class="l">verifiable execution</div>
          </div>
          <div class="kpi">
            <div class="v">GRPO</div>
            <div class="l">group-relative RL</div>
          </div>
        </div>
      </div>

      <div class="card">
        <h3>Training curve</h3>

        <!-- Prefer local image if you uploaded it to this Space -->
        <img src="grpo_results.png" alt="GRPO training curve" />

        <p class="note" style="margin-top:10px">
          If this image ever breaks, the canonical plot is also in the GitHub repo:
          <a href="https://raw.githubusercontent.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-/main/results/grpo_reward_curve.png" target="_blank" rel="noopener">
            results/grpo_reward_curve.png
          </a>.
        </p>
      </div>
    </div>

    <h2>Training progress (100 episodes)</h2>
    <table>
      <tr><th>Metric</th><th>Value</th></tr>
      <tr><td>Start avg (ep 1–10)</td><td>0.3090</td></tr>
      <tr><td>End avg (ep 91–100)</td><td>0.5962</td></tr>
      <tr><td><strong>Improvement</strong></td><td><strong>+93%</strong></td></tr>
    </table>

    <h2>Final evaluation (per task)</h2>
    <p class="note">
      These task scores are aligned to the GitHub repo README (source of truth). Task 5 is the expert scenario,
      so it is expected to be the lowest — that is not an error.
    </p>
    <table>
      <tr><th>Task</th><th>Difficulty</th><th>Score</th></tr>
      <tr><td>task_1_basic_antipatterns</td><td>easy</td><td><span class="ok">0.7500 ✅</span></td></tr>
      <tr><td>task_2_correlated_subqueries</td><td>medium</td><td><span class="ok">0.8313 ✅</span></td></tr>
      <tr><td>task_3_wildcard_scan</td><td>medium-hard</td><td><span class="ok">0.6563 ✅</span></td></tr>
      <tr><td>task_4_implicit_join</td><td>hard</td><td><span class="ok">0.6563 ✅</span></td></tr>
      <tr><td>task_5_window_functions</td><td>expert</td><td><span class="ok">0.6500 ✅</span></td></tr>
    </table>

    <h2>Before / After (environment-only, reproducible)</h2>
    <p class="note">
      To avoid hand-wavy baselines, we provide a reproducible before/after contrast in the GitHub repo:
      “before” = analysis-only (no optimized SQL), “after” = deterministic fallback with a real optimized query.
      Chart:
      <a href="https://raw.githubusercontent.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-/main/results/before_after_chart.png" target="_blank" rel="noopener">
        results/before_after_chart.png
      </a>
    </p>
    <img
      src="https://raw.githubusercontent.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-/main/results/before_after_chart.png"
      alt="Before/after chart"
    />

    <h2>Approach</h2>
    <div class="card">
      <h3>GRPO setup</h3>
      <ul class="note">
        <li><strong>Algorithm:</strong> GRPO (Group Relative Policy Optimization)</li>
        <li><strong>Base model:</strong> Qwen/Qwen2.5-0.5B-Instruct</li>
        <li><strong>Group size:</strong> 4 completions per prompt</li>
        <li><strong>Hardware:</strong> Kaggle GPU T4 x2 (see repo links)</li>
      </ul>

      <h3 style="margin-top:12px">Reward components</h3>
      <ul class="note">
        <li><code>execution_speedup</code>: measured DuckDB timing ratio</li>
        <li><code>result_correctness</code>: results match check (order-independent for large sets)</li>
        <li><code>issue_detection</code>: anti-pattern detection vs ground truth keywords</li>
        <li><code>approval_correctness</code>, <code>summary_quality</code>, <code>severity_labels</code></li>
      </ul>
    </div>

    <h2>Links</h2>
    <ul class="note">
      <li><a href="https://huggingface.co/laterabhi/grpo-sql-optimizer" target="_blank" rel="noopener">Model on Hugging Face</a></li>
      <li><a href="https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-" target="_blank" rel="noopener">GitHub repo (source of truth)</a></li>
      <li><a href="https://arxiv.org/abs/2402.03300" target="_blank" rel="noopener">GRPO paper</a></li>
    </ul>

    <footer>
      Tip: If you want all images to load instantly (no GitHub raw), upload the PNGs into this Space repo and change the <code>src</code> to local file paths.
    </footer>
  </div>
</body>
</html>