laterabhi committed
Commit 979f139 · verified · 1 Parent(s): afa8b1d

Upload index.html with huggingface_hub

Files changed (1)
  1. index.html +76 -18
index.html CHANGED
@@ -1,19 +1,77 @@
- <!doctype html>
  <html>
- <head>
- <meta charset="utf-8" />
- <meta name="viewport" content="width=device-width" />
- <title>My static Space</title>
- <link rel="stylesheet" href="style.css" />
- </head>
- <body>
- <div class="card">
- <h1>Welcome to your static Space!</h1>
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
- <p>
- Also don't forget to check the
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
- </p>
- </div>
- </body>
- </html>
+ <!DOCTYPE html>
  <html>
+ <head>
+ <meta charset="utf-8">
+ <title>GRPO SQL Optimizer</title>
+ <style>
+ body { font-family: sans-serif; max-width: 900px; margin: 40px auto; padding: 0 20px; }
+ img { max-width: 100%; }
+ table { border-collapse: collapse; width: 100%; }
+ th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
+ th { background: #f4f4f4; }
+ code { background: #f4f4f4; padding: 2px 6px; border-radius: 3px; }
+ </style>
+ </head>
+ <body>
+ <h1>GRPO Training for SQL Query Optimization</h1>
+
+ <h2>Overview</h2>
+ <p>Fine-tuned <code>Qwen/Qwen2.5-0.5B-Instruct</code> with GRPO (Group Relative Policy Optimization)
+ reinforcement learning to optimize SQL queries in a DuckDB execution environment.</p>
+
+ <h2>Results</h2>
+ <img src="grpo_results.png" alt="Training Curve"/>
+
+ <h3>Training Progress</h3>
+ <table>
+ <tr><th>Metric</th><th>Value</th></tr>
+ <tr><td>Start avg (ep1-10)</td><td>0.3090</td></tr>
+ <tr><td>End avg (ep91-100)</td><td>0.5962</td></tr>
+ <tr><td>Improvement</td><td>+93%</td></tr>
+ </table>
+
+ <h3>Final Evaluation</h3>
+ <table>
+ <tr><th>Task</th><th>Difficulty</th><th>Score</th></tr>
+ <tr><td>task_1_basic_antipatterns</td><td>easy</td><td>0.7500 ✅</td></tr>
+ <tr><td>task_2_correlated_subqueries</td><td>medium</td><td>0.8313 ✅</td></tr>
+ <tr><td>task_3_wildcard_scan</td><td>medium-hard</td><td>0.9250 ✅</td></tr>
+ <tr><td>task_4_implicit_join</td><td>hard</td><td>0.6438 ✅</td></tr>
+ <tr><td>task_5_window_functions</td><td>expert</td><td>0.6250 ⚠️</td></tr>
+ <tr><td><strong>Average</strong></td><td></td><td><strong>0.7550</strong></td></tr>
+ </table>
+ <p><strong>Baseline: 0.63 &nbsp;|&nbsp; Improvement: +0.1250 absolute (about +19.8% relative)</strong></p>
+
+ <h2>Approach</h2>
+ <h3>GRPO Training</h3>
+ <ul>
+ <li><strong>Algorithm:</strong> GRPO (Group Relative Policy Optimization)</li>
+ <li><strong>Base Model:</strong> Qwen/Qwen2.5-0.5B-Instruct</li>
+ <li><strong>Episodes:</strong> 100 × 4 completions per prompt</li>
+ <li><strong>Hardware:</strong> Kaggle GPU T4 x2</li>
+ </ul>
+
+ <h3>Reward Function</h3>
+ <ul>
+ <li><code>execution_speedup</code>: How much faster the optimized query runs</li>
+ <li><code>result_correctness</code>: Whether results are identical</li>
+ <li><code>issue_detection</code>: Whether SQL anti-patterns were identified</li>
+ <li><code>approval_correctness</code>: Whether approval flag is correct</li>
+ <li><code>summary_quality</code>: Quality of the explanation</li>
+ </ul>
+
+ <h2>Key Findings</h2>
+ <ol>
+ <li><strong>Reward variance is critical:</strong> Early runs had flat rewards of 0.08. Adding schema information to the prompt created the reward variance that GRPO needs in order to learn.</li>
+ <li><strong>Prompt engineering matters:</strong> Telling the model to use only columns from the schema was the single most impactful fix.</li>
+ <li><strong>Partial credit helps:</strong> Adding an issue-detection bonus gave the model a learning signal even when SQL execution failed.</li>
+ </ol>
+
+ <h2>Links</h2>
+ <ul>
+ <li><a href="https://huggingface.co/laterabhi/grpo-sql-optimizer">Model on Hugging Face</a></li>
+ <li><a href="https://github.com/OfficialAbhinavSingh/SQL-Query-Optimization-Environment-">SQL Environment</a></li>
+ <li><a href="https://arxiv.org/abs/2402.03300">GRPO Paper</a></li>
+ </ul>
+ </body>
+ </html>
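
Note on the first key finding in the added page: it says flat rewards left GRPO with nothing to learn from. The sketch below illustrates why, assuming the usual group-relative advantage (each completion's reward minus the group mean, divided by the group standard deviation). The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and every sample reward other than the flat 0.08 are illustrative assumptions, not taken from the training code behind this commit.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: assumes advantage = (r - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions per prompt, as in the run described above.
# Flat rewards (the 0.08 case): every advantage is zero.
flat = group_relative_advantages([0.08, 0.08, 0.08, 0.08])

# Made-up rewards that differ within the group: advantages are non-zero.
varied = group_relative_advantages([0.08, 0.31, 0.60, 0.75])

print(flat)
print(varied)
```

When all four rewards in a group are equal, every advantage is zero and the policy update carries no signal. Once rewards differ within a group, completions above the group mean get positive advantages and those below get negative ones, which is consistent with the page's observation that the schema fix and the partial-credit bonus made learning possible.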