---
license: mit
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- grpo
- reinforcement-learning
- sql
- optimization
---
# GRPO SQL Optimizer

This model is `Qwen/Qwen2.5-0.5B-Instruct` fine-tuned with GRPO (Group Relative Policy Optimization)
to optimize SQL queries, using a DuckDB execution environment.
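
As a rough illustration of how an execution environment can score a rewritten query, the sketch below checks that a candidate returns the same rows as the reference query and adds a bonus for running faster. It is not the reward used to train this model: the function names, the 0.5 / 0.5 weighting, and the example table are all hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the snippet runs without extra dependencies.

```python
import sqlite3
import time


def run_query(conn, sql):
    """Execute sql and return (rows in a canonical order, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    # Sorting by repr ignores row order, so ORDER BY semantics are not checked here.
    return sorted(rows, key=repr), elapsed


def score_candidate(conn, reference_sql, candidate_sql):
    """Hypothetical reward: 0.0 if the candidate fails or changes the result,
    otherwise 0.5 plus up to 0.5 for running faster than the reference."""
    ref_rows, ref_time = run_query(conn, reference_sql)
    try:
        cand_rows, cand_time = run_query(conn, candidate_sql)
    except sqlite3.Error:
        return 0.0  # invalid SQL earns nothing
    if cand_rows != ref_rows:
        return 0.0  # different rows means the rewrite is not equivalent
    speedup = ref_time / max(cand_time, 1e-9)
    # Map speedup to [0, 0.5]: no bonus at or below 1x, full bonus at 2x or more.
    bonus = 0.5 * min(max(speedup - 1.0, 0.0), 1.0)
    return 0.5 + bonus


# Illustrative data only (sqlite3 stand-in, not DuckDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(100)]
)

reference = "SELECT id FROM orders WHERE id IN (SELECT id FROM orders WHERE amount > 100)"
equivalent = "SELECT id FROM orders WHERE amount > 100"
not_equivalent = "SELECT id FROM orders WHERE amount > 10"

print(score_candidate(conn, reference, equivalent))      # at least 0.5; exact value depends on timing
print(score_candidate(conn, reference, not_equivalent))  # 0.0
```

Gating on result equality before rewarding speed keeps the reward verifiable: a query that runs faster but returns different rows scores zero.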

## Results

- **Average eval score: 0.7550** (12.5% above baseline)
- Trained for 100 episodes on 5 SQL optimization tasks

## Blog / Writeup

https://huggingface.co/spaces/laterabhi/grpo-sql-optimizer

## Training Notebook

Trained on Kaggle with 2x T4 GPUs using GRPO with verifiable rewards.
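
For readers unfamiliar with GRPO, its central step is to compare each sampled completion against the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization in plain Python. The function name and reward values are illustrative, the population standard deviation is an assumption (implementations differ on this detail), and it is not taken from the training notebook.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds the scores of several completions sampled for one prompt.
    `eps` avoids division by zero when every completion scores the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate rewrites of one SQL query.
rewards = [0.0, 0.5, 0.75, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's mean get a positive advantage and are reinforced; those below the mean are pushed down. When all completions in a group score the same, every advantage is zero and that group contributes no learning signal.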