sakthivarshans committed on
Commit 608369f · 1 Parent(s): e362af4

Updated Readme for better reference

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -253,11 +253,11 @@ sql_arena_env/
 
 ## Why SQLArenaEnv?
 
-**The gap it fills:** Text-to-SQL benchmarks like Spider and BIRD measure single-shot accuracy. No existing OpenEnv environment measures *multi-step SQL reasoning* where the agent can gather information before committing. This is the benchmark that matches how SQL is actually used.
+**The gap it fills:** Text to SQL benchmarks like Spider and BIRD measure single shot accuracy. No existing OpenEnv environment measures *multi step SQL reasoning* where the agent can gather information before committing. This is the benchmark that matches how SQL is actually used.
 
-**Why exploration matters for RL training:** An agent that learns to run `SELECT * FROM table LIMIT 5` before attempting a complex GROUP BY query is learning a genuinely useful cognitive strategy the same strategy a senior data analyst uses. Standard single-shot SQL environments cannot teach this. SQLArenaEnv can.
+**Why exploration matters for RL training:** An agent that learns to run `SELECT * FROM table LIMIT 5` before attempting a complex GROUP BY query is learning a genuinely useful cognitive strategy, the same strategy a senior data analyst uses. Standard single shot SQL environments cannot teach this. SQLArenaEnv can.
 
-**What improves with training:** GRPO/PPO agents trained on SQLArenaEnv learn to use explore steps strategically they converge to running schema-discovery queries first (`SELECT * FROM sqlite_master`), then sample queries, then submitting. This mirrors expert human behavior and transfers to real SQL tasks.
+**What improves with training:** GRPO/PPO agents trained on SQLArenaEnv learn to use explore steps strategically, they converge to running schema discovery queries first (`SELECT * FROM sqlite_master`), then sample queries, then submitting. This mirrors expert human behavior and transfers to real SQL tasks.
 
 ---
 