Spaces:

hjerpe
/

sql_env

Sleeping

App Files Files Community

sql_env / DEMO_action_feature.md

hjerpe

Upload folder using huggingface_hub

9e64e71 verified 2 months ago

preview code

Raw

History Blame Contribute Delete

16.5 kB

	# Feature Demo: `action-feature` — Core Action Dispatch System

	> Generated: 2026-02-28T14:46:19+01:00
	> Branch: `origin/action-feature` vs `main`
	> Project: sql-env-onboarding (SQLEnv RL Environment for OpenEnv Challenge)
	> Execution environment: Python 3.11, torch 2.2.2, MockTokenizer (no Ollama required)

	---

	## What This Feature Does

	The `action-feature` branch adds the core action dispatch system to SQLEnv — the RL environment where AI agents learn interactive SQL exploration. Before this branch, the environment had data models but no way to actually process agent actions.

	Now an agent can send natural language messages like _"describe the students table"_ or _"find all students enrolled in CS101"_, and the environment automatically classifies them into one of three action types (describe, sample, query), dispatches them to the appropriate handler, and returns observations with tokenized conversation history. This is the fundamental interaction loop that makes the environment usable for reinforcement learning.

	The system works in two modes:
	- Mock path (no external services): Uses `MockTokenizer` for tokenization. Describe and sample actions work fully; query actions return a graceful error since Ollama is unavailable.
	- Ollama path (full pipeline): Uses a local Ollama LLM to select relevant tables for describe/sample and to generate SQL for query actions.

	---

	## Quickstart

	```bash
	# 1. Checkout the branch
	git checkout origin/action-feature --detach

	# 2. Install dependencies (from the sql_env package directory)
	cd envs/sql_env/
	uv sync
	uv add sqlalchemy # missing from pyproject.toml, needed for ORM models

	# 3. Run the full demo (71 checks, ~2 seconds, no external services needed)
	uv run python demo_action_feature.py

	# 4. (Optional) Return to main when done
	git checkout main
	```

	Prerequisites: Python 3.11+, `uv` package manager, git.
	Optional: Ollama running locally with `llama3.2` model for full query generation (set `OLLAMA_MODEL=llama3.2`).

	> Note: `sqlalchemy` is required by the ORM models but was omitted from `pyproject.toml` on the branch. The `uv add sqlalchemy` step is necessary.

	> Note: On Python < 3.12, a Pydantic compatibility patch is needed because `openenv` defines `Message` with `typing.TypedDict` instead of `typing_extensions.TypedDict`. The demo script applies this patch automatically.

	---

	## Live Demo — Mock Path (Primary)

	All output below was captured by executing `uv run python demo_action_feature.py` on the `action-feature` branch with no Ollama model configured (default `qwen2` not installed).

	### 1. Environment Instantiation with MockTokenizer

	The environment loads 9 SQLAlchemy ORM models (Address, Person, Student, Course, etc.) and initializes conversation state with a system prompt.

	```bash
	uv run python demo_action_feature.py
	```

	```
	============================================================
	1. Environment Instantiation with MockTokenizer
	============================================================
	[PASS] MockTokenizer created
	[PASS] SQLEnvironment created
	[PASS] System prompt stored
	[PASS] Tokenizer stored
	[PASS] 9 database models loaded
	[PASS] Initial state has 1 message (system)
	[PASS] Initial state has 1 token tensor
	[PASS] System message role is correct
	[PASS] System message content matches prompt
	```

	The environment correctly stores the custom system prompt, attaches the tokenizer, and loads all 9 database table models from SQLAlchemy.

	### 2. reset() Returns Valid SQLObservation

	```
	============================================================
	2. reset() Returns Valid SQLObservation
	============================================================
	[PASS] reset() returns SQLObservation
	[PASS] Observation has messages list
	[PASS] Messages contain system prompt
	[PASS] Observation has tokens tensor
	[PASS] Tokens tensor is 1D
	[PASS] Tokens are non-empty
	[PASS] done is False
	[PASS] reward is None

	Observation details:
	messages count: 1
	tokens shape: torch.Size([29])
	tokens[:10]: [89, 111, 117, 32, 97, 114, 101, 32, 97, 32]
	```

	After `reset()`, the observation contains one message (the system prompt) tokenized into a 1D tensor via `MockTokenizer` (char codes mod 256). The episode is not done and has no reward — ready for the agent's first action.

	### 3. Action Type Detection

	The keyword classifier maps user messages to three action types: `describe`, `sample`, and `query`.

	```
	============================================================
	3. Action Type Detection (_detect_action_type)
	============================================================
	[PASS] 'describe the students table...' -> describe
	[PASS] 'what columns does Course have...' -> describe
	[PASS] 'show me the schema...' -> describe
	[PASS] 'show me sample rows from students...' -> sample
	[PASS] 'give me example data...' -> sample
	[PASS] 'how many rows are in Courses...' -> sample
	[PASS] 'find all students enrolled in CS101...' -> query
	[PASS] 'select count(*) from students...' -> query
	[PASS] 'what is the average score...' -> query
	```

	All 9 test cases correctly classified. Keywords like "describe"/"schema"/"columns" trigger describe; "sample"/"example"/"rows" trigger sample; everything else defaults to query.

	### 4. message_to_action() Creates Properly Typed SQLAction

	```
	============================================================
	4. message_to_action() Creates Properly Typed SQLAction
	============================================================
	[PASS] Returns SQLAction
	[PASS] action_type is 'describe'
	[PASS] action_description is message content
	[PASS] tokens is a torch.Tensor
	[PASS] tokens tensor is non-empty

	Action details:
	action_type: describe
	action_description: describe the students table
	tokens shape: torch.Size([1, 57])
	[PASS] message_to_action adds message to history
	[PASS] History[1] is the user message
	[PASS] Sample message -> action_type 'sample'
	[PASS] Query message -> action_type 'query'
	```

	`message_to_action()` converts a raw message dict into a `SQLAction` with the correct `action_type`, `action_description`, and tokenized tensor. Important side effect: it also appends the message to `_state.history_messages` before tokenizing, so the tokenizer sees the full conversation context.

	### 5. step() with Describe Action

	Without Ollama, the describe action falls back to the first table (Address) and returns its full SQLAlchemy-derived schema.

	```
	============================================================
	5. step() with Describe Action (Schema from SQLAlchemy Models)
	============================================================
	[PASS] step() returns SQLObservation
	[PASS] History now has 3 messages (system + user + assistant)
	[PASS] Last message is from assistant
	[PASS] Assistant message contains 'columns'
	[PASS] Schema info contains column descriptions

	Describe response (first 200 chars):
	Table 'Address' has the following columns:

	- address_id: integer number
	- line_1: text (up to 255 characters)
	- line_2: text (up to 255 characters)
	- city: text (up to 255 characters)
	- zip_postcode:
	[PASS] Tokens tensor grew after step
	```

	The schema is extracted directly from SQLAlchemy model introspection (column names, types converted to natural language). The observation now has 3 messages (system → user → assistant) and the token tensor grew.

	### 6. step() with Sample Action

	The sample action generates a `SELECT * ... LIMIT 10` query for the target table.

	```
	============================================================
	6. step() with Sample Action (Generates SQL Query Text)
	============================================================
	[PASS] step(sample) returns assistant message
	[PASS] Sample response contains SELECT
	[PASS] Sample response contains LIMIT

	Sample response:
	Here's a query to sample data from Address:

	SELECT * FROM Address LIMIT 10;
	```

	### 7. step() with Query Action (No Ollama)

	Without Ollama, the query action returns a clear error message instead of crashing.

	```
	============================================================
	7. step() with Query Action (Mock Path — No Ollama)
	============================================================
	[PASS] step(query) returns assistant message
	[PASS] Query response is error string (no Ollama) or SQL

	Query response (no Ollama):
	Error: Ollama returned status 404
	```

	The error is a graceful 404 (Ollama server is running but the default `qwen2` model isn't installed). The conversation continues normally — the error becomes part of the message history.

	### 8. Multi-Turn Conversation State Management

	Three turns of alternating user/assistant messages, verifying the conversation history grows correctly.

	```
	============================================================
	8. Multi-Turn Conversation State Management
	============================================================
	[PASS] After turn 1: 3 messages (sys + user + assistant)
	[PASS] After turn 2: 5 messages (sys + u1 + a1 + u2 + a2)
	[PASS] Tokens grew between turns
	[PASS] After turn 3: 7 messages
	[PASS] Tokens grew again
	[PASS] Message roles follow expected pattern

	Conversation summary after 3 turns:
	[0] system: You are a test SQL assistant....
	[1] user: describe the Address table...
	[2] assistant: Table 'Address' has the following columns: - address_id: in...
	[3] user: show me sample rows...
	[4] assistant: Here's a query to sample data from Address: SELECT * FROM A...
	[5] user: find all addresses in New York...
	[6] assistant: Error: Ollama returned status 404...
	Total tokens: 987
	```

	Message roles follow the expected `[system, user, assistant, user, assistant, user, assistant]` pattern. Token count grows monotonically: the `_create_observation()` method flattens all `history_tokens` into a single 1D tensor via `torch.cat`.

	### 9. Client Serialization Roundtrip

	The `SQLEnvClient` converts tensor → list for JSON transport and list → tensor on the return path.

	```
	============================================================
	9. Client Serialization Roundtrip (_step_payload)
	============================================================
	[PASS] Payload has action_type
	[PASS] Payload has action_description
	[PASS] Tokens converted to list
	[PASS] Token values preserved
	[PASS] _parse_result returns StepResult
	[PASS] Observation messages parsed
	[PASS] Tokens converted back to tensor
	[PASS] Token values correct
	[PASS] Reward parsed

	Payload serialization:
	action_type: query
	tokens (list): [[1, 2, 3, 4, 5]]
	```

	---

	## Edge Cases Exercised

	### Empty System Prompt

	When no system prompt is provided (empty string), the environment correctly starts with zero messages and an empty token tensor.

	```
	[PASS] Empty system prompt -> no messages in history
	[PASS] Empty system prompt -> empty tokens
	```

	### Invalid Message Inputs

	`message_to_action()` validates its input and raises `ValueError` for malformed messages.

	```
	[PASS] Missing 'role' raises ValueError
	[PASS] Missing 'content' raises ValueError
	[PASS] None content raises ValueError
	```

	### Unknown Table Handling

	Schema lookup and sample query generation gracefully handle non-existent tables.

	```
	[PASS] Unknown table returns 'not found' message
	[PASS] Unknown table sample returns error
	```

	### MockTokenizer Encode/Decode Roundtrip

	The mock tokenizer's `ord(c) % 256` encoding correctly roundtrips through `chr(t)` decoding.

	```
	[PASS] MockTokenizer encode/decode roundtrip
	```

	### Invalid Tokenizer Validation

	The environment constructor rejects tokenizers missing `apply_chat_template`.

	```
	[PASS] Invalid tokenizer raises ValueError
	```

	---

	## Live Demo — Ollama Path (Optional)

	When Ollama is running locally with a compatible model, the query action generates real SQL and the describe action selects the correct table.

	### Describe with Ollama

	With `OLLAMA_MODEL=llama3.2`, the LLM correctly identifies "Student" as the most relevant table for "describe the students table":

	```
	DESCRIBE RESULT:
	Table 'Student' has the following columns:

	- student_id: integer number
	- student_details: text (up to 255 characters)
	```

	Compare with mock path: fell back to "Address" (first table in dict). With Ollama, table selection is intelligent.

	### Query with Ollama

	The LLM generates valid SQL for natural language questions:

	```
	QUERY RESULT:
	SELECT * FROM Students WHERE CourseID IN (SELECT CourseID FROM Courses WHERE CourseName = 'CS101')
	```

	> Note: The generated SQL references column names from the schema description prompt, not the actual SQLAlchemy model column names. This is expected — the LLM generates SQL based on the natural language schema it receives.

	---

	## Full Result Summary

	```
	============================================================
	SUMMARY
	============================================================
	Total checks: 71
	Passed: 71
	Failed: 0

	ALL CHECKS PASSED
	```

	\| Category \| Checks \| Passed \| Failed \|
	\|----------\|--------\|--------\|--------\|
	\| Imports \| 1 \| 1 \| 0 \|
	\| Instantiation \| 8 \| 8 \| 0 \|
	\| reset() \| 8 \| 8 \| 0 \|
	\| Action detection \| 9 \| 9 \| 0 \|
	\| message_to_action \| 9 \| 9 \| 0 \|
	\| step(describe) \| 6 \| 6 \| 0 \|
	\| step(sample) \| 3 \| 3 \| 0 \|
	\| step(query) \| 2 \| 2 \| 0 \|
	\| Multi-turn state \| 6 \| 6 \| 0 \|
	\| Client serialization \| 9 \| 9 \| 0 \|
	\| Edge cases \| 9 \| 9 \| 0 \|
	\| Total \| 71 \| 71 \| 0 \|

	---

	## Verification Checklist

	- [x] Environment instantiation with MockTokenizer — 8 checks
	- [x] `reset()` returns valid SQLObservation with system prompt — 8 checks
	- [x] Action type detection for all 3 types (describe/sample/query) — 9 keywords tested
	- [x] `message_to_action()` creates SQLAction with correct type and tokens — 9 checks
	- [x] `step()` with describe returns schema from SQLAlchemy models — 6 checks
	- [x] `step()` with sample returns SQL query text — 3 checks
	- [x] `step()` with query returns Ollama error gracefully (mock path) — 2 checks
	- [x] Multi-turn conversation state grows correctly — 6 checks
	- [x] Client tensor↔list serialization roundtrip — 9 checks
	- [x] Edge cases: empty prompt, invalid inputs, unknown tables, tokenizer validation — 9 checks

	---

	## Known Issues Found

	1. `sqlalchemy` missing from `pyproject.toml` — The ORM models import `sqlalchemy` but it's not listed as a dependency. Must `uv add sqlalchemy` manually.

	2. Pydantic/TypedDict incompatibility on Python < 3.12 — The `openenv` library defines `Message` with `typing.TypedDict`, but Pydantic 2.x requires `typing_extensions.TypedDict`. The demo script monkey-patches this, but the issue would affect any direct usage.

	3. Ollama default model (`qwen2`) unlikely to be installed — The default `OLLAMA_MODEL` is `qwen2`, which users probably don't have. The 404 error is graceful but confusing. Consider defaulting to `llama3.2` or documenting the required model.

	4. describe/sample fallback to first table — When Ollama is unavailable, `_call_ollama_to_select_table()` silently falls back to the first table in the dict (`Address`). This is correct behavior but may confuse users expecting the table from their query.

	---

	## File Reference

	\| File \| What it does \|
	\|------\|-------------\|
	\| `envs/sql_env/demo_action_feature.py` \| Executable demo script (71 checks) \|
	\| `envs/sql_env/server/sql_environment.py` \| Core `SQLEnvironment` with reset/step/dispatch \|
	\| `envs/sql_env/models.py` \| `SQLAction`, `SQLObservation`, `SQLState` Pydantic models \|
	\| `envs/sql_env/client.py` \| `SQLEnvClient` with tensor↔list serialization \|
	\| `envs/sql_env/server/test_sql_env.py` \| `MockTokenizer` (char ord encoding) \|
	\| `envs/sql_env/data/databases/models.py` \| 9 SQLAlchemy ORM models \|

	---

	## How to Reproduce

	```bash
	git clone <repo-url>
	cd sql-env-onboarding
	git checkout origin/action-feature --detach
	cd envs/sql_env/
	uv sync && uv add sqlalchemy
	uv run python demo_action_feature.py # Mock path: 71/71 checks

	# Optional: Ollama path
	export OLLAMA_MODEL=llama3.2 # or any installed model
	uv run python demo_action_feature.py # Query actions now return real SQL
	```

	---

	Demo generated 2026-02-28. Re-run `uv run python demo_action_feature.py` on the action-feature branch to refresh.