| # Feature Demo: `action-feature` — Core Action Dispatch System |
|
|
| > **Generated:** 2026-02-28T14:46:19+01:00 |
| > **Branch:** `origin/action-feature` vs `main` |
| > **Project:** sql-env-onboarding (SQLEnv RL Environment for OpenEnv Challenge) |
| > **Execution environment:** Python 3.11, torch 2.2.2, MockTokenizer (no Ollama required) |
|
|
| --- |
|
|
| ## What This Feature Does |
|
|
| The `action-feature` branch adds the **core action dispatch system** to SQLEnv — the RL environment where AI agents learn interactive SQL exploration. Before this branch, the environment had data models but no way to actually process agent actions. |
|
|
| Now an agent can send natural language messages like _"describe the students table"_ or _"find all students enrolled in CS101"_, and the environment automatically classifies them into one of three action types (**describe**, **sample**, **query**), dispatches them to the appropriate handler, and returns observations with tokenized conversation history. This is the fundamental interaction loop that makes the environment usable for reinforcement learning. |
|
|
| The system works in two modes: |
| - **Mock path** (no external services): Uses `MockTokenizer` for tokenization. Describe and sample actions work fully; query actions return a graceful error since Ollama is unavailable. |
| - **Ollama path** (full pipeline): Uses a local Ollama LLM to select relevant tables for describe/sample and to generate SQL for query actions. |
|
|
| --- |
|
|
| ## Quickstart |
|
|
| ```bash |
| # 1. Checkout the branch |
| git checkout origin/action-feature --detach |
| |
| # 2. Install dependencies (from the sql_env package directory) |
| cd envs/sql_env/ |
| uv sync |
| uv add sqlalchemy # missing from pyproject.toml, needed for ORM models |
| |
| # 3. Run the full demo (71 checks, ~2 seconds, no external services needed) |
| uv run python demo_action_feature.py |
| |
| # 4. (Optional) Return to main when done |
| git checkout main |
| ``` |
|
|
| **Prerequisites:** Python 3.11+, `uv` package manager, git. |
| **Optional:** Ollama running locally with `llama3.2` model for full query generation (set `OLLAMA_MODEL=llama3.2`). |
|
|
| > **Note:** `sqlalchemy` is required by the ORM models but was omitted from `pyproject.toml` on the branch. The `uv add sqlalchemy` step is necessary. |
|
|
| > **Note:** On Python < 3.12, a Pydantic compatibility patch is needed because `openenv` defines `Message` with `typing.TypedDict` instead of `typing_extensions.TypedDict`. The demo script applies this patch automatically. |
| |
| --- |
| |
| ## Live Demo — Mock Path (Primary) |
| |
| All output below was captured by executing `uv run python demo_action_feature.py` on the `action-feature` branch with no Ollama model configured (default `qwen2` not installed). |
| |
| ### 1. Environment Instantiation with MockTokenizer |
| |
| The environment loads 9 SQLAlchemy ORM models (Address, Person, Student, Course, etc.) and initializes conversation state with a system prompt. |
| |
| ```bash |
| uv run python demo_action_feature.py |
| ``` |
| |
| ``` |
| ============================================================ |
| 1. Environment Instantiation with MockTokenizer |
| ============================================================ |
| [PASS] MockTokenizer created |
| [PASS] SQLEnvironment created |
| [PASS] System prompt stored |
| [PASS] Tokenizer stored |
| [PASS] 9 database models loaded |
| [PASS] Initial state has 1 message (system) |
| [PASS] Initial state has 1 token tensor |
| [PASS] System message role is correct |
| [PASS] System message content matches prompt |
| ``` |
| |
| The environment correctly stores the custom system prompt, attaches the tokenizer, and loads all 9 database table models from SQLAlchemy. |
| |
| ### 2. reset() Returns Valid SQLObservation |
| |
| ``` |
| ============================================================ |
| 2. reset() Returns Valid SQLObservation |
| ============================================================ |
| [PASS] reset() returns SQLObservation |
| [PASS] Observation has messages list |
| [PASS] Messages contain system prompt |
| [PASS] Observation has tokens tensor |
| [PASS] Tokens tensor is 1D |
| [PASS] Tokens are non-empty |
| [PASS] done is False |
| [PASS] reward is None |
| |
| Observation details: |
| messages count: 1 |
| tokens shape: torch.Size([29]) |
| tokens[:10]: [89, 111, 117, 32, 97, 114, 101, 32, 97, 32] |
| ``` |
| |
| After `reset()`, the observation contains one message (the system prompt) tokenized into a 1D tensor via `MockTokenizer` (char codes mod 256). The episode is not done and has no reward — ready for the agent's first action. |
| |
| ### 3. Action Type Detection |
| |
| The keyword classifier maps user messages to three action types: `describe`, `sample`, and `query`. |
| |
| ``` |
| ============================================================ |
| 3. Action Type Detection (_detect_action_type) |
| ============================================================ |
| [PASS] 'describe the students table...' -> describe |
| [PASS] 'what columns does Course have...' -> describe |
| [PASS] 'show me the schema...' -> describe |
| [PASS] 'show me sample rows from students...' -> sample |
| [PASS] 'give me example data...' -> sample |
| [PASS] 'how many rows are in Courses...' -> sample |
| [PASS] 'find all students enrolled in CS101...' -> query |
| [PASS] 'select count(*) from students...' -> query |
| [PASS] 'what is the average score...' -> query |
| ``` |
| |
| All 9 test cases correctly classified. Keywords like "describe"/"schema"/"columns" trigger describe; "sample"/"example"/"rows" trigger sample; everything else defaults to query. |
| |
| ### 4. message_to_action() Creates Properly Typed SQLAction |
| |
| ``` |
| ============================================================ |
| 4. message_to_action() Creates Properly Typed SQLAction |
| ============================================================ |
| [PASS] Returns SQLAction |
| [PASS] action_type is 'describe' |
| [PASS] action_description is message content |
| [PASS] tokens is a torch.Tensor |
| [PASS] tokens tensor is non-empty |
| |
| Action details: |
| action_type: describe |
| action_description: describe the students table |
| tokens shape: torch.Size([1, 57]) |
| [PASS] message_to_action adds message to history |
| [PASS] History[1] is the user message |
| [PASS] Sample message -> action_type 'sample' |
| [PASS] Query message -> action_type 'query' |
| ``` |
| |
| `message_to_action()` converts a raw message dict into a `SQLAction` with the correct `action_type`, `action_description`, and tokenized tensor. **Important side effect:** it also appends the message to `_state.history_messages` before tokenizing, so the tokenizer sees the full conversation context. |
| |
| ### 5. step() with Describe Action |
| |
| Without Ollama, the describe action falls back to the first table (Address) and returns its full SQLAlchemy-derived schema. |
| |
| ``` |
| ============================================================ |
| 5. step() with Describe Action (Schema from SQLAlchemy Models) |
| ============================================================ |
| [PASS] step() returns SQLObservation |
| [PASS] History now has 3 messages (system + user + assistant) |
| [PASS] Last message is from assistant |
| [PASS] Assistant message contains 'columns' |
| [PASS] Schema info contains column descriptions |
| |
| Describe response (first 200 chars): |
| Table 'Address' has the following columns: |
| |
| - address_id: integer number |
| - line_1: text (up to 255 characters) |
| - line_2: text (up to 255 characters) |
| - city: text (up to 255 characters) |
| - zip_postcode: |
| [PASS] Tokens tensor grew after step |
| ``` |
| |
| The schema is extracted directly from SQLAlchemy model introspection (column names, types converted to natural language). The observation now has 3 messages (system → user → assistant) and the token tensor grew. |
| |
| ### 6. step() with Sample Action |
| |
| The sample action generates a `SELECT * ... LIMIT 10` query for the target table. |
|
|
| ``` |
| ============================================================ |
| 6. step() with Sample Action (Generates SQL Query Text) |
| ============================================================ |
| [PASS] step(sample) returns assistant message |
| [PASS] Sample response contains SELECT |
| [PASS] Sample response contains LIMIT |
|
|
| Sample response: |
| Here's a query to sample data from Address: |
| |
| SELECT * FROM Address LIMIT 10; |
| ``` |
| |
| ### 7. step() with Query Action (No Ollama) |
| |
| Without Ollama, the query action returns a clear error message instead of crashing. |
| |
| ``` |
| ============================================================ |
| 7. step() with Query Action (Mock Path — No Ollama) |
| ============================================================ |
| [PASS] step(query) returns assistant message |
| [PASS] Query response is error string (no Ollama) or SQL |
|
|
| Query response (no Ollama): |
| Error: Ollama returned status 404 |
| ``` |
| |
| The error is a graceful 404 (Ollama server is running but the default `qwen2` model isn't installed). The conversation continues normally — the error becomes part of the message history. |
|
|
| ### 8. Multi-Turn Conversation State Management |
|
|
| Three turns of alternating user/assistant messages, verifying the conversation history grows correctly. |
|
|
| ``` |
| ============================================================ |
| 8. Multi-Turn Conversation State Management |
| ============================================================ |
| [PASS] After turn 1: 3 messages (sys + user + assistant) |
| [PASS] After turn 2: 5 messages (sys + u1 + a1 + u2 + a2) |
| [PASS] Tokens grew between turns |
| [PASS] After turn 3: 7 messages |
| [PASS] Tokens grew again |
| [PASS] Message roles follow expected pattern |
|
|
| Conversation summary after 3 turns: |
| [0] system: You are a test SQL assistant.... |
| [1] user: describe the Address table... |
| [2] assistant: Table 'Address' has the following columns: - address_id: in... |
| [3] user: show me sample rows... |
| [4] assistant: Here's a query to sample data from Address: SELECT * FROM A... |
| [5] user: find all addresses in New York... |
| [6] assistant: Error: Ollama returned status 404... |
| Total tokens: 987 |
| ``` |
| |
| Message roles follow the expected `[system, user, assistant, user, assistant, user, assistant]` pattern. Token count grows monotonically: the `_create_observation()` method flattens all `history_tokens` into a single 1D tensor via `torch.cat`. |
|
|
| ### 9. Client Serialization Roundtrip |
|
|
| The `SQLEnvClient` converts tensor → list for JSON transport and list → tensor on the return path. |
|
|
| ``` |
| ============================================================ |
| 9. Client Serialization Roundtrip (_step_payload) |
| ============================================================ |
| [PASS] Payload has action_type |
| [PASS] Payload has action_description |
| [PASS] Tokens converted to list |
| [PASS] Token values preserved |
| [PASS] _parse_result returns StepResult |
| [PASS] Observation messages parsed |
| [PASS] Tokens converted back to tensor |
| [PASS] Token values correct |
| [PASS] Reward parsed |
|
|
| Payload serialization: |
| action_type: query |
| tokens (list): [[1, 2, 3, 4, 5]] |
| ``` |
| |
| --- |
|
|
| ## Edge Cases Exercised |
|
|
| ### Empty System Prompt |
|
|
| When no system prompt is provided (empty string), the environment correctly starts with zero messages and an empty token tensor. |
|
|
| ``` |
| [PASS] Empty system prompt -> no messages in history |
| [PASS] Empty system prompt -> empty tokens |
| ``` |
|
|
| ### Invalid Message Inputs |
|
|
| `message_to_action()` validates its input and raises `ValueError` for malformed messages. |
|
|
| ``` |
| [PASS] Missing 'role' raises ValueError |
| [PASS] Missing 'content' raises ValueError |
| [PASS] None content raises ValueError |
| ``` |
|
|
| ### Unknown Table Handling |
|
|
| Schema lookup and sample query generation gracefully handle non-existent tables. |
|
|
| ``` |
| [PASS] Unknown table returns 'not found' message |
| [PASS] Unknown table sample returns error |
| ``` |
|
|
| ### MockTokenizer Encode/Decode Roundtrip |
|
|
| The mock tokenizer's `ord(c) % 256` encoding correctly roundtrips through `chr(t)` decoding. |
|
|
| ``` |
| [PASS] MockTokenizer encode/decode roundtrip |
| ``` |
|
|
| ### Invalid Tokenizer Validation |
|
|
| The environment constructor rejects tokenizers missing `apply_chat_template`. |
|
|
| ``` |
| [PASS] Invalid tokenizer raises ValueError |
| ``` |
|
|
| --- |
|
|
| ## Live Demo — Ollama Path (Optional) |
|
|
| When Ollama is running locally with a compatible model, the query action generates real SQL and the describe action selects the correct table. |
|
|
| ### Describe with Ollama |
|
|
| With `OLLAMA_MODEL=llama3.2`, the LLM correctly identifies "Student" as the most relevant table for "describe the students table": |
|
|
| ``` |
| DESCRIBE RESULT: |
| Table 'Student' has the following columns: |
| |
| - student_id: integer number |
| - student_details: text (up to 255 characters) |
| ``` |
|
|
| Compare with mock path: fell back to "Address" (first table in dict). **With Ollama, table selection is intelligent.** |
|
|
| ### Query with Ollama |
|
|
| The LLM generates valid SQL for natural language questions: |
|
|
| ``` |
| QUERY RESULT: |
| SELECT * FROM Students WHERE CourseID IN (SELECT CourseID FROM Courses WHERE CourseName = 'CS101') |
| ``` |
|
|
| > **Note:** The generated SQL references column names from the schema description prompt, not the actual SQLAlchemy model column names. This is expected — the LLM generates SQL based on the natural language schema it receives. |
|
|
| --- |
|
|
| ## Full Result Summary |
|
|
| ``` |
| ============================================================ |
| SUMMARY |
| ============================================================ |
| Total checks: 71 |
| Passed: 71 |
| Failed: 0 |
|
|
| ALL CHECKS PASSED |
| ``` |
| |
| | Category | Checks | Passed | Failed | |
| |----------|--------|--------|--------| |
| | Imports | 1 | 1 | 0 | |
| | Instantiation | 8 | 8 | 0 | |
| | reset() | 8 | 8 | 0 | |
| | Action detection | 9 | 9 | 0 | |
| | message_to_action | 9 | 9 | 0 | |
| | step(describe) | 6 | 6 | 0 | |
| | step(sample) | 3 | 3 | 0 | |
| | step(query) | 2 | 2 | 0 | |
| | Multi-turn state | 6 | 6 | 0 | |
| | Client serialization | 9 | 9 | 0 | |
| | Edge cases | 9 | 9 | 0 | |
| | **Total** | **71** | **71** | **0** | |
| |
| --- |
| |
| ## Verification Checklist |
| |
| - [x] Environment instantiation with MockTokenizer — 8 checks |
| - [x] `reset()` returns valid SQLObservation with system prompt — 8 checks |
| - [x] Action type detection for all 3 types (describe/sample/query) — 9 keywords tested |
| - [x] `message_to_action()` creates SQLAction with correct type and tokens — 9 checks |
| - [x] `step()` with describe returns schema from SQLAlchemy models — 6 checks |
| - [x] `step()` with sample returns SQL query text — 3 checks |
| - [x] `step()` with query returns Ollama error gracefully (mock path) — 2 checks |
| - [x] Multi-turn conversation state grows correctly — 6 checks |
| - [x] Client tensor↔list serialization roundtrip — 9 checks |
| - [x] Edge cases: empty prompt, invalid inputs, unknown tables, tokenizer validation — 9 checks |
| |
| --- |
| |
| ## Known Issues Found |
| |
| 1. **`sqlalchemy` missing from `pyproject.toml`** — The ORM models import `sqlalchemy` but it's not listed as a dependency. Must `uv add sqlalchemy` manually. |
| |
| 2. **Pydantic/TypedDict incompatibility on Python < 3.12** — The `openenv` library defines `Message` with `typing.TypedDict`, but Pydantic 2.x requires `typing_extensions.TypedDict`. The demo script monkey-patches this, but the issue would affect any direct usage. |
| |
| 3. **Ollama default model (`qwen2`) unlikely to be installed** — The default `OLLAMA_MODEL` is `qwen2`, which users probably don't have. The 404 error is graceful but confusing. Consider defaulting to `llama3.2` or documenting the required model. |
| |
| 4. **describe/sample fallback to first table** — When Ollama is unavailable, `_call_ollama_to_select_table()` silently falls back to the first table in the dict (`Address`). This is correct behavior but may confuse users expecting the table from their query. |
| |
| --- |
| |
| ## File Reference |
| |
| | File | What it does | |
| |------|-------------| |
| | `envs/sql_env/demo_action_feature.py` | Executable demo script (71 checks) | |
| | `envs/sql_env/server/sql_environment.py` | Core `SQLEnvironment` with reset/step/dispatch | |
| | `envs/sql_env/models.py` | `SQLAction`, `SQLObservation`, `SQLState` Pydantic models | |
| | `envs/sql_env/client.py` | `SQLEnvClient` with tensor↔list serialization | |
| | `envs/sql_env/server/test_sql_env.py` | `MockTokenizer` (char ord encoding) | |
| | `envs/sql_env/data/databases/models.py` | 9 SQLAlchemy ORM models | |
| |
| --- |
| |
| ## How to Reproduce |
| |
| ```bash |
| git clone <repo-url> |
| cd sql-env-onboarding |
| git checkout origin/action-feature --detach |
| cd envs/sql_env/ |
| uv sync && uv add sqlalchemy |
| uv run python demo_action_feature.py # Mock path: 71/71 checks |
| |
| # Optional: Ollama path |
| export OLLAMA_MODEL=llama3.2 # or any installed model |
| uv run python demo_action_feature.py # Query actions now return real SQL |
| ``` |
| |
| --- |
| |
| *Demo generated 2026-02-28. Re-run `uv run python demo_action_feature.py` on the action-feature branch to refresh.* |
| |