# tool-calls

## stream-event-overview

Tool calls are surfaced through scrape streaming events (`/api/scrape/stream`) as `step` payloads.

| event-type | purpose | contains-tool-call-data |
| --- | --- | --- |
| `init` | stream/session initialization | no |
| `url_start` | url processing started | no |
| `step` | progress/action update | yes (for `action=tool_call` and `action=agent_decision`) |
| `url_complete` | url processing complete | no |
| `complete` | final response payload | no (aggregated output only) |
| `error` | runtime error surface | optional |
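As a rough illustration of the table above, a consumer might filter the stream down to the events that carry tool-call data. This sketch assumes each event has already been decoded into a dict with `type` and `data` keys, as in the example events later in this document; the helper name `carries_tool_call_data` is hypothetical, and the transport framing is deliberately left out.

```python
from typing import Any, Mapping

# Actions whose step payloads carry tool-call data, per the overview table.
TOOL_ACTIONS = frozenset({"tool_call", "agent_decision"})


def carries_tool_call_data(event: Mapping[str, Any]) -> bool:
    """Return True for `step` events whose action is tool-related.

    Hypothetical helper: assumes a decoded event shaped like
    {"type": ..., "data": {...}}.
    """
    if event.get("type") != "step":
        return False
    data = event.get("data") or {}
    return data.get("action") in TOOL_ACTIONS
```

`error` events are not handled here because the table marks their tool-call data as optional.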

## scrape-step-schema

`step` events are based on the `ScrapeStep` model.

| field | type | description |
| --- | --- | --- |
| `step_number` | integer | sequence index in the session |
| `action` | string | logical action type (`tool_call`, `agent_decision`, `plugins`, etc.) |
| `url` | string or null | active url for this step when available |
| `status` | string | runtime state (`running`, `complete`, `completed`, `failed`, etc.) |
| `message` | string | short human-readable step summary |
| `reward` | number | reward delta for this step |
| `extracted_data` | object or null | structured details, including tool payloads |
| `duration_ms` | number or null | optional elapsed time for the step |
| `timestamp` | string | utc iso timestamp |
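A client that wants typed access to these fields could mirror the table in a small dataclass. The sketch below is a client-side assumption, not the `ScrapeStep` model itself: the class name `ScrapeStepView` and the `from_payload` constructor are hypothetical, and only the fields the table marks as nullable are treated as optional.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class ScrapeStepView:
    """Client-side mirror of the step fields (hypothetical name)."""

    step_number: int
    action: str
    status: str
    message: str
    reward: float
    timestamp: str
    # Fields the table describes as nullable or optional.
    url: Optional[str] = None
    extracted_data: Optional[Dict[str, Any]] = None
    duration_ms: Optional[float] = None

    @classmethod
    def from_payload(cls, data: Mapping[str, Any]) -> "ScrapeStepView":
        """Build a view from the `data` object of a decoded `step` event."""
        return cls(
            step_number=int(data["step_number"]),
            action=data["action"],
            status=data["status"],
            message=data["message"],
            reward=float(data["reward"]),
            timestamp=data["timestamp"],
            url=data.get("url"),
            extracted_data=data.get("extracted_data"),
            duration_ms=data.get("duration_ms"),
        )
```

Missing required keys raise `KeyError` in this sketch, which makes schema drift visible early instead of silently filling defaults.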

## tool-call-payload-patterns

### pattern-a-registry-helper-calls

Used by `_create_tool_call_step(...)`.

| key-path | value-shape |
| --- | --- |
| `extracted_data.tool_name` | `namespace.action` |
| `extracted_data.tool_description` | short description |
| `extracted_data.parameters` | argument object |
| `extracted_data.result` | optional result object |

### pattern-b-runtime-agent-planner-and-executor

Used by dynamic runtime tool-calling in agentic scrape flow.

| action | key-path | value-shape |
| --- | --- | --- |
| `agent_decision` | `extracted_data.tool_calls[]` | `tool`, `params`, `reasoning` |
| `tool_call` | `extracted_data.tool` | selected tool name |
| `tool_call` | `extracted_data.success` | boolean execution state |
| `tool_call` | `extracted_data.result_preview` | compact serialized result |
| `tool_call` | `extracted_data.error` | error message if failed |
| `tool_call` | `extracted_data.duration_ms` | execution duration |
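Because the two patterns store the tool name under different keys, a consumer may want one accessor that handles both. The following is a minimal sketch under that assumption; `tool_name_of` and `planned_tools` are hypothetical helper names, and the key names are taken from the two tables above.

```python
from typing import Any, List, Mapping, Optional


def tool_name_of(extracted_data: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Return the tool name from either payload pattern, or None.

    Pattern A uses `tool_name`; pattern B `tool_call` steps use `tool`.
    """
    if not extracted_data:
        return None
    return extracted_data.get("tool_name") or extracted_data.get("tool")


def planned_tools(extracted_data: Optional[Mapping[str, Any]]) -> List[str]:
    """List tool names from an `agent_decision` payload's `tool_calls[]`."""
    calls = (extracted_data or {}).get("tool_calls") or []
    return [call["tool"] for call in calls if "tool" in call]
```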

## runtime-tool-call-lifecycle

```mermaid
sequenceDiagram
    participant Client as scrape-client
    participant Route as scrape-route
    participant Planner as agent-tool-caller
    participant Executor as tool-executor

    Client->>Route: POST /api/scrape/stream
    Route->>Planner: decide_tools(context, model)
    Planner-->>Route: [tool-call-plan]
    Route-->>Client: step(action=agent_decision)
    loop each selected tool
        Route->>Executor: execute_tool_call(tool, context)
        Executor-->>Route: ToolCallResult
        Route-->>Client: step(action=tool_call)
    end
    Route-->>Client: complete(output, extracted_data, metadata)
```
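The event ordering in the diagram can be sketched as a generator. This is an illustration of the sequence only, not the route's implementation: `stream_tool_steps` is a hypothetical name, the planner and executor are passed in as plain callables with simplified signatures, and the `ToolCallResult` fields are borrowed from the pattern-b table rather than from the real class.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class ToolCallResult:
    # Field names borrowed from the pattern-b table; the real class may differ.
    success: bool
    result_preview: str = ""
    error: Optional[str] = None
    duration_ms: float = 0.0


def stream_tool_steps(
    context: Dict[str, Any],
    decide_tools: Callable[[Dict[str, Any]], List[Dict[str, Any]]],
    execute_tool_call: Callable[[Dict[str, Any], Dict[str, Any]], ToolCallResult],
) -> Iterator[Dict[str, Any]]:
    """Yield events in the order shown in the sequence diagram."""
    plan = decide_tools(context)
    # One agent_decision step announces the whole plan.
    yield {
        "type": "step",
        "data": {"action": "agent_decision", "extracted_data": {"tool_calls": plan}},
    }
    # One tool_call step per selected tool, emitted after it executes.
    for call in plan:
        result = execute_tool_call(call, context)
        yield {
            "type": "step",
            "data": {
                "action": "tool_call",
                "extracted_data": {
                    "tool": call["tool"],
                    "success": result.success,
                    "result_preview": result.result_preview,
                    "error": result.error,
                    "duration_ms": result.duration_ms,
                },
            },
        }
    # Placeholder for the final payload; its real contents are not modeled here.
    yield {"type": "complete", "data": {}}
```

Passing the planner and executor as callables keeps the sketch testable with stubs and avoids guessing at how the real components are constructed.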

## field-order-and-rendering-guidance

Frontend and log consumers should parse structured fields, not message text.

| consumer-surface | recommendation |
| --- | --- |
| timeline ui | group by `action`, then read `extracted_data` keys |
| tool call panel | prefer `tool_name`/`tool` over `message` |
| analytics | aggregate by `tool_name`/`tool` and `success` |
| debugging | use `result_preview` and `error` first, full context second |
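The analytics row above can be sketched as a tally over structured fields. The function name `tally_tool_outcomes` is hypothetical, and the input is assumed to be an iterable of decoded events shaped like the examples in this document. `success` is kept as-is, so pattern-a payloads, which have no `success` key, are counted under `None` rather than being treated as failures.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def tally_tool_outcomes(events: Iterable[Mapping[str, Any]]) -> Counter:
    """Count `tool_call` steps by (tool name, success) without reading `message`."""
    counts: Counter = Counter()
    for event in events:
        data = event.get("data") or {}
        if event.get("type") != "step" or data.get("action") != "tool_call":
            continue
        extracted = data.get("extracted_data") or {}
        # Accept either naming pattern for the tool identifier.
        name = extracted.get("tool_name") or extracted.get("tool") or "unknown"
        counts[(name, extracted.get("success"))] += 1
    return counts
```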

## example-step-events

```json
{
  "type": "step",
  "data": {
    "step_number": 17,
    "action": "agent_decision",
    "status": "completed",
    "message": "Agent selected 2 runtime tools",
    "reward": 0.1,
    "extracted_data": {
      "tool_calls": [
        {"tool": "html.select", "params": {"selector": "article", "limit": 20}, "reasoning": "Find repeated blocks"},
        {"tool": "extract.top_n", "params": {"n": 10}, "reasoning": "Apply output size cap"}
      ]
    },
    "timestamp": "2026-04-08T11:49:20.000000+00:00"
  }
}
```

```json
{
  "type": "step",
  "data": {
    "step_number": 18,
    "action": "tool_call",
    "status": "completed",
    "message": "Tool html.select: ok",
    "reward": 0.05,
    "extracted_data": {
      "tool": "html.select",
      "success": true,
      "result_preview": "{'elements_found': 12, 'selector_used': 'article'}",
      "error": null,
      "duration_ms": 3
    },
    "timestamp": "2026-04-08T11:49:20.005000+00:00"
  }
}
```

## troubleshooting-table

| symptom | likely-cause | check |
| --- | --- | --- |
| `agent_decision` absent | planner disabled or failed before plan emit | verify `live_llm_enabled` path and planner warnings |
| selected tools not executed | planner output filtered/empty | inspect selected tool names against registry |
| many failed tool calls | unsupported namespace or bad params | verify executor namespace handlers and args |
| output quality unchanged | tool observations not influencing extraction | verify `AGENT TOOL OBSERVATIONS` injected in extraction prompt |

## related-api-reference

| item | value |
| --- | --- |
| api-reference | `api-reference.md` |