# Agents System Design
## Overview
The agent runtime is a multi-agent, memory-aware RL orchestration layer for web extraction tasks. It supports:
- Single-agent and multi-agent execution modes
- Strategy selection (`search-first`, `direct-extraction`, `multi-hop-reasoning`)
- Human-in-the-loop intervention
- Explainable decision traces
- Self-improvement from past episodes
## Agent Roles
### 1. Planner Agent
Builds a plan before any action is taken:
- Goal decomposition
- Tool selection plan
- Risk and fallback path
### 2. Navigator Agent
Explores pages and search results:
- URL prioritization
- Link traversal policy
- Page relevance scoring
### 3. Extractor Agent
Extracts structured fields:
- Selector and schema inference
- Adaptive chunk extraction
- Long-page batch processing
### 4. Verifier Agent
Checks consistency and trust:
- Cross-source verification
- Conflict resolution
- Confidence calibration
### 5. Memory Agent
Manages memory writes, reads, and searches:
- Episode summaries
- Pattern persistence
- Retrieval ranking and pruning
## Execution Modes
### Single-Agent
One policy handles all actions.
Pros: low overhead, simple.
Cons: weaker specialization.
### Multi-Agent
Coordinator delegates work:
1. Planner emits execution graph
2. Navigator discovers candidate pages
3. Extractor parses and emits data
4. Verifier validates outputs
5. Memory Agent stores reusable patterns
Pros: modular, robust, scalable.
Cons: coordination overhead.
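The delegation order above can be sketched as a coordinator that runs each role in turn over shared state. This is a minimal illustration, not the project's implementation: `EpisodeState`, `run_episode`, the stage functions, and the stubbed values are all hypothetical, and a flat list stands in for the planner's execution graph.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EpisodeState:
    """Hypothetical shared state; field names are illustrative only."""
    goal: str
    plan: list[str] = field(default_factory=list)
    candidates: list[str] = field(default_factory=list)
    records: list[dict] = field(default_factory=list)
    verified: list[dict] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)


Stage = Callable[[EpisodeState], None]


def planner(state: EpisodeState) -> None:
    # A real planner would emit a graph; a linear list stands in for it here.
    state.plan = ["navigate", "extract", "verify", "store"]


def navigator(state: EpisodeState) -> None:
    # Stubbed discovery result.
    state.candidates = ["https://site.com/p/123"]


def extractor(state: EpisodeState) -> None:
    # Stubbed extraction: one record per candidate page.
    state.records = [{"url": url, "price": "19.99"} for url in state.candidates]


def verifier(state: EpisodeState) -> None:
    # Keep only records that contain the required field.
    state.verified = [r for r in state.records if r.get("price")]


def memory_agent(state: EpisodeState) -> None:
    # Persist a pattern only when verification produced output.
    if state.verified:
        state.patterns.append("span.product-price")


PIPELINE: list[Stage] = [planner, navigator, extractor, verifier, memory_agent]


def run_episode(goal: str, stages: list[Stage] = PIPELINE) -> EpisodeState:
    """Run every stage in order against one shared state object."""
    state = EpisodeState(goal=goal)
    for stage in stages:
        stage(state)
    return state
```

Calling `run_episode("find the product price")` returns a state whose `verified` list holds the single stubbed record. Keeping each role as a plain callable makes it easy to swap one stage without touching the others.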
## Agent Communication
Shared channels:
- `agent_messages`: async inter-agent messages
- `task_state`: current objective and progress
- `global_knowledge`: reusable facts and patterns
Message schema:
```json
{
  "message_id": "msg_123",
  "from": "navigator",
  "to": "extractor",
  "type": "page_candidate",
  "payload": {
    "url": "https://site.com/p/123",
    "relevance": 0.91
  },
  "timestamp": "2026-03-27T00:00:00Z"
}
```
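One way to handle this schema in code is a small typed wrapper that serializes to the same keys. The sketch below is an assumption about how a client might model it: the `AgentMessage` class and its `sender`/`recipient` attribute names are hypothetical, chosen because `from` is a reserved word in Python.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class AgentMessage:
    message_id: str
    sender: str      # serialized as "from"
    recipient: str   # serialized as "to"
    type: str
    payload: dict[str, Any]
    timestamp: str   # kept as the original timestamp string, not parsed

    def to_json(self) -> str:
        """Serialize using the key names from the message schema."""
        data = asdict(self)
        data["from"] = data.pop("sender")
        data["to"] = data.pop("recipient")
        return json.dumps(data)

    @classmethod
    def from_json(cls, raw: str) -> "AgentMessage":
        """Parse a schema-shaped JSON string; raises KeyError on missing keys."""
        data = json.loads(raw)
        return cls(
            message_id=data["message_id"],
            sender=data["from"],
            recipient=data["to"],
            type=data["type"],
            payload=data["payload"],
            timestamp=data["timestamp"],
        )
```

A message built this way round-trips: `AgentMessage.from_json(msg.to_json()) == msg`.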
## Decision Policy
Policy input includes:
- Observation
- Working memory context
- Retrieved long-term memory hits
- Tool registry availability
- Budget and constraints
Policy output includes:
- Next action
- Confidence
- Rationale
- Fallback action (optional)
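The input and output lists above map naturally onto two record types. The following is a sketch under stated assumptions: the field names, the `toy_policy` rules, and the idea of switching to the fallback below a `min_confidence` threshold are illustrative and are not specified by this document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicyInput:
    observation: str
    working_memory: list[str] = field(default_factory=list)
    memory_hits: list[str] = field(default_factory=list)      # long-term memory
    available_tools: list[str] = field(default_factory=list)  # tool registry
    budget_steps: int = 10                                     # remaining budget


@dataclass
class PolicyOutput:
    action: str
    confidence: float
    rationale: str
    fallback_action: Optional[str] = None


def toy_policy(inp: PolicyInput) -> PolicyOutput:
    """Hypothetical rule: reuse a remembered pattern if one exists, else search."""
    if inp.memory_hits and "extract" in inp.available_tools:
        return PolicyOutput(
            action="extract",
            confidence=0.9,
            rationale=f"reusing pattern {inp.memory_hits[0]}",
            fallback_action="search",
        )
    return PolicyOutput("search", 0.4, "no known pattern for this page")


def resolve_action(out: PolicyOutput, min_confidence: float = 0.5) -> str:
    """Use the fallback only when confidence is low and a fallback exists."""
    if out.confidence < min_confidence and out.fallback_action is not None:
        return out.fallback_action
    return out.action
```

Because the fallback is optional, `resolve_action` returns the primary action whenever no fallback was supplied, even at low confidence.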
## Strategy Library
Built-in strategy templates:
- `search-first`: broad discovery then narrow extraction
- `direct-extraction`: immediate field extraction from target page
- `multi-hop-reasoning`: iterative search and verification
- `table-centric`: table-first parsing
- `form-centric`: forms and input structures prioritized
Strategy selection can be:
- Manual (user setting)
- Automatic (router based on task signature)
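A router over these templates could look like the sketch below. The strategy names come from the list above; the `TaskSignature` fields and the precedence of the rules are assumptions made for illustration, since the document does not define what a task signature contains.

```python
from dataclasses import dataclass
from typing import Optional

STRATEGIES = (
    "search-first",
    "direct-extraction",
    "multi-hop-reasoning",
    "table-centric",
    "form-centric",
)


@dataclass(frozen=True)
class TaskSignature:
    """Hypothetical task features used only for this example."""
    has_target_url: bool = False
    wants_table: bool = False
    wants_form: bool = False
    needs_cross_checks: bool = False


def select_strategy(sig: TaskSignature, manual: Optional[str] = None) -> str:
    # A manual user setting always wins over the automatic router.
    if manual is not None:
        if manual not in STRATEGIES:
            raise ValueError(f"unknown strategy: {manual!r}")
        return manual
    # Automatic routing: the first matching rule decides.
    if sig.wants_table:
        return "table-centric"
    if sig.wants_form:
        return "form-centric"
    if sig.needs_cross_checks:
        return "multi-hop-reasoning"
    if sig.has_target_url:
        return "direct-extraction"
    return "search-first"
```

For example, `select_strategy(TaskSignature(has_target_url=True))` returns `"direct-extraction"`, while passing `manual="search-first"` overrides the router regardless of the signature.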
## Self-Improving Agent Loop
After each episode:
1. Compute reward breakdown
2. Extract failed and successful patterns
3. Update strategy performance table
4. Store high-confidence selectors in long-term memory
5. Penalize redundant navigation patterns
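The post-episode steps can be sketched as a single update function. Everything specific here is an assumption: the running-mean performance table, the per-repeat penalty of 0.05, the 0.8 confidence cutoff, and the names `StrategyStats`, `redundancy_penalty`, and `finish_episode` are illustrative, not the project's reward design.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StrategyStats:
    episodes: int = 0
    mean_reward: float = 0.0

    def update(self, reward: float) -> None:
        # Incremental running mean, so no reward history needs to be kept.
        self.episodes += 1
        self.mean_reward += (reward - self.mean_reward) / self.episodes


def redundancy_penalty(visited_urls: list[str], per_repeat: float = 0.05) -> float:
    """Charge a fixed penalty for every revisit of an already-seen URL."""
    counts = Counter(visited_urls)
    repeats = sum(count - 1 for count in counts.values())
    return per_repeat * repeats


def finish_episode(
    table: dict[str, StrategyStats],
    long_term: dict[str, float],
    strategy: str,
    base_reward: float,
    visited_urls: list[str],
    selector_confidence: dict[str, float],
    min_confidence: float = 0.8,
) -> float:
    # Steps 1 and 5: reward after the redundant-navigation penalty.
    reward = base_reward - redundancy_penalty(visited_urls)
    # Step 3: update the strategy performance table.
    table.setdefault(strategy, StrategyStats()).update(reward)
    # Step 4: keep only high-confidence selectors in long-term memory.
    for selector, confidence in selector_confidence.items():
        if confidence >= min_confidence:
            long_term[selector] = confidence
    return reward
```

Step 2 (extracting failed and successful patterns) is represented only by the `selector_confidence` input; how those confidences are computed is left open.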
## Explainable AI Mode
Each action can emit:
- Why this action was chosen
- Why alternatives were rejected
- Which memory/tool evidence was used
Example trace:
```text
Action: EXTRACT_FIELD(price)
Why: Pattern "span.product-price" had 0.93 historical confidence on similar domains.
Alternatives rejected: ".price-box .value" (lower confidence 0.58), regex-only extraction (unstable on this layout).
```
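A trace in this shape could be produced from a small record type. The sketch below is an assumption about structure: `DecisionTrace`, its field names, and the optional `Evidence:` line are hypothetical, and only the three line labels shown in the example come from this document.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionTrace:
    action: str
    why: str
    # Each entry is (alternative, reason it was rejected).
    rejected: list[tuple[str, str]] = field(default_factory=list)
    # Memory or tool evidence; rendered on an extra line (an assumption).
    evidence: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"Action: {self.action}", f"Why: {self.why}"]
        if self.rejected:
            alts = ", ".join(f"{alt} ({reason})" for alt, reason in self.rejected)
            lines.append(f"Alternatives rejected: {alts}.")
        if self.evidence:
            lines.append("Evidence: " + ", ".join(self.evidence))
        return "\n".join(lines)
```

Omitting `rejected` or `evidence` simply drops the corresponding line, so the same type serves both terse and detailed traces.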
## Human-in-the-Loop
Optional checkpoints:
- Approve/reject planned action
- Override selector/tool/model
- Force verification before submit
Intervention modes:
- `off`: fully autonomous
- `review`: pause on low-confidence steps
- `strict`: require approval on all submit/fetch/verify actions
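The three modes reduce to a single gate that decides whether a step must pause for a human. This sketch reads the mode descriptions literally; the `needs_approval` function, the `low_confidence` threshold of 0.6, and the action-kind strings are assumptions for illustration.

```python
from enum import Enum


class InterventionMode(str, Enum):
    OFF = "off"
    REVIEW = "review"
    STRICT = "strict"


# Action kinds that `strict` mode gates, per the mode description.
GATED_ACTIONS = frozenset({"submit", "fetch", "verify"})


def needs_approval(
    mode: InterventionMode,
    action_kind: str,
    confidence: float,
    low_confidence: float = 0.6,  # hypothetical threshold
) -> bool:
    if mode is InterventionMode.OFF:
        return False  # fully autonomous
    if mode is InterventionMode.REVIEW:
        return confidence < low_confidence  # pause only on low-confidence steps
    return action_kind in GATED_ACTIONS  # strict: gate by action kind
```

Whether `strict` should also pause on low-confidence steps of other kinds is not stated here, so the sketch leaves those ungated.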
## Scenario Simulator Hooks
Agents can be tested against:
- Noisy HTML
- Missing fields
- Broken pagination
- Adversarial layouts
- Dynamic content with delayed rendering
Simulation metrics:
- Completion rate
- Recovery score
- Generalization score
- Cost and latency
## APIs
- `POST /api/agents/run`
- `POST /api/agents/plan`
- `POST /api/agents/override`
- `GET /api/agents/state/{episode_id}`
- `GET /api/agents/trace/{episode_id}`
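A client can keep these routes in one table and fill in the `{episode_id}` placeholder when needed. The helper below is a hypothetical sketch: `build_request`, the short route names, and the example base URL are assumptions, and request or response bodies are not shown because this document does not define them.

```python
from typing import Optional
from urllib.parse import quote, urljoin

# Method and path for each endpoint listed above, keyed by a short name.
ENDPOINTS = {
    "run": ("POST", "/api/agents/run"),
    "plan": ("POST", "/api/agents/plan"),
    "override": ("POST", "/api/agents/override"),
    "state": ("GET", "/api/agents/state/{episode_id}"),
    "trace": ("GET", "/api/agents/trace/{episode_id}"),
}


def build_request(
    base_url: str, name: str, episode_id: Optional[str] = None
) -> tuple[str, str]:
    """Return (HTTP method, absolute URL) for a named endpoint."""
    method, template = ENDPOINTS[name]
    if "{episode_id}" in template:
        if episode_id is None:
            raise ValueError(f"{name!r} requires an episode_id")
        # Percent-encode the ID so it is safe as a single path segment.
        template = template.format(episode_id=quote(episode_id, safe=""))
    return method, urljoin(base_url, template)
```

With a placeholder base URL, `build_request("http://localhost:8000", "trace", "ep_42")` yields the `GET` method and the full trace URL for that episode.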
## Dashboard Widgets
- Live thought stream
- Agent role timeline
- Inter-agent message feed
- Strategy performance chart
- Confidence and override panel