	# scraperl-full-agentic-sandbox-validation-report

	## scope

	Validated the end-to-end Docker flow (`docker compose up`) with backend/frontend integration, real scrape execution, agent/plugin orchestration, sandboxed Python execution, session artifacts, memory stats, and realtime stream events.

	## environment

	- Stack: `docker compose` (frontend `:3000`, backend `:8000`)
	- Build path validated after backend changes (TLS fallback, CSV detection fix, memory stats integration).
	- Providers exercised: NVIDIA and Groq.
	- Plugins exercised: search, browser, html, and json, plus the Python sandbox (`proc-python`, `proc-pandas`, `proc-numpy`, `proc-bs4`).

	## critical-endpoint-smoke-checks-via-http-localhost-3000

	| Endpoint | Status |
	| --- | --- |
	| `/api/health` | 200 |
	| `/api/agents/list` | 200 |
	| `/api/plugins` | 200 |
	| `/api/memory/stats/overview` | 200 |
	| `/api/settings` | 200 |
	| `/api/agents/catalog` | 200 |
	| `/api/agents/installed` | 200 |
	| `/api/scrape/sessions` | 200 |
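
A minimal sketch of how these smoke checks could be repeated from a script. The endpoint paths and the `http://localhost:3000` base come from this report; the helper names (`http_status`, `smoke_check`, `failures`) are hypothetical, and the sketch assumes plain unauthenticated `GET` requests are enough for each route.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Paths copied from the endpoint table in this report.
ENDPOINTS = [
    "/api/health",
    "/api/agents/list",
    "/api/plugins",
    "/api/memory/stats/overview",
    "/api/settings",
    "/api/agents/catalog",
    "/api/agents/installed",
    "/api/scrape/sessions",
]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as exc:
        return exc.code  # non-2xx responses still carry a status code
    except (URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, and similar


def smoke_check(
    base_url: str,
    paths: Iterable[str] = ENDPOINTS,
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Map each path to the status returned by `fetch` (injectable for tests)."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) for path in paths}


def failures(results: Dict[str, int]) -> Dict[str, int]:
    """Keep only the endpoints that did not answer with 200."""
    return {path: status for path, status in results.items() if status != 200}
```

With the stack running, `failures(smoke_check("http://localhost:3000"))` would be expected to return an empty dict if every endpoint still answers 200.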

	## 10-real-scenario-results

	All scenarios completed successfully in the final run (10/10 completed, 0 partial, 0 failed).

	| ID | Provider | Complexity | Output | Status | Steps | Reward | URLs | Sandbox Artifacts |
	| --- | --- | --- | --- | --- | ---: | ---: | ---: | ---: |
	| T1-low-nvidia-json | nvidia | low | json | completed | 13 | 4.8777 | 1 | 6 |
	| T2-medium-nvidia-markdown | nvidia | medium | markdown | completed | 19 | 7.3560 | 1 | 6 |
	| T3-high-nvidia-gold-csv | nvidia | high | csv | completed | 50 | 19.3423 | 2 | 8 |
	| T4-high-nvidia-python-analysis | nvidia | high | json | completed | 30 | 9.5663 | 1 | 6 |
	| T5-medium-nvidia-multiasset-csv | nvidia | medium | csv | completed | 36 | 14.5493 | 2 | 8 |
	| T6-low-groq-json | groq | low | json | completed | 13 | 4.8773 | 1 | 6 |
	| T7-high-groq-python | groq | high | markdown | completed | 30 | 9.5663 | 1 | 6 |
	| T8-medium-nvidia-memory-artifacts | nvidia | medium | json | completed | 23 | 7.3560 | 1 | 6 |
	| T9-high-nvidia-selected-agents | nvidia | high | json | completed | 26 | 9.6002 | 1 | 6 |
	| T10-stream-realtime | nvidia | medium | json | completed | 19 | 0.0000 | 1 | 0 |
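
The 10/10 tally can be re-derived from the rows of the scenario table. The sketch below copies the ID, provider, status, steps, and reward columns from the table; the `summarize` helper and its output keys are hypothetical names used only for illustration.

```python
from collections import Counter

# (id, provider, status, steps, reward) copied from the scenario table.
RESULTS = [
    ("T1-low-nvidia-json", "nvidia", "completed", 13, 4.8777),
    ("T2-medium-nvidia-markdown", "nvidia", "completed", 19, 7.3560),
    ("T3-high-nvidia-gold-csv", "nvidia", "completed", 50, 19.3423),
    ("T4-high-nvidia-python-analysis", "nvidia", "completed", 30, 9.5663),
    ("T5-medium-nvidia-multiasset-csv", "nvidia", "completed", 36, 14.5493),
    ("T6-low-groq-json", "groq", "completed", 13, 4.8773),
    ("T7-high-groq-python", "groq", "completed", 30, 9.5663),
    ("T8-medium-nvidia-memory-artifacts", "nvidia", "completed", 23, 7.3560),
    ("T9-high-nvidia-selected-agents", "nvidia", "completed", 26, 9.6002),
    ("T10-stream-realtime", "nvidia", "completed", 19, 0.0000),
]


def summarize(rows):
    """Tally statuses, steps per provider, and scenarios with zero reward."""
    statuses = Counter(status for _, _, status, _, _ in rows)
    steps_by_provider = Counter()
    for _, provider, _, steps, _ in rows:
        steps_by_provider[provider] += steps
    return {
        "total": len(rows),
        "completed": statuses["completed"],
        "partial": statuses["partial"],  # Counter returns 0 for absent keys
        "failed": statuses["failed"],
        "steps_by_provider": dict(steps_by_provider),
        "zero_reward": [row_id for row_id, _, _, _, reward in rows if reward == 0.0],
    }
```

The `zero_reward` list singles out T10-stream-realtime, the only row in the table with a reward of 0.0000 and no sandbox artifacts.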

	## realtime-stream-validation

	- Stream test emitted: `init`, `step`, `url_start`, `url_complete`, `complete`.
	- Final stream status: `completed`.
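
A sketch of how a captured stream could be checked against the event types listed above. The event names come from this report; the ordering rules (`init` first, `complete` last, every `url_start` paired with a later `url_complete`) are assumptions made for illustration, as is the `validate_stream` helper itself.

```python
# Event types reported for the stream test.
REQUIRED = ("init", "step", "url_start", "url_complete", "complete")


def validate_stream(event_types):
    """Return a list of problems found in a sequence of event type names."""
    problems = []
    seen = set(event_types)
    for name in REQUIRED:
        if name not in seen:
            problems.append(f"missing event type: {name}")

    # Assumed ordering: the stream opens with init and closes with complete.
    if event_types and event_types[0] != "init":
        problems.append("first event is not init")
    if event_types and event_types[-1] != "complete":
        problems.append("last event is not complete")

    # Assumed pairing: each url_start is followed by a matching url_complete.
    open_urls = 0
    for name in event_types:
        if name == "url_start":
            open_urls += 1
        elif name == "url_complete":
            if open_urls == 0:
                problems.append("url_complete without url_start")
            else:
                open_urls -= 1
    if open_urls:
        problems.append(f"{open_urls} url_start event(s) never completed")
    return problems
```

An empty return value means the captured sequence satisfies all of the assumed rules.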

	## memory-session-validation

	- Memory stats now reflect scrape writes (integrated with runtime memory manager).
	- Over the matrix run, total memory entries grew from 48 to 92 (growth observed in both short-term and long-term memory).
	- Isolated sanity check: memory totals changed from 0 to 4 after one memory-enabled scrape session.
	- Session sandbox artifacts are listable/readable through:
	  - `GET /api/scrape/{session_id}/sandbox/files`
	  - `GET /api/scrape/{session_id}/sandbox/files/{file_name}`
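
The two routes above can be assembled with a small helper. This is a sketch only: the route templates come from this report, while the function names are hypothetical and the response format is deliberately not assumed.

```python
from urllib.parse import quote


def sandbox_files_url(base_url: str, session_id: str) -> str:
    """URL that lists the sandbox artifacts of one scrape session."""
    session = quote(session_id, safe="")  # escape characters unsafe in a path segment
    return f"{base_url.rstrip('/')}/api/scrape/{session}/sandbox/files"


def sandbox_file_url(base_url: str, session_id: str, file_name: str) -> str:
    """URL that reads a single sandbox artifact by file name."""
    return f"{sandbox_files_url(base_url, session_id)}/{quote(file_name, safe='')}"
```

For example, `sandbox_file_url("http://localhost:3000", "abc123", "result 1.csv")` yields `http://localhost:3000/api/scrape/abc123/sandbox/files/result%201.csv`, where `abc123` and `result 1.csv` are placeholder values.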

	## fixes-validated-during-this-cycle

	1. TLS/certificate fallback for web fetch in Dockerized runtime (with explicit warning and controlled retry).
	2. Correct navigation failure handling in scrape pipeline (no false-success navigation state).
	3. CSV detection corrected to avoid misclassifying HTML as CSV.
	4. Memory stats endpoint integrated with runtime memory manager counts.
	5. Agent catalog/install/uninstall API flow and frontend Agents tab routing integration.
	6. Backend and frontend test suites continue to pass after changes.
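
To illustrate the kind of check behind fix 3, here is one possible heuristic for telling CSV apart from HTML. It is not the project's actual implementation: the function names, the marker list, and the "consistent column count" rule are all assumptions chosen for the sketch.

```python
import csv
import io

# Substrings that strongly suggest an HTML document (assumed marker list).
HTML_MARKERS = ("<!doctype html", "<html", "<head", "<body")


def looks_like_html(text: str) -> bool:
    """Check the start of the payload for common HTML markers."""
    head = text.lstrip()[:2048].lower()
    return any(marker in head for marker in HTML_MARKERS)


def looks_like_csv(text: str, min_rows: int = 2) -> bool:
    """Accept text as CSV only if it is not HTML and parses into a regular grid."""
    if not text.strip() or looks_like_html(text):
        return False
    try:
        rows = [row for row in csv.reader(io.StringIO(text)) if row]
    except csv.Error:
        return False
    if len(rows) < min_rows:
        return False
    width = len(rows[0])
    # Require more than one column and the same column count on every row.
    return width > 1 and all(len(row) == width for row in rows)
```

Checking for HTML markers before attempting to parse matters because HTML often contains commas, so a comma-based test alone can misclassify a page as CSV. An HTML fragment without any of the listed markers would still slip past this sketch.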

	## document-flow

	```mermaid
	flowchart TD
	A[document] --> B[key-sections]
	B --> C[implementation]
	B --> D[operations]
	B --> E[validation]
	```

	## related-api-reference

	| item | value |
	| --- | --- |
	| api-reference | `api-reference.md` |