Sentinel / plan /phase-3-mcp-and-server.md

nihalaninihal

Refine build plan with devil's advocate corrections

dc8bc66 4 days ago


	# Phase 3: MCP + OpenEnv HTTP Server

	Time: 0.5 hours (Hours 6-6.5)
	Priority: MEDIUM -- MCPEnvironment did most of the work in Phase 2
	Depends on: Phase 2 (working environment with MCP tools)

	KEY CHANGE: MCPEnvironment handles MCP tool routing automatically. Phase 3 is now just creating the HTTP server entry point and verifying everything works end-to-end. MCP-X gateway is CUT.

	---

	## Files to Create

| File | Purpose | Est. Time |
|------|---------|-----------|
| `sentinelops_arena/server.py` | `create_app()` HTTP server entry point | 10 min |
| Verify MCP tools via HTTP | End-to-end test | 10 min |
| Verify WebSocket + MCP | Integration test | 10 min |

	---

	## Step-by-Step Build Instructions

	### Step 1: server.py -- OpenEnv HTTP Server (10 min)

	This is trivial -- follow the hackathon_env template exactly.

```python
# sentinelops_arena/server.py
"""
HTTP server for SentinelOps Arena.

Endpoints:
    POST /reset  -- Reset environment
    POST /step   -- Execute an action (including ListToolsAction, CallToolAction)
    GET  /state  -- Get current state
    GET  /schema -- Get action/observation schemas
    WS   /ws     -- WebSocket for persistent sessions (supports MCP messages)

The MCPEnvironment base class handles MCP tool routing automatically.
Agents can discover tools via ListToolsAction and call them via CallToolAction.

Usage:
    uvicorn sentinelops_arena.server:app --host 0.0.0.0 --port 8000
"""

from openenv.core.env_server.http_server import create_app

from .environment import SentinelOpsArena
from .models import SentinelAction, SentinelObservation

# Pass the environment CLASS (not an instance) plus the Action and Observation classes.
app = create_app(
    SentinelOpsArena,
    SentinelAction,
    SentinelObservation,
    env_name="sentinelops_arena",
    max_concurrent_envs=5,
)


def main(host: str = "0.0.0.0", port: int = 8000):
    import uvicorn

    uvicorn.run(app, host=host, port=port)


# The relative imports above need package context, so run this entry point as
#   python -m sentinelops_arena.server --port 8000
if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args()
    main(port=args.port)
```

	### Step 2: Verify HTTP + MCP Integration (10 min)

```bash
# Start server in the background and give it a moment to bind the port
uvicorn sentinelops_arena.server:app --port 8000 &
sleep 2

# Test reset
curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{}'

# Test step (regular action)
curl -X POST http://localhost:8000/step -H "Content-Type: application/json" \
  -d '{"action": {"agent": "attacker", "action_type": "pass"}}'

# Test step (MCP list_tools -- auto-routed by MCPEnvironment)
curl -X POST http://localhost:8000/step -H "Content-Type: application/json" \
  -d '{"action": {"type": "list_tools"}}'
# Should return available MCP tools

# Test step (MCP call_tool -- auto-routed by MCPEnvironment)
curl -X POST http://localhost:8000/step -H "Content-Type: application/json" \
  -d '{"action": {"type": "call_tool", "tool_name": "lookup_customer", "arguments": {"customer_id": "C000"}}}'
# Should return customer data

# Test state
curl http://localhost:8000/state

# Test schema
curl http://localhost:8000/schema

# Stop the background server
kill %1
```
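
If the checks need to be repeated, the same request sequence can be scripted. The following is a minimal sketch using only the standard library; it assumes the server is running on `localhost:8000` and that the payload shapes in the curl commands above are accepted as written. The names `build_request`, `call`, `SMOKE_SEQUENCE`, and `run_smoke_test` are illustrative and not part of OpenEnv.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: server started as in the curl commands


def build_request(path, payload=None):
    """Build a GET request (payload is None) or a JSON POST request."""
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path, payload=None, timeout=5.0):
    """Send the request and decode the JSON response body."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same requests as the curl commands, in the same order.
SMOKE_SEQUENCE = [
    ("/reset", {}),
    ("/step", {"action": {"agent": "attacker", "action_type": "pass"}}),
    ("/step", {"action": {"type": "list_tools"}}),
    ("/step", {"action": {"type": "call_tool", "tool_name": "lookup_customer",
                          "arguments": {"customer_id": "C000"}}}),
    ("/state", None),
    ("/schema", None),
]


def run_smoke_test():
    """Hit every endpoint once; a non-JSON body or HTTP error raises."""
    for path, payload in SMOKE_SEQUENCE:
        result = call(path, payload)
        print(path, "->", type(result).__name__)
```

Calling `run_smoke_test()` with the server up prints one line per endpoint. Nothing is sent at import time, so the module can be imported safely before the server exists.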

	### Step 3: Verify WebSocket MCP Path (10 min)

```python
# Quick WebSocket test
import asyncio
import json

import websockets


async def test_ws():
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        # Reset
        await ws.send(json.dumps({"type": "reset", "data": {"seed": 42}}))
        resp = json.loads(await ws.recv())
        print(f"Reset: {resp['type']}")

        # MCP via WebSocket
        await ws.send(json.dumps({
            "type": "mcp",
            "data": {"method": "tools/list", "params": {}, "id": 1},
        }))
        resp = json.loads(await ws.recv())
        print(f"MCP tools via WS: {resp}")


asyncio.run(test_ws())
```
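
When more than one MCP message is sent over the socket, it helps to build the envelope in one place. The sketch below reuses the `{"type": "mcp", "data": {...}}` shape from the test above. `McpMessageBuilder` is a hypothetical helper, and the `tools/call` parameters (`name`, `arguments`) follow the MCP convention; whether the `/ws` endpoint forwards `tools/call` the same way as `tools/list` is an assumption to verify against the running server.

```python
import itertools
import json


class McpMessageBuilder:
    """Builds JSON strings for the WebSocket MCP envelope with increasing ids."""

    def __init__(self):
        self._ids = itertools.count(1)  # request ids: 1, 2, 3, ...

    def _envelope(self, method, params):
        return json.dumps({
            "type": "mcp",
            "data": {"method": method, "params": params, "id": next(self._ids)},
        })

    def list_tools(self):
        """Same message as the tools/list call in the test above."""
        return self._envelope("tools/list", {})

    def call_tool(self, name, arguments):
        """Assumed shape for invoking a tool over the same envelope."""
        return self._envelope("tools/call", {"name": name, "arguments": arguments})
```

Inside `test_ws()`, `await ws.send(builder.list_tools())` would replace the hand-written `json.dumps(...)` call.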

	---

	## What MCPEnvironment Gives Us For Free

| Feature | How |
|---------|-----|
| MCP tool discovery | `ListToolsAction` -> returns all tools with schemas |
| MCP tool invocation | `CallToolAction(tool_name, arguments)` -> calls FastMCP tool |
| Reserved name validation | Rejects tools named `reset`, `step`, `state`, `close` |
| Timeout handling | Configurable timeout on tool calls |
| Error categorization | `ToolError` with types: execution_error, invalid_args, tool_not_found, timeout |
| WebSocket MCP path | `/ws` endpoint supports `type: "mcp"` messages |
| Async support | `_run_async_safely()` handles both sync and async contexts |
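
To make the four error types in the table concrete, here is a rough sketch of how a tool call could be classified. It does not reproduce MCPEnvironment's actual implementation: `call_tool_safely`, the plain dict of callables, and the returned dict shape are all hypothetical and only illustrate the categories.

```python
import concurrent.futures
import inspect


def call_tool_safely(tools, tool_name, arguments, timeout_s=5.0):
    """Call tools[tool_name](**arguments) and map failures to the four error types."""

    def error(error_type, message):
        return {"ok": False, "error_type": error_type, "message": message}

    tool = tools.get(tool_name)
    if tool is None:
        return error("tool_not_found", f"unknown tool: {tool_name}")

    # Check the arguments against the signature before running anything.
    try:
        inspect.signature(tool).bind(**arguments)
    except TypeError as exc:
        return error("invalid_args", str(exc))

    # Run in a worker thread so the caller can stop waiting after timeout_s.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(tool, **arguments)
        return {"ok": True, "result": future.result(timeout=timeout_s)}
    except concurrent.futures.TimeoutError:
        return error("timeout", f"{tool_name} exceeded {timeout_s}s")
    except Exception as exc:
        return error("execution_error", str(exc))
    finally:
        pool.shutdown(wait=False)  # do not block on a tool that is still running
```

Note that a timed-out tool keeps running in its worker thread; the sketch only stops waiting for it.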

	## What We DON'T Need (CUT)

| Removed | Reason |
|---------|--------|
| `mcp_tools.py` | MCP tools defined inside `environment.py` via FastMCP |
| `mcp-x/` directory | MCP-X gateway CUT -- MCPEnvironment handles tool exposure |
| `config.toml` | No MCP-X = no per-agent access control config |
| `run_server.py` | Single server is enough |
| Per-agent JWT tokens | Nice-to-have, not needed for demo/judging |

	---

	## VERIFY

	### Test 1: HTTP Server starts
```bash
uvicorn sentinelops_arena.server:app --port 8000
# Should start without errors
# Should show "Uvicorn running on http://127.0.0.1:8000"
# (uvicorn binds to 127.0.0.1 by default; add --host 0.0.0.0 to see 0.0.0.0 instead)
```

	### Test 2: All endpoints return valid JSON
	```bash
	# Reset -> Observation JSON
	# Step -> Observation JSON
	# State -> State JSON
	# Schema -> Action/Observation/State schemas
	```

	### Test 3: MCP tools discoverable via HTTP
	```bash
	# POST /step with ListToolsAction -> list of tools
	# Verify: lookup_customer, issue_refund, get_schema, launch_attack etc. all present
	# Verify: no reserved names (reset, step, state, close)
	```
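
The two "Verify" lines above can be turned into a small check. This is a sketch that assumes the tool names have already been extracted from the `/step` response into a list of strings; the response's exact shape is not specified here, so that extraction is left to the caller. `check_tool_listing` is a hypothetical helper, and `EXPECTED_TOOLS` lists only the tools named above (the "etc." is not expanded).

```python
RESERVED_NAMES = frozenset({"reset", "step", "state", "close"})

# Only the tools explicitly named in Test 3; extend as the environment grows.
EXPECTED_TOOLS = frozenset({"lookup_customer", "issue_refund", "get_schema", "launch_attack"})


def check_tool_listing(tool_names):
    """Return which expected tools are missing and which reserved names leaked in."""
    names = set(tool_names)
    return {
        "missing": sorted(EXPECTED_TOOLS - names),
        "reserved": sorted(names & RESERVED_NAMES),
    }


def listing_ok(tool_names):
    """True when every expected tool is present and no reserved name appears."""
    report = check_tool_listing(tool_names)
    return not report["missing"] and not report["reserved"]
```

A listing passes when both lists in the report are empty; extra tools beyond the expected set are allowed.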

	### Test 4: MCP tools callable via HTTP
	```bash
	# POST /step with CallToolAction -> tool result
	# Call lookup_customer("C000") -> customer data
	# Call get_schema("crm") -> field list
	# Call get_current_policy("refund") -> policy values
	```

	---

	## DEBUG: Common Issues

| Issue | Cause | Fix |
|-------|-------|-----|
| `Port 8000 already in use` | Previous server running | `kill $(lsof -t -i:8000)` |
| `create_app()` fails with type error | Wrong argument types | Pass class (not instance), Action class, Observation class |
| MCP tools not showing up | Tools defined after `super().__init__()` | Define tools BEFORE calling `super().__init__(mcp)` |
| `ValueError: reserved names` | Tool named `reset` or `step` | Rename the tool |
| WebSocket MCP not working | Wrong message format | Use `{"type": "mcp", "data": {"method": "tools/list", ...}}` |
| `ListToolsAction` not recognized | `create_app` doesn't know about MCP types | May need to pass both `SentinelAction` and MCP action types to create_app |
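
For the first row, a quick way to confirm whether something is still listening on the port before restarting the server is a plain socket probe. This is a minimal standard-library sketch; `port_in_use` is a hypothetical helper, and it only detects listeners reachable on the given host (default `127.0.0.1`).

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(8000)` returning `True` before starting uvicorn suggests a previous server is still running.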

	---

	## EXIT CRITERIA

	- [ ] `uvicorn sentinelops_arena.server:app` starts without errors
	- [ ] HTTP `/reset`, `/step`, `/state`, `/schema` return valid JSON
	- [ ] `ListToolsAction` via `/step` returns all enterprise system tools
	- [ ] `CallToolAction` via `/step` successfully calls tools
	- [ ] WebSocket `/ws` endpoint accepts connections

	---

	## ROLLBACK PLAN

	Phase 3 is already minimal. If it takes longer than 30 minutes:
	1. Skip WebSocket verification -- HTTP-only is fine for demo
	2. Skip schema endpoint check -- not needed for judging
	3. If `create_app()` fails entirely -- serve the Gradio app directly without the OpenEnv HTTP layer. The environment still works via direct Python calls.

	Do NOT cut: `server.py` with `create_app()`. This is required for HF Spaces deployment.