Spaces:

openenv-community
/

Sentinel

Running

App Files Files Community

Sentinel / plan /README.md

nihalaninihal

Add phased build plan and setup guide for SentinelOps Arena

707377e 4 days ago

preview code

raw

history blame contribute delete

5.51 kB

	# SentinelOps Arena -- Build Plan

	## Overview

	14-hour hackathon build plan for a multi-agent self-play RL environment on OpenEnv 0.2.1. Solo developer. Deadline: Sunday March 8, 2026 at 1:00 PM.

	KEY INSIGHT: Innovation (40%) + Storytelling (30%) = 70% of judging is NON-code. Allocate time accordingly.

	## Revised Phase Summary

	\| Phase \| File \| Time \| Cumulative \| What \|
	\|-------\|------\|------\|------------\|------\|
	\| 0 \| (inline) \| 0.5h \| 0-0.5h \| Test H100/Northflank, write 60s video script \|
	\| 1 \| [phase-1-models-and-systems.md](phase-1-models-and-systems.md) \| 3.5h \| 0.5-4h \| Pydantic models + enterprise system simulators \|
	\| 2 \| [phase-2-environment-core.md](phase-2-environment-core.md) \| 2h \| 4-6h \| SentinelOpsArena(MCPEnvironment), rewards, turn management \|
	\| 3 \| [phase-3-mcp-and-server.md](phase-3-mcp-and-server.md) \| 0.5h \| 6-6.5h \| MCP tools via MCPEnvironment + HTTP server \|
	\| 4 \| [phase-4-demo-and-ui.md](phase-4-demo-and-ui.md) \| 2h \| 6.5-8.5h \| Demo script, Gradio app (1 tab), HF Spaces deploy \|
	\| 5 \| [phase-5-training.md](phase-5-training.md) \| 2h \| 8.5-10.5h \| Colab notebook, GRPO pipeline (fall back to SFT at 1.5h) \|
	\| 6 \| [phase-6-polish-and-submit.md](phase-6-polish-and-submit.md) \| 3.5h \| 10.5-14h \| Polish, video recording, submission \|

	Total: 14 hours

	## Phase 0: Pre-Flight (Hour 0-0.5)

	Before writing any code:
	1. Test H100 via Northflank -- verify access, note available VRAM. If no H100, lock to Qwen2.5-1.5B.
	2. Write 60-second video script -- forces clarity on what to demo. Script drives the build.
	3. Set up repo structure -- create directories, pyproject.toml

	## Dependencies

	```
	Phase 0 (Pre-Flight)
	\|
	v
	Phase 1 (Models & Systems)
	\|
	v
	Phase 2 (Environment Core) -- CHECKPOINT 1 (Hour 6): Minimum Viable
	\|
	v
	Phase 3 (MCP + Server) -- MCPEnvironment handles this almost free
	\|
	v
	Phase 4 (Demo & UI) -- CHECKPOINT 2 (Hour 8.5): Deploy to HF Spaces
	\|
	v
	Phase 5 (Training) -- CHECKPOINT 3 (Hour 10.5): Strong Submission
	\|
	v
	Phase 6 (Polish & Submit) -- CHECKPOINT 4 (Hour 14): Full Submission
	```

	## Stop-and-Submit Checkpoints

	Hour 6 (after Phase 2): Environment works with random agents. Submit with basic demo + placeholder training notebook. Minimum viable.

	Hour 8.5 (after Phase 4): Environment + MCP tools + Gradio demo deployed on HF Spaces. Good submission. INSURANCE SUBMISSION -- deploy to HF Spaces here.

	Hour 10.5 (after Phase 5): Everything above + working Colab training pipeline with visible reward improvement. Strong submission.

	Hour 14 (after Phase 6): Polished demo, training curves, video, stretch goals. Full submission.

	## Scoring Priorities

	\| Criterion \| Weight \| Primary Phase \| Time Allocated \|
	\|-----------\|--------\|---------------\|----------------\|
	\| Innovation \| 40% \| Phases 1-2 (3-agent self-play architecture) \| 5.5h \|
	\| Storytelling \| 30% \| Phase 4 + 6 (Gradio demo + video) \| 5.5h \|
	\| Training Script \| 20% \| Phase 5 (Colab GRPO notebook) \| 2h \|
	\| Pipeline \| 10% \| Phase 3 (MCP integration) \| 0.5h \|

	## Key Technical Decisions

	- OpenEnv version: 0.2.1 (stable, `openenv-core[core]>=0.2.0`)
	- Base class: `MCPEnvironment` (NOT raw `Environment`) -- auto-routes `ListToolsAction`/`CallToolAction` to FastMCP server. Gives MCP tool discovery for free.
	- MCP-X gateway: CUT -- MCPEnvironment already handles MCP tool exposure. Per-agent isolation is nice-to-have, not needed.
	- Action pattern: `Action(extra='forbid')` -- all agent-specific fields must be Optional with defaults, or use separate action classes per role
	- Server: `create_app()` from `openenv.core.env_server.http_server`
	- Training: Unsloth for model loading only, vanilla TRL `GRPOTrainer` with `rollout_func`. Fall back to SFT if GRPO fails at 1.5h.
	- Model: Qwen2.5-1.5B for Colab (5GB VRAM), Qwen2.5-7B if H100 available
	- Demo: Gradio on HuggingFace Spaces
	- Episode scope: 30 ticks, 15 customers, 15 invoices, 10 tickets, 30 tasks
	- Attack types: 4 (schema drift, policy drift, social engineering, rate limiting)
	- Reserved tool names: `reset`, `step`, `state`, `close` CANNOT be used as MCP tool names

	## File Structure

	```
	sentinelops_arena/
	__init__.py
	models.py # Pydantic models (enums, data, action/observation/state)
	systems/
	__init__.py
	crm.py # CRM simulator
	billing.py # Billing simulator
	ticketing.py # Ticketing simulator
	attacks.py # Attack mechanics (4 types)
	rewards.py # Reward functions (3 agents)
	task_generator.py # Task generation
	environment.py # SentinelOpsArena(MCPEnvironment) -- MCP tools defined here
	server.py # create_app() HTTP server

	training/
	colab_training.ipynb # Colab GRPO notebook (REQUIRED)
	env_standalone.py # Standalone env for Colab (no openenv dependency)

	app.py # HF Spaces Gradio entry point
	pyproject.toml
	README.md
	```

	NOTE: No separate `mcp_tools.py` -- MCP tools are defined inside `environment.py` using FastMCP, and `MCPEnvironment` auto-routes them.

	NOTE: No `mcp-x/` directory -- MCP-X gateway is CUT from the plan.

	## Partner Track Alignment

	- Fleet AI (Scalable Oversight): The Oversight agent monitors, analyzes, and explains behavior of Worker agent
	- Patronus AI (Schema Drift): Schema drift and policy drift are core attack types in the environment