SentinelOps Arena -- Build Plan
Overview
14-hour hackathon build plan for a multi-agent self-play RL environment on OpenEnv 0.2.1. Solo developer. Deadline: Sunday March 8, 2026 at 1:00 PM.
KEY INSIGHT: Innovation (40%) and Storytelling (30%) together make up 70% of judging, and neither is scored on code. Allocate time accordingly.
Revised Phase Summary
| Phase | File | Time | Cumulative | What |
|---|---|---|---|---|
| 0 | (inline) | 0.5h | 0-0.5h | Test H100/Northflank, write 60s video script |
| 1 | phase-1-models-and-systems.md | 3.5h | 0.5-4h | Pydantic models + enterprise system simulators |
| 2 | phase-2-environment-core.md | 2h | 4-6h | SentinelOpsArena(MCPEnvironment), rewards, turn management |
| 3 | phase-3-mcp-and-server.md | 0.5h | 6-6.5h | MCP tools via MCPEnvironment + HTTP server |
| 4 | phase-4-demo-and-ui.md | 2h | 6.5-8.5h | Demo script, Gradio app (1 tab), HF Spaces deploy |
| 5 | phase-5-training.md | 2h | 8.5-10.5h | Colab notebook, GRPO pipeline (fall back to SFT if GRPO is not working 1.5h into the phase) |
| 6 | phase-6-polish-and-submit.md | 3.5h | 10.5-14h | Polish, video recording, submission |
Total: 14 hours
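The Cumulative column can be re-derived from the Time column, which is a quick way to check the schedule after any phase is resized. A minimal sketch, with durations copied from the table above; the names `PHASES` and `cumulative_windows` are illustrative and not part of the project:

```python
# Phase durations in hours, copied from the phase summary table.
PHASES = [
    ("Phase 0", 0.5),
    ("Phase 1", 3.5),
    ("Phase 2", 2.0),
    ("Phase 3", 0.5),
    ("Phase 4", 2.0),
    ("Phase 5", 2.0),
    ("Phase 6", 3.5),
]


def cumulative_windows(phases):
    """Return (name, start_hour, end_hour) for each phase, in order."""
    windows = []
    start = 0.0
    for name, duration in phases:
        end = start + duration
        windows.append((name, start, end))
        start = end
    return windows


WINDOWS = cumulative_windows(PHASES)
TOTAL_HOURS = WINDOWS[-1][2]

if __name__ == "__main__":
    for name, start, end in WINDOWS:
        print(f"{name}: {start:g}-{end:g}h")
    print(f"Total: {TOTAL_HOURS:g}h")
```

If a phase overruns, editing its duration in `PHASES` shows how far every later window shifts.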
Phase 0: Pre-Flight (Hour 0-0.5)
Before writing any code:
- Test H100 via Northflank -- verify access, note available VRAM. If no H100, lock to Qwen2.5-1.5B.
- Write 60-second video script -- forces clarity on what to demo. Script drives the build.
- Set up repo structure -- create directories, pyproject.toml
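The model lock-in from the H100 check can be written down as a single decision so it is made once and not revisited mid-build. A minimal sketch; the function name `pick_model` and the substring match on the GPU name are assumptions for illustration, while the two model names come from this plan:

```python
def pick_model(gpu_name: str) -> str:
    """Pick the base model from the GPU name reported by the runtime.

    Assumption: an H100 is detected by a simple substring match.
    """
    if "H100" in gpu_name.upper():
        return "Qwen2.5-7B"
    # No H100 available: lock to the smaller model.
    return "Qwen2.5-1.5B"


if __name__ == "__main__":
    print(pick_model("NVIDIA H100 80GB HBM3"))
    print(pick_model("Tesla T4"))
```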
Dependencies
Phase 0 (Pre-Flight)
|
v
Phase 1 (Models & Systems)
|
v
Phase 2 (Environment Core) -- CHECKPOINT 1 (Hour 6): Minimum Viable
|
v
Phase 3 (MCP + Server) -- MCPEnvironment handles this almost free
|
v
Phase 4 (Demo & UI) -- CHECKPOINT 2 (Hour 8.5): Deploy to HF Spaces
|
v
Phase 5 (Training) -- CHECKPOINT 3 (Hour 10.5): Strong Submission
|
v
Phase 6 (Polish & Submit) -- CHECKPOINT 4 (Hour 14): Full Submission
Stop-and-Submit Checkpoints
Hour 6 (after Phase 2): Environment works with random agents. Submit with basic demo + placeholder training notebook. Minimum viable.
Hour 8.5 (after Phase 4): Environment + MCP tools + Gradio demo deployed on HF Spaces. Good submission. INSURANCE SUBMISSION -- deploy to HF Spaces here.
Hour 10.5 (after Phase 5): Everything above + working Colab training pipeline with visible reward improvement. Strong submission.
Hour 14 (after Phase 6): Polished demo, training curves, video, stretch goals. Full submission.
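The checkpoints above work as a lookup from elapsed time to the best submission available at that moment. A small sketch, assuming each phase finishes on schedule; `CHECKPOINTS` and `submission_tier` are hypothetical names:

```python
from typing import Optional

# (hour reached, submission tier), copied from the checkpoint list.
CHECKPOINTS = [
    (6.0, "Minimum viable"),
    (8.5, "Good submission"),
    (10.5, "Strong submission"),
    (14.0, "Full submission"),
]


def submission_tier(hours_elapsed: float) -> Optional[str]:
    """Return the highest tier whose checkpoint hour has been reached.

    Returns None before the first checkpoint (nothing submittable yet).
    Assumes the build is on schedule; a slipped phase delays every tier.
    """
    tier = None
    for hour, name in CHECKPOINTS:
        if hours_elapsed >= hour:
            tier = name
    return tier


if __name__ == "__main__":
    for h in (3, 6, 9, 14):
        print(h, submission_tier(h))
```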
Scoring Priorities
| Criterion | Weight | Primary Phase | Time Allocated |
|---|---|---|---|
| Innovation | 40% | Phases 1-2 (3-agent self-play architecture) | 5.5h |
| Storytelling | 30% | Phase 4 + 6 (Gradio demo + video) | 5.5h |
| Training Script | 20% | Phase 5 (Colab GRPO notebook) | 2h |
| Pipeline | 10% | Phase 3 (MCP integration) | 0.5h |
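To see how closely the time allocation tracks the judging weights, each criterion's share of allocated hours can be compared with its weight. A minimal sketch using the numbers from the table above; the 13.5h total excludes the 0.5h of Phase 0, which the table does not assign to a criterion. `CRITERIA` and `time_shares` are illustrative names:

```python
# criterion -> (judging weight, hours allocated), copied from the table.
CRITERIA = {
    "Innovation": (0.40, 5.5),
    "Storytelling": (0.30, 5.5),
    "Training Script": (0.20, 2.0),
    "Pipeline": (0.10, 0.5),
}


def time_shares(criteria):
    """Return each criterion's fraction of the total allocated hours."""
    total = sum(hours for _, hours in criteria.values())
    return {name: hours / total for name, (_, hours) in criteria.items()}


SHARES = time_shares(CRITERIA)

if __name__ == "__main__":
    for name, (weight, _) in CRITERIA.items():
        print(f"{name}: weight {weight:.0%}, time share {SHARES[name]:.1%}")
```

Innovation and Storytelling each take about 41% of the allocated hours, while Training Script and Pipeline get a smaller share of time than their weight.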
Key Technical Decisions
- OpenEnv version: 0.2.1 (stable, `openenv-core[core]>=0.2.0`)
- Base class: `MCPEnvironment` (NOT raw `Environment`) -- auto-routes `ListToolsAction`/`CallToolAction` to FastMCP server. Gives MCP tool discovery for free.
- MCP-X gateway: CUT -- MCPEnvironment already handles MCP tool exposure. Per-agent isolation is nice-to-have, not needed.
- Action pattern: `Action(extra='forbid')` -- all agent-specific fields must be Optional with defaults, or use separate action classes per role
- Server: `create_app()` from `openenv.core.env_server.http_server`
- Training: Unsloth for model loading only, vanilla TRL `GRPOTrainer` with `rollout_func`. Fall back to SFT if GRPO fails at 1.5h.
- Model: Qwen2.5-1.5B for Colab (5GB VRAM), Qwen2.5-7B if H100 available
- Demo: Gradio on HuggingFace Spaces
- Episode scope: 30 ticks, 15 customers, 15 invoices, 10 tickets, 30 tasks
- Attack types: 4 (schema drift, policy drift, social engineering, rate limiting)
- Reserved tool names: `reset`, `step`, `state`, `close` CANNOT be used as MCP tool names
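The reserved-name rule is easy to break by accident when naming tools, so it may be worth a guard that runs before tools are registered. A minimal standard-library sketch; `check_tool_names` and the example tool names are hypothetical and not taken from OpenEnv or FastMCP:

```python
# Names this plan lists as unavailable for MCP tools.
RESERVED_TOOL_NAMES = frozenset({"reset", "step", "state", "close"})


def check_tool_names(names):
    """Raise ValueError if any proposed MCP tool name is reserved.

    Returns the names unchanged (as a list) when there is no clash.
    """
    clashes = sorted(set(names) & RESERVED_TOOL_NAMES)
    if clashes:
        raise ValueError(f"Reserved tool names used: {', '.join(clashes)}")
    return list(names)


if __name__ == "__main__":
    # Hypothetical tool names, for illustration only.
    print(check_tool_names(["lookup_customer", "issue_refund"]))
    try:
        check_tool_names(["lookup_customer", "step"])
    except ValueError as err:
        print(err)
```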
File Structure
sentinelops_arena/
    __init__.py
    models.py               # Pydantic models (enums, data, action/observation/state)
    systems/
        __init__.py
        crm.py              # CRM simulator
        billing.py          # Billing simulator
        ticketing.py        # Ticketing simulator
    attacks.py              # Attack mechanics (4 types)
    rewards.py              # Reward functions (3 agents)
    task_generator.py       # Task generation
    environment.py          # SentinelOpsArena(MCPEnvironment) -- MCP tools defined here
    server.py               # create_app() HTTP server
training/
    colab_training.ipynb    # Colab GRPO notebook (REQUIRED)
    env_standalone.py       # Standalone env for Colab (no openenv dependency)
app.py                      # HF Spaces Gradio entry point
pyproject.toml
README.md
NOTE: No separate mcp_tools.py -- MCP tools are defined inside environment.py using FastMCP, and MCPEnvironment auto-routes them.
NOTE: No mcp-x/ directory -- MCP-X gateway is CUT from the plan.
Partner Track Alignment
- Fleet AI (Scalable Oversight): The Oversight agent monitors, analyzes, and explains the behavior of the Worker agent
- Patronus AI (Schema Drift): Schema drift and policy drift are core attack types in the environment