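# EV2 Migration Plan: From Wrapper to Standalone

## 🎯 Goal

Migrate all of the logic in `ev2.py` into `ev2_service_standalone.py` to create a complete, self-contained evaluation service.

**Design principles**:
1. ✅ No dependency on `ev2.py` (fully standalone)
2. ✅ Preserve all existing functionality
3. ✅ Prepare for future extensions (MetricUnit, Lifecycle, etc.)
4. ✅ Clearer architecture and state management

---

## 📊 Current Architecture vs. Target Architecture

### Current architecture (ev2_service.py)

```
ev2_service.py (HTTP wrapper)
    ↓ calls
ev2.py (Agent logic)
    ↓ uses
OpenHands Agent
```

**Problems**:
- Two layers of abstraction, with state scattered across them
- Hard to integrate deeply
- `ev2.py` is designed around the assumption of a single run

### Target architecture (ev2_service_standalone.py)

```
ev2_service_standalone.py
├── FastAPI HTTP Server
├── ServiceState (persistent state management)
├── IntegratedEV2Agent (manages OpenHands directly)
│   ├── Agent instance (persistent)
│   ├── Memory management
│   └── Conversation history
└── MetricRegistry (optional, groundwork for the future)
```

**Advantages**:
- Single responsibility, with the logic in one place
- The agent persists, so it does not have to be rebuilt on every call
- Better state management
- Paves the way for advanced features such as MetricUnit

---

## 🔧 Migration Steps

### Phase 1: Core Agent class (highest priority)

**Goal**: Create an `IntegratedEV2Agent` class that replaces the call to `evolution_evaluation_agent()`.

#### 1.1 Create the Agent management class

```python
# ev2_service_standalone.py
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional

# NOTE: verify these import paths against the installed OpenHands version.
from openhands.agent import Agent
from openhands.llm import LLM
from openhands.tools import TerminalTool, FileEditorTool, TaskTrackerTool
from openhands.tools.tool import Tool

logger = logging.getLogger(__name__)


class IntegratedEV2Agent:
    """
    Integrated EV2 Agent (not a wrapper)

    Directly manages OpenHands agent lifecycle and state
    """

    def __init__(self,
                 results_dir: str,
                 primary_evaluator_path: str,
                 config: Dict[str, Any]):
        self.results_dir = Path(results_dir).resolve()
        self.primary_evaluator_path = Path(primary_evaluator_path).resolve()
        self.config = config

        # Memory directory (persistent)
        self.memory_dir = self.results_dir / "eval_agent_memory"
        self.memory_dir.mkdir(parents=True, exist_ok=True)

        # Initialize OpenHands agent (persistent!)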
        self.agent = self._create_agent()

        # Conversation history (accumulates across generations)
        self.conversation_history = []

        logger.info("✅ IntegratedEV2Agent initialized")
        logger.info(f"   Memory dir: {self.memory_dir}")
        logger.info(f"   Primary evaluator: {self.primary_evaluator_path}")

    def _create_agent(self) -> Agent:
        """
        Create OpenHands agent

        Migrated from ev2.py:evolution_evaluation_agent()
        """
        # LLM setup
        llm = LLM(model="anthropic/claude-sonnet-4-20250514")

        # System prompt
        prompt_path = Path(__file__).parent / "ev2_prompt.j2"
        if not prompt_path.exists():
            raise FileNotFoundError(f"Prompt template not found: {prompt_path}")

        # Create agent with tools
        agent = Agent(
            llm=llm,
            tools=[
                Tool(name=TerminalTool.name),
                Tool(name=FileEditorTool.name),
                Tool(name=TaskTrackerTool.name),
            ],
            system_prompt_filename=str(prompt_path),
        )

        return agent

    async def analyze_generation(self, generation: int) -> Dict[str, Any]:
        """
        Analyze a generation

        This is the main entry point, replacing evolution_evaluation_agent()
        """
        logger.info(f"🧠 Analyzing generation {generation}...")

        # Build task message
        task = self._build_task_message(generation)

        # Run agent (the returned status is not used yet)
        result = await self._run_agent(task)

        # Extract results
        insights = self._extract_insights()
        metrics = self._extract_metrics()

        return {
            "success": True,
            "insights": insights,
            "auxiliary_metrics": metrics,
            "generation": generation
        }

    def _build_task_message(self, generation: int) -> str:
        """
        Build task message for agent

        Migrated from ev2.py:_build_default_task()
        """
        # Read primary evaluator code
        # NOTE: primary_code is read here but not yet embedded in the task text
        primary_code = ""
        if self.primary_evaluator_path.exists():
            primary_code = self.primary_evaluator_path.read_text()

        # Check for generation directory
        gen_dir = self._find_generation_dir(generation)

        task = f"""# Evolution Evaluation Task - Generation {generation}

## Your Mission
You are analyzing the evolution process for a code optimization task.
Your workspace is: `{self.memory_dir}`

## Current Generation
Generation: {generation}
Results directory: {gen_dir if gen_dir else 'Not found'}

## Primary Evaluator (Fixed, DO NOT MODIFY)
The ground truth evaluation is defined in: `{self.primary_evaluator_path}`

**CRITICAL**: You MUST NOT modify this file. Read it to understand the primary objective.

## Your Tasks
1. **READ** the primary evaluator to understand the ground truth objective
2. **ANALYZE** the current generation's performance and strategy
3. **CREATE** auxiliary evaluation metrics that provide insights beyond the primary score
4. **UPDATE** EVAL_AGENTS.md with your findings and recommendations

## Workspace Structure
Your workspace (`{self.memory_dir}`) should contain:
- `EVAL_AGENTS.md`: Your accumulated insights and analysis
- `auxiliary_metrics.py`: Python code for auxiliary metrics
- Any other analysis files you create

## Constraints
- Primary metric is FIXED - you cannot change it
- Auxiliary metrics should complement, not replace, the primary metric
- Focus on actionable insights that can guide the evolution process

## Output
Update EVAL_AGENTS.md with:
- Analysis of generation {generation}
- Auxiliary metric definitions and values
- Insights and recommendations for future generations

Begin your analysis!
"""

        return task

    def _find_generation_dir(self, generation: int) -> Optional[Path]:
        """Find the generation directory"""
        # Try common patterns
        patterns = [
            self.results_dir / f"gen_{generation}",
            self.results_dir.parent / f"gen_{generation}",
        ]

        for pattern in patterns:
            if pattern.exists():
                return pattern

        return None

    async def _run_agent(self, task: str) -> Dict[str, Any]:
        """
        Run the agent with a task

        This is where we'd integrate async execution if needed
        """
        # For now, call synchronously (OpenHands is sync)
        # Could wrap in asyncio.to_thread() for true async

        # NOTE: This is simplified - actual OpenHands integration
        # would involve message passing, observation handling, etc.
        # We'll keep it simple for migration

        logger.info(f"📝 Task length: {len(task)} chars")

        # In ev2.py, the agent is run via Agent's API
        # We'll need to properly integrate this

        return {"status": "completed"}

    def _extract_insights(self) -> List[str]:
        """Extract insights from EVAL_AGENTS.md"""
        eval_agents_md = self.memory_dir / "EVAL_AGENTS.md"

        if not eval_agents_md.exists():
            return []

        insights = []
        content = eval_agents_md.read_text()

        # Simple extraction - look for bullet points
        for line in content.splitlines():
            stripped = line.strip()
            if stripped.startswith(("*", "-")):
                insights.append(stripped)

        return insights[-10:]  # Last 10 insights

    def _extract_metrics(self) -> Dict[str, Any]:
        """Extract auxiliary metrics"""
        auxiliary_py = self.memory_dir / "auxiliary_metrics.py"

        if not auxiliary_py.exists():
            return {}

        # Could dynamically import and execute
        # For now, just check existence
        return {
            "auxiliary_metrics_file_exists": True,
            "file_path": str(auxiliary_py)
        }
```

#### 1.2 Integrate into the Service

```python
class EV2ServiceStandalone:
    """
    Standalone EV2 Service (no dependency on ev2.py)
    """

    def __init__(self, config: ServiceConfig):
        self.config = config
        self.state = ServiceState(config)

        # Create integrated agent (PERSISTENT)
        self.agent = IntegratedEV2Agent(
            results_dir=config.results_dir,
            primary_evaluator_path=config.primary_evaluator_path,
            config=config.__dict__
        )

    async def handle_generation_notification(self, request: GenerationCompleteRequest):
        """Handle generation notification"""
        # Decision logic (same as before)
        should_trigger, reason = self.state.should_trigger_agent(...)

        if should_trigger:
            # Call integrated agent (not ev2.py!)
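            result = await self.agent.analyze_generation(request.generation)
            return result

        return {"status": "skipped"}
```

---

### Phase 2: Complete the Agent integration (medium priority)

**Goal**: Fully implement the interaction logic with the OpenHands agent.

#### 2.1 Message handling

Migrate the agent run logic from `ev2.py`:

```python
    async def _run_agent(self, task: str) -> Dict[str, Any]:
        """
        Run agent with proper message handling

        Migrated from ev2.py (simplified for now)
        """
        # This is where ev2.py uses Agent API
        # We need to properly integrate:
        # 1. Send task as message
        # 2. Handle agent observations
        # 3. Collect agent actions
        # 4. Wait for completion

        # For MVP, we can use the same approach as ev2.py
        # but with the persistent agent instance

        raise NotImplementedError("TODO: Implement based on OpenHands API")
```

#### 2.2 Workspace management

```python
    def _setup_workspace(self):
        """Setup agent workspace"""
        # Ensure directories exist
        self.memory_dir.mkdir(parents=True, exist_ok=True)

        # Initialize EVAL_AGENTS.md if needed
        eval_md = self.memory_dir / "EVAL_AGENTS.md"
        if not eval_md.exists():
            eval_md.write_text("""# Evaluation Agent Memory

This document tracks insights and metrics across generations.
""")
```

---

### Phase 3: State management enhancements (low priority)

**Goal**: Prepare for future features such as MetricUnit.

#### 3.1 MetricRegistry (skeleton)

```python
class MetricRegistry:
    """
    Registry for managing metrics

    Prepared for future MetricUnit integration
    """

    def __init__(self, memory_dir: Path):
        self.memory_dir = memory_dir
        self.metrics = {}  # id -> metadata

    def register_metric(self, metric_id: str, metadata: Dict[str, Any]):
        """Register a metric"""
        self.metrics[metric_id] = metadata

    def list_metrics(self) -> List[Dict[str, Any]]:
        """List all metrics"""
        return list(self.metrics.values())
```

---

## 📁 File Structure

```
eval_agent/
├── ev2_service_standalone.py    # NEW: complete standalone service
├── ev2_service.py               # OLD: kept for reference
├── ev2.py                       # OLD: kept as a standalone tool
├── ev2_prompt.j2                # SHARED: system prompt
├── ev2_service_config.yaml      # SHARED: configuration file
└── test_ev2_service.py          # SHARED: test script
```

**After migration**:
- `ev2_service_standalone.py`: used in production
- `ev2.py`: kept as a standalone command-line tool (optional)
- `ev2_service.py`: deleted, or renamed to `ev2_service_wrapper.py` (archived)

---

## 🚀 Implementation Timeline

### Day 1: Core migration (4-6 hours)

**Morning**:
- [ ] Create the basic structure of `ev2_service_standalone.py`
- [ ] Implement `IntegratedEV2Agent.__init__` and `_create_agent`
- [ ] Implement `_build_task_message`

**Afternoon**:
- [ ] Implement the `analyze_generation` method
- [ ] Integrate into the FastAPI service
- [ ] Fix import path problems

**Acceptance**: The service starts, receives notifications, and can call the agent (even in simplified form).

---

### Day 2: Refinement and testing (4-6 hours)

**Morning**:
- [ ] Flesh out the `_run_agent` method (if needed)
- [ ] Implement result extraction (`_extract_insights`, `_extract_metrics`)
- [ ] Add error handling

**Afternoon**:
- [ ] Full test run (using `test_ev2_service.py`)
- [ ] Fix the problems found
- [ ] Performance tuning

**Acceptance**: A full evolution simulation runs end to end, and the agent produces correct output.

---

### Day 3: Cleanup and documentation (2-4 hours)

**Morning**:
- [ ] Clean up and refactor the code
- [ ] Add detailed comments
- [ ] Update the configuration file

**Afternoon**:
- [ ] Update the documentation
- [ ] Create usage examples
- [ ] Prepare for integration into ShinkaEvolve

**Acceptance**: High code quality, complete documentation, ready for production.

---

## 📋 Migration Checklist

### Migrated from ev2.py

- [ ] **Agent creation logic**
  - [x] LLM configuration
  - [x] Tools configuration
  - [x] System prompt loading
  - [ ] Agent initialization parameters
- [ ] **Task construction logic**
  - [x] Primary evaluator path handling
  - [x] Generation information
  - [ ] Additional context (if needed)
- [ ] **Agent run logic**
  - [ ] Sending messages
  - [ ] Handling observations
  - [ ] Waiting for results
- [ ] **Result extraction logic**
  - [x] EVAL_AGENTS.md parsing
  - [x] auxiliary_metrics.py detection
  - [ ] More sophisticated result parsing (optional)
- [ ] **Workspace management**
  - [x] Memory directory creation
  - [ ] Initial file creation
  - [ ] Cleanup logic (optional)

### New features

- [x] **HTTP API**
  - [x] Generation notification endpoint
  - [x] Status endpoint
  - [x] Manual trigger endpoint
- [x] **State management**
  - [x] Generation history
  - [x] Trigger decision logic
  - [x] Persistence
- [ ] **Agent persistence**
  - [ ] Agent instance reuse
  - [ ] Conversation history accumulation
  - [ ] Memory shared across generations

### Configuration and deployment

- [x] **Configuration file**
  - [x] Service configuration
  - [x] Trigger strategy configuration
  - [ ] Agent parameter configuration
- [ ] **Testing**
  - [x] Basic functional tests
  - [ ] Integration tests
  - [ ] Performance tests
- [ ] **Documentation**
  - [x] API documentation
  - [ ] Migration documentation
  - [ ] User guide

---

## 🎯 Key Migration Challenges

### Challenge 1: OpenHands Agent interaction

**Problem**: `ev2.py` uses specific OpenHands APIs, and we need to understand how they work.

**Solution**:
- Start with a simplified version (call the agent, wait for completion)
- Refine incrementally (if finer-grained control is needed)
- Use the implementation in `ev2.py` as a reference

### Challenge 2: Agent state persistence

**Problem**: Does the agent's context need to be kept between calls?

**Solution**:
- **Short-term**: create a new agent on every call (as `ev2.py` does)
- **Long-term**: reuse the agent instance and accumulate conversation history

### Challenge 3: Error handling

**Problem**: The agent may fail. How do we handle that gracefully?

**Solution**:
- Wrap the agent call in try/except
- Log detailed error information
- Return a meaningful error message
- Keep the service running (no crash)

---

## 💡 Simplification Strategy

To finish the migration quickly, an **incremental strategy** is recommended:

### MVP version (minimum viable)

**Goal**: Get the service working with the fewest possible changes.

**Simplifications**:
1. **Agent run**: call it directly, without chasing optimal performance
2. **Result extraction**: simple parsing (as it is now)
3. **State management**: a basic version is enough

**Time**: 1 day

### Enhanced version (production-ready)

**Goal**: Improve performance and user experience.

**Enhancements**:
1. **Agent persistence**: reuse the agent instance
2. **Better result parsing**: extract more information
3. **Error recovery**: robust error handling

**Time**: +1 day

### Full version (future extension)

**Goal**: Prepare for advanced features.

**Extension points**:
1. **MetricUnit integration**
2. **Lifecycle management**
3. **Asynchronous Meta-cognition**

**Time**: +1-2 weeks (as needed)

---

## 📊 Comparison: Before vs. After Migration

| Aspect | Before (wrapper) | After (standalone) |
|------|-----------------|---------------------|
| **Dependencies** | Depends on ev2.py | Fully standalone |
| **Architecture** | Two layers | Single layer |
| **State** | Scattered | Centralized |
| **Agent** | Created on every call | Can be persistent |
| **Extensibility** | Limited | High |
| **Maintainability** | Medium | High |
| **Performance** | Has overhead | Optimized |
| **Lines of code** | ~700 | ~800-1000 |

---

## ✅ Acceptance Criteria

The migration is complete when:

1. **Functional completeness**
   - [ ] All functionality of ev2.py is preserved
   - [ ] The HTTP API works correctly
   - [ ] State persistence works correctly
   - [ ] The agent runs correctly

2. **Tests pass**
   - [ ] `test_ev2_service.py` passes in full
   - [ ] A simulated 25-generation evolution succeeds
   - [ ] The agent generates EVAL_AGENTS.md and auxiliary_metrics.py

3. **Code quality**
   - [ ] No linter errors
   - [ ] Adequate comments
   - [ ] Clear structure

4. **Complete documentation**
   - [ ] API documentation updated
   - [ ] User guide updated
   - [ ] Migration notes are clear

---

## 🚀 Getting Started

### Step 1 (today, 30 minutes)

1. Create the `ev2_service_standalone.py` skeleton
2. Copy the HTTP portion of `ev2_service.py`
3. Create the `IntegratedEV2Agent` class skeleton

### Step 2 (tomorrow morning, 2-3 hours)

1. Migrate the core logic from `ev2.py` into `IntegratedEV2Agent`
2. Implement `_create_agent` and `_build_task_message`
3. Write a simplified `analyze_generation`

### Step 3 (tomorrow afternoon, 2-3 hours)

1. Full test run
2. Fix problems
3. Update the documentation

---

**Ready to start?** I can help you create the skeleton for `ev2_service_standalone.py`! 🚀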
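
Challenge 3 asks for the agent call to be wrapped so that a failure is logged and reported without taking the service down, and the `_run_agent` comments mention `asyncio.to_thread()` as a way to run the synchronous agent without blocking the event loop. The following is a minimal sketch of both ideas combined. `run_agent_safely` and the `run_sync` callable are hypothetical names introduced only for illustration; they are not part of `ev2.py` or of OpenHands, and the real blocking agent call would be passed in as `run_sync`.

```python
import asyncio
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("ev2_service")


async def run_agent_safely(
    run_sync: Callable[[str], Dict[str, Any]],
    task: str,
    generation: int,
) -> Dict[str, Any]:
    """Run a blocking agent call in a worker thread and turn failures into a result dict.

    `run_sync` is a placeholder for whatever blocking function actually drives the agent.
    """
    try:
        # asyncio.to_thread (Python 3.9+) keeps the event loop free while the agent runs.
        result = await asyncio.to_thread(run_sync, task)
    except Exception as exc:
        # Log the full traceback, but report a short, meaningful error to the caller.
        logger.exception("Agent run failed for generation %d", generation)
        return {
            "success": False,
            "generation": generation,
            "error": f"{type(exc).__name__}: {exc}",
        }
    return {"success": True, "generation": generation, "result": result}
```

With a wrapper of this shape, `handle_generation_notification` could return the failure dict as a normal HTTP response instead of letting the exception propagate, which is one way to satisfy the "service keeps running" requirement.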