JustinTX

3f6526a verified about 2 months ago

EV2 Migration Plan: From Wrapper to Standalone

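🎯 Goal

Migrate all of the logic in ev2.py into ev2_service_standalone.py, producing a self-contained, fully featured evaluation service.

Design principles:

✅ No dependency on ev2.py (fully standalone)
✅ Preserve all existing functionality
✅ Prepare for future extensions (MetricUnit, Lifecycle, etc.)
✅ Clearer architecture and state management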

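📊 Current Architecture vs. Target Architecture

Current architecture (ev2_service.py)

ev2_service.py (HTTP wrapper)
    ↓ calls
ev2.py (Agent logic)
    ↓ uses
OpenHands Agent

Problems:

Two layers of abstraction, with state scattered between them
Hard to integrate deeply
ev2.py is designed around a single run

Target architecture (ev2_service_standalone.py)

ev2_service_standalone.py
├── FastAPI HTTP Server
├── ServiceState (persistent state management)
├── IntegratedEV2Agent (manages OpenHands directly)
│   ├── Agent instance (persistent)
│   ├── Memory management
│   └── Conversation history
└── MetricRegistry (optional, groundwork for future features)

Advantages:

Single responsibility, with the logic in one place
The agent is persistent, so it does not need to be rebuilt on every call
Better state management
Paves the way for advanced features such as MetricUnit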

🔧 Migration Steps

Phase 1: Core Agent Class (highest priority)

Goal: create an IntegratedEV2Agent class that replaces the calls to evolution_evaluation_agent()

1.1 Create the agent management class

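# ev2_service_standalone.py

import logging
from pathlib import Path
from typing import Any, Dict, List, Optional

# OpenHands imports: verify these module paths against the installed OpenHands version
from openhands.agent import Agent
from openhands.llm import LLM
from openhands.tools import TerminalTool, FileEditorTool, TaskTrackerTool
from openhands.tools.tool import Tool

logger = logging.getLogger(__name__)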

class IntegratedEV2Agent:
    """
    Integrated EV2 Agent (not a wrapper)
    
    Directly manages OpenHands agent lifecycle and state
    """
    
    def __init__(self, 
                 results_dir: str,
                 primary_evaluator_path: str,
                 config: Dict[str, Any]):
        
        self.results_dir = Path(results_dir).resolve()
        self.primary_evaluator_path = Path(primary_evaluator_path).resolve()
        self.config = config
        
        # Memory directory (persistent)
        self.memory_dir = self.results_dir / "eval_agent_memory"
        self.memory_dir.mkdir(parents=True, exist_ok=True)
        
        # Initialize OpenHands agent (persistent!)
        self.agent = self._create_agent()
        
        # Conversation history (accumulates across generations)
        self.conversation_history = []
        
        logger.info("✅ IntegratedEV2Agent initialized")
        logger.info(f"   Memory dir: {self.memory_dir}")
        logger.info(f"   Primary evaluator: {self.primary_evaluator_path}")
    
    def _create_agent(self) -> Agent:
        """
        Create OpenHands agent
        
        Migrated from ev2.py:evolution_evaluation_agent()
        """
        # LLM setup
        llm = LLM(model="anthropic/claude-sonnet-4-20250514")
        
        # System prompt
        prompt_path = Path(__file__).parent / "ev2_prompt.j2"
        if not prompt_path.exists():
            raise FileNotFoundError(f"Prompt template not found: {prompt_path}")
        
        # Create agent with tools
        agent = Agent(
            llm=llm,
            tools=[
                Tool(name=TerminalTool.name),
                Tool(name=FileEditorTool.name),
                Tool(name=TaskTrackerTool.name),
            ],
            system_prompt_filename=str(prompt_path),
        )
        
        return agent
    
    async def analyze_generation(self, generation: int) -> Dict[str, Any]:
        """
        Analyze a generation
        
        This is the main entry point, replacing evolution_evaluation_agent()
        """
        logger.info(f"🧠 Analyzing generation {generation}...")
        
        # Build task message
        task = self._build_task_message(generation)
        
        # Run agent
        result = await self._run_agent(task)
        
        # Extract results
        insights = self._extract_insights()
        metrics = self._extract_metrics()
        
        return {
            # Report success from the run status instead of hard-coding True
            "success": result.get("status") == "completed",
            "insights": insights,
            "auxiliary_metrics": metrics,
            "generation": generation
        }
    
    def _build_task_message(self, generation: int) -> str:
        """
        Build task message for agent
        
        Migrated from ev2.py:_build_default_task()
        """
        # Read primary evaluator code
        primary_code = ""
        if self.primary_evaluator_path.exists():
            primary_code = self.primary_evaluator_path.read_text()
        
        # Check for generation directory
        gen_dir = self._find_generation_dir(generation)
        
        task = f"""# Evolution Evaluation Task - Generation {generation}

## Your Mission

You are analyzing the evolution process for a code optimization task. Your workspace is:
`{self.memory_dir}`

## Current Generation

Generation: {generation}
Results directory: {gen_dir if gen_dir else 'Not found'}

## Primary Evaluator (Fixed, DO NOT MODIFY)

The ground truth evaluation is defined in:
`{self.primary_evaluator_path}`

**CRITICAL**: You MUST NOT modify this file. Read it to understand the primary objective.

## Your Tasks

1. **READ** the primary evaluator to understand the ground truth objective
2. **ANALYZE** the current generation's performance and strategy
3. **CREATE** auxiliary evaluation metrics that provide insights beyond the primary score
4. **UPDATE** EVAL_AGENTS.md with your findings and recommendations

## Workspace Structure

Your workspace (`{self.memory_dir}`) should contain:
- `EVAL_AGENTS.md`: Your accumulated insights and analysis
- `auxiliary_metrics.py`: Python code for auxiliary metrics
- Any other analysis files you create

## Constraints

- Primary metric is FIXED - you cannot change it
- Auxiliary metrics should complement, not replace, the primary metric
- Focus on actionable insights that can guide the evolution process

## Output

Update EVAL_AGENTS.md with:
- Analysis of generation {generation}
- Auxiliary metric definitions and values
- Insights and recommendations for future generations

Begin your analysis!
"""
        
        return task
    
    def _find_generation_dir(self, generation: int) -> Optional[Path]:
        """Find the generation directory"""
        # Try common patterns
        patterns = [
            self.results_dir / f"gen_{generation}",
            self.results_dir.parent / f"gen_{generation}",
        ]
        
        for pattern in patterns:
            if pattern.exists():
                return pattern
        
        return None
    
    async def _run_agent(self, task: str) -> Dict[str, Any]:
        """
        Run the agent with a task
        
        This is where we'd integrate async execution if needed
        """
        # For now, call synchronously (OpenHands is sync)
        # Could wrap in asyncio.to_thread() for true async
        
        # NOTE: This is simplified - actual OpenHands integration
        # would involve message passing, observation handling, etc.
        # We'll keep it simple for migration
        
        logger.info(f"📝 Task length: {len(task)} chars")
        
        # In ev2.py, the agent is run via Agent's API
        # We'll need to properly integrate this
        
        return {"status": "completed"}
    
    def _extract_insights(self) -> List[str]:
        """Extract insights from EVAL_AGENTS.md"""
        eval_agents_md = self.memory_dir / "EVAL_AGENTS.md"
        
        if not eval_agents_md.exists():
            return []
        
        insights = []
        content = eval_agents_md.read_text()
        
        # Simple extraction: collect Markdown bullet items ("- " or "* ").
        # Requiring the space skips "---" rules and "**bold**" lines.
        for line in content.split('\n'):
            stripped = line.strip()
            if stripped.startswith(('- ', '* ')):
                insights.append(stripped)
        
        return insights[-10:]  # Last 10 insights
    
    def _extract_metrics(self) -> Dict[str, Any]:
        """Extract auxiliary metrics"""
        auxiliary_py = self.memory_dir / "auxiliary_metrics.py"
        
        if not auxiliary_py.exists():
            return {}
        
        # Could dynamically import and execute
        # For now, just check existence
        return {
            "auxiliary_metrics_file_exists": True,
            "file_path": str(auxiliary_py)
        }
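The bullet-point heuristic in `_extract_insights` can be factored into a pure function, which makes it unit-testable without running the agent. The sketch below assumes insights are written to EVAL_AGENTS.md as Markdown bullets starting with `- ` or `* `. `extract_bullet_insights` is a hypothetical helper name. Requiring the trailing space keeps horizontal rules (`---`) and bold lines (`**Note**`) from being counted as insights.

```python
from typing import List


def extract_bullet_insights(markdown: str, limit: int = 10) -> List[str]:
    """Return the last `limit` Markdown bullet items found in `markdown`.

    Hypothetical helper mirroring _extract_insights. Only "- " and "* "
    prefixes count, so "---" rules and "**bold**" lines are ignored.
    """
    bullets = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(("- ", "* ")):
            bullets.append(stripped)
    # Keep only the most recent items, since the file accumulates across
    # generations. Guard limit <= 0 because bullets[-0:] would return everything.
    return bullets[-limit:] if limit > 0 else []


if __name__ == "__main__":
    sample = "# Memory\n---\n**Note**\n- gen 1: baseline\n* gen 2: faster\n"
    print(extract_bullet_insights(sample))  # ['- gen 1: baseline', '* gen 2: faster']
```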

1.2 Integrate into the service

class EV2ServiceStandalone:
    """
    Standalone EV2 Service (no dependency on ev2.py)
    """
    
    def __init__(self, config: ServiceConfig):
        self.config = config
        self.state = ServiceState(config)
        
        # Create integrated agent (PERSISTENT)
        self.agent = IntegratedEV2Agent(
            results_dir=config.results_dir,
            primary_evaluator_path=config.primary_evaluator_path,
            config=config.__dict__
        )
    
    async def handle_generation_notification(self, request: GenerationCompleteRequest):
        """Handle generation notification"""
        # Decision logic (same as before)
        should_trigger, reason = self.state.should_trigger_agent(...)
        
        if should_trigger:
            # Call integrated agent (not ev2.py!)
            result = await self.agent.analyze_generation(request.generation)
            return result
        
        return {"status": "skipped"}
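`should_trigger_agent(...)` is elided above, so this sketch fills the decision step with an assumed policy. Its only purpose is to show the control flow of `handle_generation_notification`. The policy triggers on every `interval`-th generation and never analyzes the same generation twice. `IntervalTriggerPolicy` and `StubAgent` are hypothetical. A real policy would come from the trigger-strategy configuration, and the stub stands in for `IntegratedEV2Agent`.

```python
import asyncio
from typing import Any, Dict, Set, Tuple


class IntervalTriggerPolicy:
    """Hypothetical policy: analyze every `interval`-th generation, at most once each."""

    def __init__(self, interval: int = 5):
        self.interval = interval
        self.analyzed: Set[int] = set()

    def should_trigger_agent(self, generation: int) -> Tuple[bool, str]:
        if generation in self.analyzed:
            return False, "already analyzed"
        if generation % self.interval != 0:
            return False, f"generation {generation} is not a multiple of {self.interval}"
        return True, "interval reached"

    def mark_analyzed(self, generation: int) -> None:
        self.analyzed.add(generation)


class StubAgent:
    """Stand-in for IntegratedEV2Agent so the sketch runs without OpenHands."""

    async def analyze_generation(self, generation: int) -> Dict[str, Any]:
        return {"success": True, "generation": generation}


async def handle_generation_notification(
    policy: IntervalTriggerPolicy, agent: StubAgent, generation: int
) -> Dict[str, Any]:
    should_trigger, reason = policy.should_trigger_agent(generation)
    if not should_trigger:
        return {"status": "skipped", "reason": reason}
    result = await agent.analyze_generation(generation)
    # Only record the generation once the agent reports success
    if result.get("success"):
        policy.mark_analyzed(generation)
    return result


if __name__ == "__main__":
    policy, agent = IntervalTriggerPolicy(interval=5), StubAgent()
    for gen in (4, 5, 5):
        print(asyncio.run(handle_generation_notification(policy, agent, gen)))
```

Returning the skip reason, rather than a bare `{"status": "skipped"}`, makes it easier to see from the HTTP response why a notification did not reach the agent.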

Phase 2: Complete the Agent Integration (medium priority)

Goal: fully implement the interaction logic with the OpenHands agent

2.1 Message handling

Migrate the agent run logic from ev2.py:

async def _run_agent(self, task: str) -> Dict[str, Any]:
    """
    Run agent with proper message handling
    
    Migrated from ev2.py (simplified for now)
    """
    # This is where ev2.py uses Agent API
    # We need to properly integrate:
    # 1. Send task as message
    # 2. Handle agent observations
    # 3. Collect agent actions
    # 4. Wait for completion
    
    # For MVP, we can use the same approach as ev2.py
    # but with the persistent agent instance
    
    pass  # TODO: Implement based on OpenHands API
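The comment in the Phase 1 `_run_agent` notes that OpenHands runs synchronously and could be wrapped in `asyncio.to_thread()`. This sketch applies that idea and also bounds the run with `asyncio.wait_for`, so a stuck agent cannot hold the request forever. It assumes the whole OpenHands interaction (send the task, handle observations, wait for completion) can be packaged as one blocking callable. `blocking_run`, `fake_blocking_run`, and the 30-minute default timeout are hypothetical; none of them is an OpenHands API or a project setting.

```python
import asyncio
import time
from typing import Any, Callable, Dict


async def run_blocking_agent(
    blocking_run: Callable[[str], Dict[str, Any]],
    task: str,
    timeout_s: float = 1800.0,  # hypothetical default: 30 minutes
) -> Dict[str, Any]:
    """Run a synchronous agent call in a worker thread, with a timeout.

    `blocking_run` is a hypothetical stand-in for the sync call that drives
    the OpenHands conversation until it finishes.
    """
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(blocking_run, task), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        # wait_for stops waiting, but the worker thread itself keeps running.
        return {"status": "timeout", "timeout_s": timeout_s}


def fake_blocking_run(task: str) -> Dict[str, Any]:
    """Demo stand-in that simulates a short, blocking agent run."""
    time.sleep(0.05)
    return {"status": "completed", "task_chars": len(task)}


if __name__ == "__main__":
    print(asyncio.run(run_blocking_agent(fake_blocking_run, "analyze gen 3")))
```

`asyncio.to_thread` requires Python 3.9 or later. Because a timeout cannot kill the worker thread, the agent run itself would still need its own stop condition (for example, an iteration cap).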

2.2 Workspace management

def _setup_workspace(self):
    """Setup agent workspace"""
    # Ensure directories exist
    self.memory_dir.mkdir(parents=True, exist_ok=True)
    
    # Initialize EVAL_AGENTS.md if needed
    eval_md = self.memory_dir / "EVAL_AGENTS.md"
    if not eval_md.exists():
        eval_md.write_text("""# Evaluation Agent Memory

This document tracks insights and metrics across generations.
""")

Phase 3: State Management Enhancements (low priority)

Goal: lay the groundwork for future features such as MetricUnit

3.1 MetricRegistry (skeleton)

class MetricRegistry:
    """
    Registry for managing metrics
    
    Prepared for future MetricUnit integration
    """
    
    def __init__(self, memory_dir: Path):
        self.memory_dir = memory_dir
        self.metrics = {}  # id -> metadata
    
    def register_metric(self, metric_id: str, metadata: Dict[str, Any]):
        """Register a metric"""
        self.metrics[metric_id] = metadata
    
    def list_metrics(self) -> List[Dict[str, Any]]:
        """List all metrics"""
        return list(self.metrics.values())
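The skeleton stores `memory_dir` but never uses it, so registrations are lost whenever the service restarts. The sketch below shows one way to persist the registry. It assumes a JSON file inside the memory directory; the file name `metric_registry.json` and the `save`/`load` methods are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class MetricRegistry:
    """Metric registry persisted as JSON in the agent's memory directory (sketch)."""

    FILENAME = "metric_registry.json"  # hypothetical file name

    def __init__(self, memory_dir: Path):
        self.memory_dir = Path(memory_dir)
        self.path = self.memory_dir / self.FILENAME
        self.metrics: Dict[str, Dict[str, Any]] = {}  # id -> metadata
        self.load()

    def register_metric(self, metric_id: str, metadata: Dict[str, Any]) -> None:
        """Register (or overwrite) a metric and persist immediately."""
        self.metrics[metric_id] = metadata
        self.save()

    def list_metrics(self) -> List[Dict[str, Any]]:
        return list(self.metrics.values())

    def save(self) -> None:
        self.memory_dir.mkdir(parents=True, exist_ok=True)
        # Write a temp file, then rename it over the target so readers never see a partial file
        tmp = self.path.parent / (self.path.name + ".tmp")
        tmp.write_text(json.dumps(self.metrics, indent=2))
        tmp.replace(self.path)

    def load(self) -> None:
        if self.path.exists():
            self.metrics = json.loads(self.path.read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        MetricRegistry(Path(d)).register_metric("diversity", {"id": "diversity"})
        # A fresh instance reloads the registration from disk
        print(MetricRegistry(Path(d)).list_metrics())
```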

📁 File Structure

eval_agent/
├── ev2_service_standalone.py    # NEW: complete standalone service
├── ev2_service.py               # OLD: kept for reference
├── ev2.py                       # OLD: kept as a standalone tool
├── ev2_prompt.j2                # SHARED: system prompt
├── ev2_service_config.yaml      # SHARED: configuration file
└── test_ev2_service.py          # SHARED: test script

After migration:

ev2_service_standalone.py: production use
ev2.py: kept as an optional standalone command-line tool
ev2_service.py: delete, or rename to ev2_service_wrapper.py (archive)

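🚀 Implementation Timeline

Day 1: Core migration (4-6 hours)

Morning:

Create the base structure of ev2_service_standalone.py
Implement IntegratedEV2Agent.__init__ and _create_agent
Implement _build_task_message

Afternoon:

Implement the analyze_generation method
Integrate with the FastAPI service
Fix import path issues

Acceptance: the service starts, receives notifications, and can invoke the agent (even if only in simplified form).

Day 2: Refinement and testing (4-6 hours)

Morning:

Refine the _run_agent method (if needed)
Implement result extraction (_extract_insights, _extract_metrics)
Add error handling

Afternoon:

Run the full test suite (test_ev2_service.py)
Fix the issues found
Optimize performance

Acceptance: a full simulated evolution run completes, and the agent produces the expected output.

Day 3: Cleanup and documentation (2-4 hours)

Morning:

Clean up and refactor the code
Add detailed comments
Update the configuration files

Afternoon:

Update the documentation
Create usage examples
Prepare for integration into ShinkaEvolve

Acceptance: high-quality code, complete documentation, ready for production.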

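📋 Migration Checklist

Items to migrate from ev2.py

Agent creation logic
- LLM configuration
- Tools configuration
- System prompt loading
- Agent initialization parameters
Task construction logic
- Primary evaluator path handling
- Generation information
- Additional context (if needed)
Agent run logic
- Sending messages
- Handling observations
- Waiting for results
Result extraction logic
- Parsing EVAL_AGENTS.md
- Detecting auxiliary_metrics.py
- More sophisticated result parsing (optional)
Workspace management
- Creating the memory directory
- Creating initial files
- Cleanup logic (optional)

New features

HTTP API
- Generation notification endpoint
- Status endpoint
- Manual trigger endpoint
State management
- Generation history
- Trigger decision logic
- Persistence
Agent persistence
- Reusing the agent instance
- Accumulating conversation history
- Sharing memory across generations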
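The state-management items above (generation history, trigger decision, persistence) imply that service state has to survive a restart. The sketch below assumes state is stored as JSON at a path such as `service_state.json`. `PersistedState`, `StateStore`, and their fields are hypothetical; they are not the existing `ServiceState` API.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class PersistedState:
    """Hypothetical persisted service state (not the existing ServiceState)."""

    generation_history: List[Dict[str, Any]] = field(default_factory=list)
    last_analyzed_generation: Optional[int] = None


class StateStore:
    """Load and save PersistedState as JSON so the service can resume after a restart."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.state = self._load()

    def _load(self) -> PersistedState:
        if self.path.exists():
            return PersistedState(**json.loads(self.path.read_text()))
        return PersistedState()

    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.parent / (self.path.name + ".tmp")
        tmp.write_text(json.dumps(asdict(self.state), indent=2))
        tmp.replace(self.path)  # rename over the target to avoid torn writes

    def record_generation(self, generation: int, metrics: Dict[str, Any]) -> None:
        self.state.generation_history.append({"generation": generation, **metrics})
        self.save()

    def mark_analyzed(self, generation: int) -> None:
        self.state.last_analyzed_generation = generation
        self.save()


if __name__ == "__main__":
    path = Path(tempfile.mkdtemp()) / "service_state.json"
    store = StateStore(path)
    store.record_generation(1, {"primary_score": 0.42})
    store.mark_analyzed(1)
    print(StateStore(path).state)  # reloaded from disk
```

Every mutation calls `save()`. After a restart, a new `StateStore` pointed at the same file therefore sees both the history and the last analyzed generation.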

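Configuration and deployment

Configuration files
- Service configuration
- Trigger policy configuration
- Agent parameter configuration
Testing
- Basic functional tests
- Integration tests
- Performance tests
Documentation
- API documentation
- Migration documentation
- User guide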

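🎯 Key Migration Challenges

Challenge 1: OpenHands agent interaction

Problem: ev2.py relies on specific OpenHands APIs, and we need to understand how they work.

Solution:

Start with a simplified version (invoke the agent and wait for it to finish)
Refine incrementally (if finer-grained control turns out to be needed)
Use the ev2.py implementation as a reference

Challenge 2: Agent state persistence

Problem: does the agent's context need to be preserved between calls?

Solution:

Short term: create a new agent for each call (as ev2.py does)
Long term: reuse the agent instance and accumulate conversation history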
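The short-term and long-term options differ only in whether the agent factory runs once or on every call. Both can therefore sit behind a single switch while the decision is pending. This sketch assumes a hypothetical `AgentProvider` class with a `persistent` flag; neither exists in the current code. `factory` stands in for `IntegratedEV2Agent._create_agent`.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class AgentProvider(Generic[T]):
    """Hypothetical switch between per-call agents and one reused agent."""

    def __init__(self, factory: Callable[[], T], persistent: bool = False):
        self.factory = factory  # stands in for IntegratedEV2Agent._create_agent
        self.persistent = persistent
        self._cached: Optional[T] = None

    def get(self) -> T:
        if not self.persistent:
            # Short term (like ev2.py): a fresh agent with no prior context
            return self.factory()
        if self._cached is None:
            # Long term: create once, then reuse so context can accumulate
            self._cached = self.factory()
        return self._cached

    def reset(self) -> None:
        """Drop the cached agent, e.g. after an unrecoverable error."""
        self._cached = None


if __name__ == "__main__":
    per_call = AgentProvider(object, persistent=False)
    reused = AgentProvider(object, persistent=True)
    print(per_call.get() is per_call.get(), reused.get() is reused.get())  # False True
```

Keeping `reset()` available makes the long-term mode recoverable: if the cached agent gets into a bad state, the next call starts fresh, just as in the short-term mode.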

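Challenge 3: Error handling

Problem: the agent can fail; how should such failures be handled gracefully?

Solution:

Wrap agent calls in try/except
Log detailed error information
Return meaningful error messages
Keep the service running (no crash)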
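The four points above can be combined into one wrapper around the agent call. This sketch assumes `analyze` is any coroutine function shaped like `IntegratedEV2Agent.analyze_generation`. `safe_analyze`, the logger name, the error-dict fields, and the two demo coroutines are hypothetical.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict

logger = logging.getLogger("ev2_service")  # hypothetical logger name


async def safe_analyze(
    analyze: Callable[[int], Awaitable[Dict[str, Any]]], generation: int
) -> Dict[str, Any]:
    """Run one agent analysis; turn failures into an error response instead of crashing."""
    try:
        return await analyze(generation)
    except Exception as exc:  # CancelledError is a BaseException, so it still propagates
        # logger.exception logs at ERROR level and includes the traceback
        logger.exception("Agent analysis failed for generation %d", generation)
        return {
            "success": False,
            "generation": generation,
            "error_type": type(exc).__name__,
            "error": str(exc),
        }


async def failing_analysis(generation: int) -> Dict[str, Any]:
    """Demo stand-in for an agent call that fails."""
    raise RuntimeError("LLM request failed")


async def ok_analysis(generation: int) -> Dict[str, Any]:
    """Demo stand-in for an agent call that succeeds."""
    return {"success": True, "generation": generation}


if __name__ == "__main__":
    print(asyncio.run(safe_analyze(failing_analysis, 7)))
```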

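💡 Simplification Strategy

To finish the migration quickly, we recommend an incremental strategy:

MVP version (minimum viable)

Goal: get the service working with as few changes as possible

Simplifications:

Agent execution: call it directly, without optimizing for performance
Result extraction: simple parsing (as it is now)
State management: a basic version is enough

Time: 1 day

Enhanced version (production-ready)

Goal: improve performance and user experience

Enhancements:

Agent persistence: reuse the agent instance
Better result parsing: extract more information
Error recovery: robust error handling

Time: +1 day

Full version (future extensions)

Goal: lay the groundwork for advanced features

Extension points:

MetricUnit integration
Lifecycle management
Asynchronous meta-cognition

Time: +1-2 weeks (as needed)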

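📊 Comparison: Before vs. After Migration

Aspect	Before (wrapper)	After (standalone)
Dependencies	Depends on ev2.py	Fully standalone
Architecture	Two layers	Single layer
State	Scattered	Centralized
Agent	Created on every call	Can be persisted
Extensibility	Limited	High
Maintainability	Moderate	High
Performance	Extra overhead (wrapper layer, per-call agent setup)	Lower overhead (single layer, reusable agent)
Lines of code	~700	~800-1000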

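✅ Acceptance Criteria

The migration is complete when:

Functional completeness
- All ev2.py functionality is preserved
- The HTTP API works
- State persistence works
- The agent runs correctly
Tests pass
- test_ev2_service.py passes in full
- A simulated 25-generation evolution run succeeds
- The agent generates EVAL_AGENTS.md and auxiliary_metrics.py
Code quality
- No linter errors
- Sufficient comments
- Clear structure
Documentation complete
- API documentation updated
- User guide updated
- Migration notes are clear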

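🚀 Get Started

Step 1 (today, 30 minutes)

Create the ev2_service_standalone.py skeleton
Copy the HTTP portion of ev2_service.py
Create the IntegratedEV2Agent class skeleton

Step 2 (tomorrow morning, 2-3 hours)

Migrate the core logic from ev2.py into IntegratedEV2Agent
Implement _create_agent and _build_task_message
Implement a simplified analyze_generation

Step 3 (tomorrow afternoon, 2-3 hours)

Full testing
Fix issues
Update documentation

Ready to start? I can help create the ev2_service_standalone.py skeleton! 🚀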