# EV2 Migration Plan: From Wrapper to Standalone

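## 🎯 Goal

Migrate all logic from `ev2.py` into `ev2_service_standalone.py` to create a complete, self-contained evaluation service.

**Design principles**:
1. ✅ No dependency on `ev2.py` (fully standalone)
2. ✅ Preserve all existing functionality
3. ✅ Prepare for future extensions (MetricUnit, Lifecycle, etc.)
4. ✅ Clearer architecture and state management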

---

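## 📊 Current Architecture vs. Target Architecture

### Current Architecture (ev2_service.py)

```
ev2_service.py (HTTP wrapper)
    ↓ calls
ev2.py (Agent logic)
    ↓ uses
OpenHands Agent
```

**Problems**:
- Two layers of abstraction, with state scattered across them
- Hard to integrate deeply
- `ev2.py` is designed around a single run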
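### Target Architecture (ev2_service_standalone.py)

```
ev2_service_standalone.py
├── FastAPI HTTP Server
├── ServiceState (persistent state management)
├── IntegratedEV2Agent (manages OpenHands directly)
│   ├── Agent instance (persistent)
│   ├── Memory management
│   └── Conversation history
└── MetricRegistry (optional, for future use)
```

**Advantages**:
- Single responsibility, with the logic in one place
- The agent is persistent, so it does not need to be rebuilt on every call
- Better state management
- Paves the way for advanced features such as MetricUnit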

---

## 🔧 Migration Steps

### Phase 1: Core Agent Class (highest priority)

**Goal**: Create the `IntegratedEV2Agent` class to replace calls to `evolution_evaluation_agent()`

#### 1.1 Create the Agent management class

```python
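# ev2_service_standalone.py

import logging
from pathlib import Path
from typing import Any, Dict, List, Optional

# OpenHands import paths are as written in this plan; verify them against the installed version
from openhands.agent import Agent
from openhands.llm import LLM
from openhands.tools import TerminalTool, FileEditorTool, TaskTrackerTool
from openhands.tools.tool import Tool

logger = logging.getLogger(__name__)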

class IntegratedEV2Agent:
    """
    Integrated EV2 Agent (not a wrapper)
    
    Directly manages OpenHands agent lifecycle and state
    """
    
    def __init__(self, 
                 results_dir: str,
                 primary_evaluator_path: str,
                 config: Dict[str, Any]):
        
        self.results_dir = Path(results_dir).resolve()
        self.primary_evaluator_path = Path(primary_evaluator_path).resolve()
        self.config = config
        
        # Memory directory (persistent)
        self.memory_dir = self.results_dir / "eval_agent_memory"
        self.memory_dir.mkdir(parents=True, exist_ok=True)
        
        # Initialize OpenHands agent (persistent!)
        self.agent = self._create_agent()
        
        # Conversation history (accumulates across generations)
        self.conversation_history = []
        
        logger.info("✅ IntegratedEV2Agent initialized")
        logger.info(f"   Memory dir: {self.memory_dir}")
        logger.info(f"   Primary evaluator: {self.primary_evaluator_path}")
    
    def _create_agent(self) -> Agent:
        """
        Create OpenHands agent
        
        Migrated from ev2.py:evolution_evaluation_agent()
        """
        # LLM setup
        llm = LLM(model="anthropic/claude-sonnet-4-20250514")
        
        # System prompt
        prompt_path = Path(__file__).parent / "ev2_prompt.j2"
        if not prompt_path.exists():
            raise FileNotFoundError(f"Prompt template not found: {prompt_path}")
        
        # Create agent with tools
        agent = Agent(
            llm=llm,
            tools=[
                Tool(name=TerminalTool.name),
                Tool(name=FileEditorTool.name),
                Tool(name=TaskTrackerTool.name),
            ],
            system_prompt_filename=str(prompt_path),
        )
        
        return agent
    
    async def analyze_generation(self, generation: int) -> Dict[str, Any]:
        """
        Analyze a generation
        
        This is the main entry point, replacing evolution_evaluation_agent()
        """
        logger.info(f"🧠 Analyzing generation {generation}...")
        
        # Build task message
        task = self._build_task_message(generation)
        
        # Run agent
        result = await self._run_agent(task)
        
        # Extract results
        insights = self._extract_insights()
        metrics = self._extract_metrics()
        
        return {
            "success": True,
            "insights": insights,
            "auxiliary_metrics": metrics,
            "generation": generation
        }
    
    def _build_task_message(self, generation: int) -> str:
        """
        Build task message for agent
        
        Migrated from ev2.py:_build_default_task()
        """
        # Read primary evaluator code
        primary_code = ""
        if self.primary_evaluator_path.exists():
            primary_code = self.primary_evaluator_path.read_text()
        
        # Check for generation directory
        gen_dir = self._find_generation_dir(generation)
        
        task = f"""# Evolution Evaluation Task - Generation {generation}

## Your Mission

You are analyzing the evolution process for a code optimization task. Your workspace is:
`{self.memory_dir}`

## Current Generation

Generation: {generation}
Results directory: {gen_dir if gen_dir else 'Not found'}

## Primary Evaluator (Fixed, DO NOT MODIFY)

The ground truth evaluation is defined in:
`{self.primary_evaluator_path}`

**CRITICAL**: You MUST NOT modify this file. Read it to understand the primary objective.

## Your Tasks

1. **READ** the primary evaluator to understand the ground truth objective
2. **ANALYZE** the current generation's performance and strategy
3. **CREATE** auxiliary evaluation metrics that provide insights beyond the primary score
4. **UPDATE** EVAL_AGENTS.md with your findings and recommendations

## Workspace Structure

Your workspace (`{self.memory_dir}`) should contain:
- `EVAL_AGENTS.md`: Your accumulated insights and analysis
- `auxiliary_metrics.py`: Python code for auxiliary metrics
- Any other analysis files you create

## Constraints

- Primary metric is FIXED - you cannot change it
- Auxiliary metrics should complement, not replace, the primary metric
- Focus on actionable insights that can guide the evolution process

## Output

Update EVAL_AGENTS.md with:
- Analysis of generation {generation}
- Auxiliary metric definitions and values
- Insights and recommendations for future generations

Begin your analysis!
"""
        
        return task
    
    def _find_generation_dir(self, generation: int) -> Optional[Path]:
        """Find the generation directory"""
        # Try common patterns
        patterns = [
            self.results_dir / f"gen_{generation}",
            self.results_dir.parent / f"gen_{generation}",
        ]
        
        for pattern in patterns:
            if pattern.exists():
                return pattern
        
        return None
    
    async def _run_agent(self, task: str) -> Dict[str, Any]:
        """
        Run the agent with a task
        
        This is where we'd integrate async execution if needed
        """
        # For now, call synchronously (OpenHands is sync)
        # Could wrap in asyncio.to_thread() for true async
        
        # NOTE: This is simplified - actual OpenHands integration
        # would involve message passing, observation handling, etc.
        # We'll keep it simple for migration
        
        logger.info(f"📝 Task length: {len(task)} chars")
        
        # In ev2.py, the agent is run via Agent's API
        # We'll need to properly integrate this
        
        return {"status": "completed"}
    
    def _extract_insights(self) -> List[str]:
        """Extract insights from EVAL_AGENTS.md"""
        eval_agents_md = self.memory_dir / "EVAL_AGENTS.md"
        
        if not eval_agents_md.exists():
            return []
        
        insights = []
        content = eval_agents_md.read_text()
        
        # Simple extraction: keep Markdown bullets ("- " or "* ") only, which
        # skips horizontal rules ("---") and bold lines ("**...**")
        for line in content.splitlines():
            stripped = line.strip()
            if stripped.startswith(("- ", "* ")):
                insights.append(stripped)
        
        return insights[-10:]  # Last 10 insights
    
    def _extract_metrics(self) -> Dict[str, Any]:
        """Extract auxiliary metrics"""
        auxiliary_py = self.memory_dir / "auxiliary_metrics.py"
        
        if not auxiliary_py.exists():
            return {}
        
        # Could dynamically import and execute
        # For now, just check existence
        return {
            "auxiliary_metrics_file_exists": True,
            "file_path": str(auxiliary_py)
        }
```
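The `_extract_metrics` stub above notes that the metrics file "could" be imported dynamically. A minimal sketch of that step, assuming each auxiliary metric is a public top-level function in `auxiliary_metrics.py` (that convention, and the helper name `load_auxiliary_metrics`, are assumptions, not something `ev2.py` is known to define):

```python
import importlib.util
import inspect
from pathlib import Path
from typing import Any, Callable, Dict

MODULE_NAME = "auxiliary_metrics"


def load_auxiliary_metrics(path: Path) -> Dict[str, Callable[..., Any]]:
    """Load public functions defined in an agent-written metrics file.

    Hypothetical helper: assumes metrics are top-level functions whose
    names do not start with an underscore.
    """
    if not path.exists():
        return {}
    spec = importlib.util.spec_from_file_location(MODULE_NAME, path)
    if spec is None or spec.loader is None:
        return {}
    module = importlib.util.module_from_spec(spec)
    try:
        # This executes agent-generated code; isolate it in production.
        spec.loader.exec_module(module)
    except Exception:
        # A broken metrics file should not take the service down.
        return {}
    return {
        name: obj
        for name, obj in vars(module).items()
        if inspect.isfunction(obj)
        and obj.__module__ == MODULE_NAME  # skip functions imported from elsewhere
        and not name.startswith("_")
    }
```

Because the file is written by the agent, importing it runs untrusted code in the service process. Running it in a subprocess with a timeout may be the safer choice once the MVP works.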

#### 1.2 Integrate into the Service

```python
class EV2ServiceStandalone:
    """
    Standalone EV2 Service (no dependency on ev2.py)
    """
    
    def __init__(self, config: ServiceConfig):
        self.config = config
        self.state = ServiceState(config)
        
        # Create integrated agent (PERSISTENT)
        self.agent = IntegratedEV2Agent(
            results_dir=config.results_dir,
            primary_evaluator_path=config.primary_evaluator_path,
            config=config.__dict__
        )
    
    async def handle_generation_notification(self, request: GenerationCompleteRequest):
        """Handle generation notification"""
        # Decision logic (same as before)
        should_trigger, reason = self.state.should_trigger_agent(...)
        
        if should_trigger:
            # Call integrated agent (not ev2.py!)
            result = await self.agent.analyze_generation(request.generation)
            return result
        
        return {"status": "skipped"}
```
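The arguments to `should_trigger_agent(...)` are elided in this plan, so the following is only a sketch of what such a decision could look like. It assumes two illustrative rules (a fixed interval and a stagnation check); the class name, field names, and the `best_score` argument are hypothetical and not taken from `ev2_service.py`:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TriggerPolicy:
    """Hypothetical trigger policy; names and rules are illustrative only."""

    every_n_generations: int = 5
    stagnation_window: int = 3
    history: List[float] = field(default_factory=list)

    def should_trigger_agent(self, generation: int, best_score: float) -> Tuple[bool, str]:
        """Return (should_trigger, reason) for the generation just completed."""
        self.history.append(best_score)

        # Rule 1: run the agent at a fixed interval.
        if generation > 0 and generation % self.every_n_generations == 0:
            return True, f"interval: every {self.every_n_generations} generations"

        # Rule 2: run the agent when the last few scores never beat the
        # first score in the window.
        recent = self.history[-self.stagnation_window:]
        if len(recent) == self.stagnation_window and max(recent) <= recent[0]:
            return True, f"stagnation: no improvement in {self.stagnation_window} generations"

        return False, "no trigger condition met"
```

Returning the reason alongside the boolean matches the `should_trigger, reason = ...` unpacking in the handler above and makes skipped generations easy to explain in logs.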

---

### Phase 2: Complete the Agent Integration (medium priority)

**Goal**: Fully implement the interaction logic with the OpenHands agent

#### 2.1 Message handling

Migrate the agent run logic from `ev2.py`:

```python
async def _run_agent(self, task: str) -> Dict[str, Any]:
    """
    Run agent with proper message handling
    
    Migrated from ev2.py (simplified for now)
    """
    # This is where ev2.py uses Agent API
    # We need to properly integrate:
    # 1. Send task as message
    # 2. Handle agent observations
    # 3. Collect agent actions
    # 4. Wait for completion
    
    # For MVP, we can use the same approach as ev2.py
    # but with the persistent agent instance
    
    pass  # TODO: Implement based on OpenHands API
```
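Until the OpenHands-specific calls are filled in, the async boundary itself can be sketched. The Phase 1 stub mentions `asyncio.to_thread()` as an option; the example below shows that pattern with a timeout. Here `run_fn` is a hypothetical stand-in for whatever blocking call `ev2.py` makes, and the 30-minute default is an arbitrary assumption:

```python
import asyncio
import time
from typing import Any, Callable, Dict


async def run_blocking_agent(
    run_fn: Callable[[str], Any],
    task: str,
    timeout_s: float = 1800.0,
) -> Dict[str, Any]:
    """Run a blocking agent call in a worker thread so the event loop stays free.

    `run_fn` is a placeholder for the synchronous agent entry point.
    """
    started = time.monotonic()
    try:
        output = await asyncio.wait_for(
            asyncio.to_thread(run_fn, task), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        # The worker thread is not killed; it keeps running in the background.
        return {"status": "timeout", "elapsed_s": time.monotonic() - started}
    return {
        "status": "completed",
        "output": output,
        "elapsed_s": time.monotonic() - started,
    }
```

`asyncio.to_thread` requires Python 3.9 or later. Note that a timeout only stops the caller from waiting; it does not cancel the underlying thread, so a stuck agent run still needs its own stop mechanism.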

#### 2.2 Workspace management

```python
def _setup_workspace(self):
    """Setup agent workspace"""
    # Ensure directories exist
    self.memory_dir.mkdir(parents=True, exist_ok=True)
    
    # Initialize EVAL_AGENTS.md if needed
    eval_md = self.memory_dir / "EVAL_AGENTS.md"
    if not eval_md.exists():
        eval_md.write_text("""# Evaluation Agent Memory

This document tracks insights and metrics across generations.
""")
```

---

### Phase 3: Enhanced State Management (low priority)

**Goal**: Prepare for future features such as MetricUnit

#### 3.1 MetricRegistry (skeleton)

```python
class MetricRegistry:
    """
    Registry for managing metrics
    
    Prepared for future MetricUnit integration
    """
    
    def __init__(self, memory_dir: Path):
        self.memory_dir = memory_dir
        self.metrics = {}  # id -> metadata
    
    def register_metric(self, metric_id: str, metadata: Dict[str, Any]):
        """Register a metric"""
        self.metrics[metric_id] = metadata
    
    def list_metrics(self) -> List[Dict[str, Any]]:
        """List all metrics"""
        return list(self.metrics.values())
```
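The skeleton stores `memory_dir` but keeps the metrics only in memory, so they would be lost on restart. One possible way to persist them is sketched below, assuming the metadata is JSON-serializable; the file name `metric_registry.json` and both helper names are hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Dict

REGISTRY_FILE = "metric_registry.json"  # hypothetical file name


def save_registry(metrics: Dict[str, Dict[str, Any]], memory_dir: Path) -> Path:
    """Write the registry to disk via a temp file so a crash cannot truncate it."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / REGISTRY_FILE
    tmp = memory_dir / (REGISTRY_FILE + ".tmp")
    tmp.write_text(json.dumps(metrics, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)  # rename over the old file in one step
    return path


def load_registry(memory_dir: Path) -> Dict[str, Dict[str, Any]]:
    """Read the registry back, or return an empty one if nothing was saved."""
    path = memory_dir / REGISTRY_FILE
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

In `MetricRegistry`, `__init__` could call `load_registry(memory_dir)` and `register_metric` could call `save_registry(self.metrics, self.memory_dir)` after each update.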

---

## 📁 File Structure

```
eval_agent/
├── ev2_service_standalone.py    # NEW: complete standalone service
├── ev2_service.py               # OLD: kept for reference
├── ev2.py                       # OLD: kept as a standalone tool
├── ev2_prompt.j2                # SHARED: system prompt
├── ev2_service_config.yaml      # SHARED: configuration file
└── test_ev2_service.py          # SHARED: test script
```

**After migration**:
- `ev2_service_standalone.py`: used in production
- `ev2.py`: kept as a standalone command-line tool (optional)
- `ev2_service.py`: deleted, or renamed to `ev2_service_wrapper.py` (archived)

---

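## 🚀 Implementation Timeline

### Day 1: Core migration (4-6 hours)

**Morning**:
- [ ] Create the basic structure of `ev2_service_standalone.py`
- [ ] Implement `IntegratedEV2Agent.__init__` and `_create_agent`
- [ ] Implement `_build_task_message`

**Afternoon**:
- [ ] Implement the `analyze_generation` method
- [ ] Integrate into the FastAPI service
- [ ] Fix import path issues

**Acceptance**: The service starts, receives notifications, and can call the agent (even in simplified form)

---

### Day 2: Refinement and testing (4-6 hours)

**Morning**:
- [ ] Complete the `_run_agent` method (if needed)
- [ ] Implement result extraction (`_extract_insights`, `_extract_metrics`)
- [ ] Add error handling

**Afternoon**:
- [ ] Full test run (using `test_ev2_service.py`)
- [ ] Fix the issues found
- [ ] Performance optimization

**Acceptance**: A full simulated evolution run completes, and the agent produces correct output

---

### Day 3: Cleanup and documentation (2-4 hours)

**Morning**:
- [ ] Code cleanup and refactoring
- [ ] Add detailed comments
- [ ] Update the configuration file

**Afternoon**:
- [ ] Update the documentation
- [ ] Create usage examples
- [ ] Prepare for integration into ShinkaEvolve

**Acceptance**: High code quality, complete documentation, ready for production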

---

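## 📋 Migration Checklist

### Items migrated from ev2.py

- [ ] **Agent creation logic**
  - [x] LLM configuration
  - [x] Tools configuration
  - [x] System prompt loading
  - [ ] Agent initialization parameters

- [ ] **Task construction logic**
  - [x] Primary evaluator path handling
  - [x] Generation information
  - [ ] Additional context (if needed)

- [ ] **Agent run logic**
  - [ ] Sending messages
  - [ ] Handling observations
  - [ ] Waiting for results

- [ ] **Result extraction logic**
  - [x] EVAL_AGENTS.md parsing
  - [x] auxiliary_metrics.py detection
  - [ ] More sophisticated result parsing (optional)

- [ ] **Workspace management**
  - [x] Memory directory creation
  - [ ] Initial file creation
  - [ ] Cleanup logic (optional)

### New features

- [x] **HTTP API**
  - [x] Generation notification endpoint
  - [x] Status endpoint
  - [x] Manual trigger endpoint

- [x] **State management**
  - [x] Generation history
  - [x] Trigger decision logic
  - [x] Persistence

- [ ] **Agent persistence**
  - [ ] Agent instance reuse
  - [ ] Conversation history accumulation
  - [ ] Memory shared across generations

### Configuration and deployment

- [x] **Configuration file**
  - [x] Service configuration
  - [x] Trigger strategy configuration
  - [ ] Agent parameter configuration

- [ ] **Testing**
  - [x] Basic functional tests
  - [ ] Integration tests
  - [ ] Performance tests

- [ ] **Documentation**
  - [x] API documentation
  - [ ] Migration documentation
  - [ ] Usage guide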

---

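## 🎯 Key Migration Challenges

### Challenge 1: OpenHands Agent interaction

**Problem**: `ev2.py` uses OpenHands-specific APIs, and we need to understand how they work

**Solution**:
- Start with a simplified version (call the agent, wait for completion)
- Refine step by step (if finer-grained control is needed)
- Use the `ev2.py` implementation as a reference

### Challenge 2: Agent state persistence

**Problem**: Does the agent's context need to be preserved between calls?

**Solution**:
- **Short-term**: Create a new agent on every call (as `ev2.py` does)
- **Long-term**: Reuse the agent instance and accumulate conversation history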
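The two options can sit behind one switch so that moving from short-term to long-term is a configuration change rather than a rewrite. A minimal sketch, where `AgentProvider` and its `reuse` flag are hypothetical names and `factory` stands in for something like `IntegratedEV2Agent._create_agent`:

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class AgentProvider(Generic[T]):
    """Hands out agent instances, either fresh per call or cached.

    Hypothetical helper; `factory` is any zero-argument callable that
    builds an agent.
    """

    def __init__(self, factory: Callable[[], T], reuse: bool = False):
        self._factory = factory
        self._reuse = reuse
        self._cached: Optional[T] = None

    def get(self) -> T:
        if not self._reuse:
            # Short-term behavior: a new agent for every call.
            return self._factory()
        if self._cached is None:
            # Long-term behavior: build once, then reuse.
            self._cached = self._factory()
        return self._cached

    def reset(self) -> None:
        """Drop the cached agent, for example after an unrecoverable failure."""
        self._cached = None
```

With `reuse=False` the service behaves like `ev2.py`; flipping it to `True` keeps one instance alive across generations, and `reset()` gives the error path a way back to a clean agent.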

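### Challenge 3: Error handling

**Problem**: The agent may fail. How do we handle that gracefully?

**Solution**:
- Wrap agent calls in `try`/`except`
- Log detailed error information
- Return a meaningful error message
- Keep the service running (no crash)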
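The four points above can be sketched as a single wrapper around the agent call. The function name `safe_analyze` and the shape of the error dictionary are assumptions, chosen to mirror the `"success"` and `"generation"` keys that `analyze_generation` returns:

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict

logger = logging.getLogger("ev2_service")


async def safe_analyze(
    analyze: Callable[[int], Awaitable[Dict[str, Any]]],
    generation: int,
) -> Dict[str, Any]:
    """Call the agent and turn any failure into a structured error result."""
    try:
        return await analyze(generation)
    except Exception as exc:
        # logger.exception records the full traceback for later debugging.
        logger.exception("Agent failed on generation %d", generation)
        return {
            "success": False,
            "generation": generation,
            "error_type": type(exc).__name__,
            "error": str(exc),
        }
```

In the service this could be used as `result = await safe_analyze(self.agent.analyze_generation, request.generation)`, so a failed run produces a normal HTTP response instead of an unhandled exception.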

---

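## 💡 Simplification Strategy

To finish the migration quickly, an **incremental strategy** is recommended:

### MVP version (minimum viable)

**Goal**: Get the service working with the fewest changes

**Simplifications**:
1. **Agent run**: Call it directly, without aiming for optimal performance
2. **Result extraction**: Simple parsing (as it is now)
3. **State management**: The basic version is enough

**Time**: 1 day

### Enhanced version (production-ready)

**Goal**: Improve performance and user experience

**Enhancements**:
1. **Agent persistence**: Reuse the agent instance
2. **Better result parsing**: Extract more information
3. **Error recovery**: Robust error handling

**Time**: +1 day

### Full version (future extensions)

**Goal**: Prepare for advanced features

**Extensions**:
1. **MetricUnit integration**
2. **Lifecycle management**
3. **Asynchronous meta-cognition**

**Time**: +1-2 weeks (as needed)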

---

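## 📊 Comparison: Before vs. After Migration

| Aspect | Before (wrapper) | After (standalone) |
|------|-----------------|---------------------|
| **Dependencies** | Depends on ev2.py | Fully standalone |
| **Architecture** | Two layers | Single layer |
| **State** | Scattered | Centralized |
| **Agent** | Created on every call | Can be persistent |
| **Extensibility** | Limited | High |
| **Maintainability** | Medium | High |
| **Performance** | Has overhead | Optimized |
| **Lines of code** | ~700 | ~800-1000 |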

---

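## ✅ Acceptance Criteria

The migration is complete when:

1. **Functional completeness**
   - [ ] All functionality of ev2.py is preserved
   - [ ] The HTTP API works correctly
   - [ ] State persistence works correctly
   - [ ] The agent runs correctly

2. **Tests pass**
   - [ ] `test_ev2_service.py` passes in full
   - [ ] A simulated 25-generation evolution run succeeds
   - [ ] The agent generates EVAL_AGENTS.md and auxiliary_metrics.py

3. **Code quality**
   - [ ] No linter errors
   - [ ] Adequate comments
   - [ ] Clear structure

4. **Complete documentation**
   - [ ] API documentation updated
   - [ ] Usage guide updated
   - [ ] Migration notes are clear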
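The criterion that the agent generates `EVAL_AGENTS.md` and `auxiliary_metrics.py` is easy to check automatically after a simulated run. A small sketch follows; the helper name is hypothetical, and treating an empty file as missing is an assumption rather than a stated requirement:

```python
from pathlib import Path
from typing import List

REQUIRED_OUTPUTS = ("EVAL_AGENTS.md", "auxiliary_metrics.py")


def missing_agent_outputs(memory_dir: Path) -> List[str]:
    """Return the required output files that are absent or empty."""
    missing = []
    for name in REQUIRED_OUTPUTS:
        path = memory_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

A test could assert `missing_agent_outputs(results_dir / "eval_agent_memory") == []` once the simulated evolution run has finished.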

---

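## 🚀 Getting Started

### Step 1 (today, 30 minutes)

1. Create the `ev2_service_standalone.py` skeleton
2. Copy the HTTP portion of `ev2_service.py`
3. Create the `IntegratedEV2Agent` class skeleton

### Step 2 (tomorrow morning, 2-3 hours)

1. Migrate the core logic from `ev2.py` into `IntegratedEV2Agent`
2. Implement `_create_agent` and `_build_task_message`
3. Implement a simplified `analyze_generation`

### Step 3 (tomorrow afternoon, 2-3 hours)

1. Run the full tests
2. Fix issues
3. Update the documentation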
