EV2 Migration Plan: From Wrapper to Standalone
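🎯 Goal
Migrate all of the logic in ev2.py into ev2_service_standalone.py to create an independent, complete evaluation service.
Design principles:
- ✅ No dependency on ev2.py (fully standalone)
- ✅ Preserve all existing functionality
- ✅ Prepare for future extensions (MetricUnit, Lifecycle, etc.)
- ✅ Clearer architecture and state management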
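📊 Current Architecture vs. Target Architecture
Current architecture (ev2_service.py)
ev2_service.py (HTTP wrapper)
↓ calls
ev2.py (Agent logic)
↓ uses
OpenHands Agent
Problems:
- Two layers of abstraction, with state scattered across them
- Hard to integrate deeply
- ev2.py is designed around the assumption of a single run
Target architecture (ev2_service_standalone.py)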
ev2_service_standalone.py
├── FastAPI HTTP Server
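├── ServiceState (persistent state management)
├── IntegratedEV2Agent (manages OpenHands directly)
│   ├── Agent instance (persistent)
│   ├── Memory management
│   └── Conversation history
└── MetricRegistry (optional, groundwork for the future)
Advantages:
- Single responsibility, with the logic in one place
- The agent persists, so it does not need to be rebuilt on every call
- Better state management
- Paves the way for advanced features such as MetricUnit
🔧 Migration Steps
Phase 1: Core Agent class (highest priority)
Goal: create the IntegratedEV2Agent class to replace calls to evolution_evaluation_agent()
1.1 Create the agent management class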
# ev2_service_standalone.py
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional

# NOTE: import paths are as written in this plan; verify them against the installed OpenHands version
from openhands.agent import Agent
from openhands.llm import LLM
from openhands.tools import TerminalTool, FileEditorTool, TaskTrackerTool
from openhands.tools.tool import Tool

logger = logging.getLogger(__name__)

class IntegratedEV2Agent:
    """
    Integrated EV2 Agent (not a wrapper)
    Directly manages OpenHands agent lifecycle and state
    """
    def __init__(self,
                 results_dir: str,
                 primary_evaluator_path: str,
                 config: Dict[str, Any]):
        self.results_dir = Path(results_dir).resolve()
        self.primary_evaluator_path = Path(primary_evaluator_path).resolve()
        self.config = config
        # Memory directory (persistent)
        self.memory_dir = self.results_dir / "eval_agent_memory"
        self.memory_dir.mkdir(parents=True, exist_ok=True)
        # Initialize OpenHands agent (persistent!)
        self.agent = self._create_agent()
        # Conversation history (accumulates across generations)
        self.conversation_history = []
        logger.info("✅ IntegratedEV2Agent initialized")
        logger.info(f" Memory dir: {self.memory_dir}")
        logger.info(f" Primary evaluator: {self.primary_evaluator_path}")
    def _create_agent(self) -> Agent:
        """
        Create OpenHands agent
        Migrated from ev2.py:evolution_evaluation_agent()
        """
        # LLM setup
        llm = LLM(model="anthropic/claude-sonnet-4-20250514")
        # System prompt
        prompt_path = Path(__file__).parent / "ev2_prompt.j2"
        if not prompt_path.exists():
            raise FileNotFoundError(f"Prompt template not found: {prompt_path}")
        # Create agent with tools
        agent = Agent(
            llm=llm,
            tools=[
                Tool(name=TerminalTool.name),
                Tool(name=FileEditorTool.name),
                Tool(name=TaskTrackerTool.name),
            ],
            system_prompt_filename=str(prompt_path),
        )
        return agent
    async def analyze_generation(self, generation: int) -> Dict[str, Any]:
        """
        Analyze a generation
        This is the main entry point, replacing evolution_evaluation_agent()
        """
        logger.info(f"🧠 Analyzing generation {generation}...")
        # Build task message
        task = self._build_task_message(generation)
        # Run agent
        result = await self._run_agent(task)
        # Extract results
        insights = self._extract_insights()
        metrics = self._extract_metrics()
        return {
            "success": True,
            "insights": insights,
            "auxiliary_metrics": metrics,
            "generation": generation
        }
    def _build_task_message(self, generation: int) -> str:
        """
        Build task message for agent
        Migrated from ev2.py:_build_default_task()
        """
        # Read primary evaluator code
        # NOTE: primary_code is not yet embedded in the task below; the task only references the file path
        primary_code = ""
        if self.primary_evaluator_path.exists():
            primary_code = self.primary_evaluator_path.read_text()
        # Check for generation directory
        gen_dir = self._find_generation_dir(generation)
        task = f"""# Evolution Evaluation Task - Generation {generation}
## Your Mission
You are analyzing the evolution process for a code optimization task. Your workspace is:
`{self.memory_dir}`
## Current Generation
Generation: {generation}
Results directory: {gen_dir if gen_dir else 'Not found'}
## Primary Evaluator (Fixed, DO NOT MODIFY)
The ground truth evaluation is defined in:
`{self.primary_evaluator_path}`
**CRITICAL**: You MUST NOT modify this file. Read it to understand the primary objective.
## Your Tasks
1. **READ** the primary evaluator to understand the ground truth objective
2. **ANALYZE** the current generation's performance and strategy
3. **CREATE** auxiliary evaluation metrics that provide insights beyond the primary score
4. **UPDATE** EVAL_AGENTS.md with your findings and recommendations
## Workspace Structure
Your workspace (`{self.memory_dir}`) should contain:
- `EVAL_AGENTS.md`: Your accumulated insights and analysis
- `auxiliary_metrics.py`: Python code for auxiliary metrics
- Any other analysis files you create
## Constraints
- Primary metric is FIXED - you cannot change it
- Auxiliary metrics should complement, not replace, the primary metric
- Focus on actionable insights that can guide the evolution process
## Output
Update EVAL_AGENTS.md with:
- Analysis of generation {generation}
- Auxiliary metric definitions and values
- Insights and recommendations for future generations
Begin your analysis!
"""
        return task
    def _find_generation_dir(self, generation: int) -> Optional[Path]:
        """Find the generation directory"""
        # Try common locations
        candidates = [
            self.results_dir / f"gen_{generation}",
            self.results_dir.parent / f"gen_{generation}",
        ]
        for candidate in candidates:
            if candidate.exists():
                return candidate
        return None
    async def _run_agent(self, task: str) -> Dict[str, Any]:
        """
        Run the agent with a task
        This is where we'd integrate async execution if needed
        """
        # For now, call synchronously (OpenHands is sync)
        # Could wrap in asyncio.to_thread() for true async
        # NOTE: This is simplified - actual OpenHands integration
        # would involve message passing, observation handling, etc.
        # We'll keep it simple for migration
        logger.info(f"📝 Task length: {len(task)} chars")
        # In ev2.py, the agent is run via Agent's API
        # We'll need to properly integrate this
        return {"status": "completed"}
    def _extract_insights(self) -> List[str]:
        """Extract insights from EVAL_AGENTS.md"""
        eval_agents_md = self.memory_dir / "EVAL_AGENTS.md"
        if not eval_agents_md.exists():
            return []
        insights = []
        content = eval_agents_md.read_text()
        # Simple extraction - look for bullet points
        for line in content.splitlines():
            stripped = line.strip()
            if stripped.startswith(('*', '-')):
                insights.append(stripped)
        return insights[-10:]  # Last 10 insights
    def _extract_metrics(self) -> Dict[str, Any]:
        """Extract auxiliary metrics"""
        auxiliary_py = self.memory_dir / "auxiliary_metrics.py"
        if not auxiliary_py.exists():
            return {}
        # Could dynamically import and execute
        # For now, just check existence
        return {
            "auxiliary_metrics_file_exists": True,
            "file_path": str(auxiliary_py)
        }
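The comments in `_run_agent` mention wrapping the synchronous OpenHands call in `asyncio.to_thread()`. Here is a minimal sketch of that pattern. It assumes a hypothetical blocking function, `run_agent_blocking`, which stands in for the real agent call that this plan leaves unspecified. It requires Python 3.9 or later.

```python
import asyncio
import time
from typing import Any, Dict


def run_agent_blocking(task: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the synchronous OpenHands run (not a real API)."""
    time.sleep(0.01)  # simulate blocking work
    return {"status": "completed", "task_chars": len(task)}


async def run_agent_async(task: str) -> Dict[str, Any]:
    """Run the blocking call in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(run_agent_blocking, task)


if __name__ == "__main__":
    print(asyncio.run(run_agent_async("analyze generation 3")))
```

With this shape, the FastAPI event loop can keep serving status requests while an analysis is in progress.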
1.2 Integrate into the service
class EV2ServiceStandalone:
    """
    Standalone EV2 Service (no dependency on ev2.py)
    """
    def __init__(self, config: ServiceConfig):
        self.config = config
        self.state = ServiceState(config)
        # Create integrated agent (PERSISTENT)
        self.agent = IntegratedEV2Agent(
            results_dir=config.results_dir,
            primary_evaluator_path=config.primary_evaluator_path,
            config=config.__dict__
        )
    async def handle_generation_notification(self, request: GenerationCompleteRequest):
        """Handle generation notification"""
        # Decision logic (same as before)
        should_trigger, reason = self.state.should_trigger_agent(...)
        if should_trigger:
            # Call integrated agent (not ev2.py!)
            result = await self.agent.analyze_generation(request.generation)
            return result
        # Report why the agent was not run
        return {"status": "skipped", "reason": reason}
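The plan does not spell out the decision logic behind `should_trigger_agent`. As one possible illustration only, the sketch below runs the agent every N generations and returns the same `(should_trigger, reason)` pair that the handler unpacks. `TriggerPolicy`, `every_n`, and `last_triggered` are hypothetical names and are not taken from ev2_service.py.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TriggerPolicy:
    """Hypothetical policy: run the agent every `every_n` generations."""
    every_n: int = 5
    last_triggered: Optional[int] = None

    def should_trigger_agent(self, generation: int) -> Tuple[bool, str]:
        # Never analyze the same (or an older) generation twice
        if self.last_triggered is not None and generation <= self.last_triggered:
            return False, f"generation {generation} already covered"
        if generation % self.every_n != 0:
            return False, f"not a multiple of {self.every_n}"
        self.last_triggered = generation
        return True, f"periodic trigger (every {self.every_n} generations)"
```

Returning the reason alongside the boolean makes skipped notifications easy to explain in logs and API responses.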
Phase 2: Complete the agent integration (medium priority)
Goal: fully implement the interaction logic with the OpenHands agent
2.1 Message handling
Migrate the agent run logic from ev2.py:
async def _run_agent(self, task: str) -> Dict[str, Any]:
    """
    Run agent with proper message handling
    Migrated from ev2.py (simplified for now)
    """
    # This is where ev2.py uses Agent API
    # We need to properly integrate:
    # 1. Send task as message
    # 2. Handle agent observations
    # 3. Collect agent actions
    # 4. Wait for completion
    # For MVP, we can use the same approach as ev2.py
    # but with the persistent agent instance
    pass  # TODO: Implement based on OpenHands API
2.2 Workspace management
def _setup_workspace(self):
    """Setup agent workspace"""
    # Ensure directories exist
    self.memory_dir.mkdir(parents=True, exist_ok=True)
    # Initialize EVAL_AGENTS.md if needed
    eval_md = self.memory_dir / "EVAL_AGENTS.md"
    if not eval_md.exists():
        eval_md.write_text(
            "# Evaluation Agent Memory\n"
            "This document tracks insights and metrics across generations.\n"
        )
Phase 3: State management enhancements (low priority)
Goal: prepare for future features such as MetricUnit
3.1 MetricRegistry (skeleton)
class MetricRegistry:
    """
    Registry for managing metrics
    Prepared for future MetricUnit integration
    """
    def __init__(self, memory_dir: Path):
        self.memory_dir = memory_dir
        self.metrics = {}  # id -> metadata
    def register_metric(self, metric_id: str, metadata: Dict[str, Any]):
        """Register a metric"""
        self.metrics[metric_id] = metadata
    def list_metrics(self) -> List[Dict[str, Any]]:
        """List all metrics"""
        return list(self.metrics.values())
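The skeleton stores `memory_dir` but keeps metrics only in memory. One way the registry could survive a service restart is to mirror it to a JSON file. The sketch below assumes a hypothetical file name, `metric_registry.json`, and hypothetical `save`/`load` methods; neither is part of this plan.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class PersistentMetricRegistry:
    """Sketch: MetricRegistry variant that mirrors its contents to JSON."""

    def __init__(self, memory_dir: Path):
        self.memory_dir = Path(memory_dir)
        self.path = self.memory_dir / "metric_registry.json"  # hypothetical file name
        self.metrics: Dict[str, Dict[str, Any]] = {}
        self.load()

    def register_metric(self, metric_id: str, metadata: Dict[str, Any]) -> None:
        self.metrics[metric_id] = metadata
        self.save()  # write through on every change

    def list_metrics(self) -> List[Dict[str, Any]]:
        return list(self.metrics.values())

    def save(self) -> None:
        self.memory_dir.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.metrics, indent=2), encoding="utf-8")

    def load(self) -> None:
        if self.path.exists():
            self.metrics = json.loads(self.path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        PersistentMetricRegistry(Path(d)).register_metric("diversity", {"unit": "ratio"})
        # A second instance pointed at the same directory sees the saved metric
        print(PersistentMetricRegistry(Path(d)).list_metrics())
```

Metadata must be JSON-serializable for this to work, which is a constraint the in-memory skeleton does not have.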
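📁 File Structure
eval_agent/
├── ev2_service_standalone.py # NEW: complete standalone service
├── ev2_service.py # OLD: kept for reference
├── ev2.py # OLD: kept as a standalone tool
├── ev2_prompt.j2 # SHARED: system prompt
├── ev2_service_config.yaml # SHARED: configuration file
└── test_ev2_service.py # SHARED: test script
After the migration:
- ev2_service_standalone.py: used in production
- ev2.py: kept as a standalone command-line tool (optional)
- ev2_service.py: deleted, or renamed to ev2_service_wrapper.py (archived)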
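🚀 Implementation Timeline
Day 1: Core migration (4-6 hours)
Morning:
- Create the basic structure of ev2_service_standalone.py
- Implement IntegratedEV2Agent.__init__ and _create_agent
- Implement _build_task_message
Afternoon:
- Implement the analyze_generation method
- Integrate it into the FastAPI service
- Fix import path issues
Acceptance: the service starts, receives notifications, and can invoke the agent (even in simplified form)
Day 2: Refinement and testing (4-6 hours)
Morning:
- Flesh out the _run_agent method (if needed)
- Implement result extraction (_extract_insights, _extract_metrics)
- Add error handling
Afternoon:
- Run the full test suite (using test_ev2_service.py)
- Fix any issues found
- Optimize performance
Acceptance: one full evolution simulation runs end to end, and the agent produces the correct output
Day 3: Cleanup and documentation (2-4 hours)
Morning:
- Clean up and refactor the code
- Add detailed comments
- Update the configuration file
Afternoon:
- Update the documentation
- Create usage examples
- Prepare for integration into ShinkaEvolve
Acceptance: high code quality, complete documentation, ready for production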
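📋 Migration Checklist
Items to migrate from ev2.py
Agent creation logic
- LLM configuration
- Tools configuration
- System prompt loading
- Agent initialization parameters
Task construction logic
- Primary evaluator path handling
- Generation information
- Additional context (if needed)
Agent run logic
- Sending messages
- Handling observations
- Waiting for results
Result extraction logic
- Parsing EVAL_AGENTS.md
- Detecting auxiliary_metrics.py
- More sophisticated result parsing (optional)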
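For the optional item on more sophisticated result parsing, one option is to group the bullets in EVAL_AGENTS.md under their nearest Markdown heading instead of collecting them as a flat list. This sketch assumes the agent writes `#`-style headings and `-` or `*` bullets, which the plan does not guarantee. `group_insights_by_heading` is a hypothetical helper name.

```python
import re
from typing import Dict, List

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Requires whitespace after the marker, so "**bold**" lines are not mistaken for bullets
BULLET_RE = re.compile(r"^\s*[-*]\s+(.*\S)\s*$")


def group_insights_by_heading(markdown: str) -> Dict[str, List[str]]:
    """Map each heading to the bullet items that follow it."""
    sections: Dict[str, List[str]] = {}
    current = "(no heading)"
    for line in markdown.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            current = heading.group(2)
            sections.setdefault(current, [])
            continue
        bullet = BULLET_RE.match(line)
        if bullet:
            sections.setdefault(current, []).append(bullet.group(1))
    return sections
```

Grouping by heading would let the service return only the insights for the most recent generation, provided the agent writes one heading per generation.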
Workspace management
- Creating the memory directory
- Creating initial files
- Cleanup logic (optional)
New features
HTTP API
- Generation notification endpoint
- Status endpoint
- Manual trigger endpoint
State management
- Generation history
- Trigger decision logic
- Persistence
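For the generation-history and persistence items, the sketch below shows one way to write service state to disk atomically. `StateStore` and `service_state.json` are hypothetical names. The real ServiceState lives in ev2_service.py and may be structured differently.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List


class StateStore:
    """Sketch: JSON-backed generation history with atomic writes."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.history: List[Dict[str, Any]] = []
        if self.path.exists():
            self.history = json.loads(self.path.read_text(encoding="utf-8"))

    def record(self, generation: int, triggered: bool) -> None:
        self.history.append({"generation": generation, "triggered": triggered})
        self._flush()

    def _flush(self) -> None:
        # Write to a temp file in the same directory, then rename it over the target,
        # so a crash mid-write never leaves a truncated state file.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.history, f)
        os.replace(tmp, self.path)
```

Creating `StateStore(path)` again after a restart reloads whatever history was last flushed.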
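Agent persistence
- Reusing the agent instance
- Accumulating conversation history
- Sharing memory across generations
Configuration and deployment
Configuration files
- Service configuration
- Trigger policy configuration
- Agent parameter configuration
Testing
- Basic functionality tests
- Integration tests
- Performance tests
Documentation
- API documentation
- Migration documentation
- Usage guide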
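🎯 Key Migration Challenges
Challenge 1: Interacting with the OpenHands agent
Problem: ev2.py relies on OpenHands-specific APIs, and we need to understand how they work
Solution:
- Start with a simplified version (invoke the agent, wait for completion)
- Refine it step by step (if finer-grained control is needed)
- Use the implementation in ev2.py as a reference
Challenge 2: Persisting agent state
Problem: does the agent's context need to be preserved between calls?
Solution:
- Short-term: create a new agent on each call (as ev2.py does)
- Long-term: reuse the agent instance and accumulate conversation history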
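The short-term and long-term options can sit behind one switch, so moving from per-call agents to a reused instance becomes a configuration change. In this sketch, `AgentProvider`, `factory`, and `reuse` are hypothetical names, and `factory` stands in for something like `_create_agent`.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class AgentProvider(Generic[T]):
    """Sketch: hand out a fresh agent per call, or lazily create and reuse one."""

    def __init__(self, factory: Callable[[], T], reuse: bool = False):
        self._factory = factory
        self._reuse = reuse
        self._cached: Optional[T] = None

    def get(self) -> T:
        if not self._reuse:
            return self._factory()  # short-term: new agent on every call
        if self._cached is None:
            self._cached = self._factory()  # long-term: build once, then reuse
        return self._cached
```

Usage would look like `AgentProvider(self._create_agent, reuse=True).get()`. Whether reusing an OpenHands agent across runs is safe depends on how it holds conversation state, which should be checked against ev2.py.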
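Challenge 3: Error handling
Problem: the agent may fail; how do we handle that gracefully?
Solution:
- Wrap agent calls in try/except
- Log detailed error information
- Return meaningful error messages
- Keep the service running (no crash)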
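💡 Simplification Strategy
To finish the migration quickly, an incremental strategy is recommended:
MVP version (minimum viable)
Goal: get the service working with the fewest changes
Simplifications:
- Agent execution: invoke it directly, without chasing optimal performance
- Result extraction: simple parsing (as it is today)
- State management: a basic version is enough
Time: 1 day
Enhanced version (production-ready)
Goal: improve performance and user experience
Enhancements:
- Agent persistence: reuse the agent instance
- Better result parsing: extract more information
- Error recovery: robust error handling
Time: +1 day
Full version (future extensions)
Goal: prepare for advanced features
Extension points:
- MetricUnit integration
- Lifecycle management
- Asynchronous meta-cognition
Time: +1-2 weeks (as needed)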
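📊 Comparison: Before vs. After Migration
| Aspect | Before (wrapper) | After (standalone) |
|---|---|---|
| Dependencies | Depends on ev2.py | Fully standalone |
| Architecture | Two layers | Single layer |
| State | Scattered | Centralized |
| Agent | Created on every call | Can be persisted |
| Extensibility | Limited | High |
| Maintainability | Medium | High |
| Performance | Wrapper overhead | Optimized |
| Lines of code | ~700 | ~800-1000 |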
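✅ Acceptance Criteria
The migration is complete when:
Functional completeness
- All ev2.py functionality is preserved
- The HTTP API works correctly
- State persistence works correctly
- The agent runs correctly
Tests pass
- test_ev2_service.py passes in full
- A simulated 25-generation evolution run succeeds
- The agent generates EVAL_AGENTS.md and auxiliary_metrics.py
Code quality
- No linter errors
- Adequate comments
- Clear structure
Complete documentation
- API documentation updated
- Usage guide updated
- Clear migration notes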
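🚀 Getting Started
Step 1 (today, 30 minutes)
- Create the ev2_service_standalone.py skeleton
- Copy the HTTP portion of ev2_service.py
- Create the IntegratedEV2Agent class skeleton
Step 2 (tomorrow morning, 2-3 hours)
- Migrate the core logic from ev2.py into IntegratedEV2Agent
- Implement _create_agent and _build_task_message
- Write a simplified analyze_generation
Step 3 (tomorrow afternoon, 2-3 hours)
- Run the full tests
- Fix issues
- Update the documentation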