	# EV2 Migration Plan: From Wrapper to Standalone

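	## 🎯 Goal

	Fully migrate the logic of `ev2.py` into `ev2_service_standalone.py` to create an independent, complete evaluation service.

	Design principles:
	1. ✅ No dependency on `ev2.py` (fully standalone)
	2. ✅ Preserve all existing functionality
	3. ✅ Prepare for future extensions (MetricUnit, Lifecycle, etc.)
	4. ✅ Clearer architecture and state management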

	---

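	## 📊 Current Architecture vs. Target Architecture

	### Current architecture (ev2_service.py)

	```
	ev2_service.py (HTTP wrapper)
	↓ calls
	ev2.py (Agent logic)
	↓ uses
	OpenHands Agent
	```

	Problems:
	- Two layers of abstraction, with state scattered across them
	- Hard to integrate deeply
	- `ev2.py` is designed around the assumption of a single run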

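	### Target architecture (ev2_service_standalone.py)

	```
	ev2_service_standalone.py
	├── FastAPI HTTP Server
	├── ServiceState (persistent state management)
	├── IntegratedEV2Agent (manages OpenHands directly)
	│   ├── Agent instance (persistent)
	│   ├── Memory management
	│   └── Conversation history
	└── MetricRegistry (optional, groundwork for future use)
	```

	Advantages:
	- Single responsibility, with the logic in one place
	- The agent persists, so it does not have to be rebuilt on every call
	- Better state management
	- Paves the way for advanced features such as MetricUnit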

	---

	## 🔧 Migration Steps

	### Phase 1: Core Agent Class (highest priority)

	Goal: create the `IntegratedEV2Agent` class to replace calls to `evolution_evaluation_agent()`

	#### 1.1 Create the agent management class

	```python
	# ev2_service_standalone.py

	import logging
	from pathlib import Path
	from typing import Any, Dict, List, Optional

	# OpenHands imports (module paths may differ between OpenHands versions)
	from openhands.agent import Agent
	from openhands.llm import LLM
	from openhands.tools import TerminalTool, FileEditorTool, TaskTrackerTool
	from openhands.tools.tool import Tool

	logger = logging.getLogger(__name__)


	class IntegratedEV2Agent:
	    """
	    Integrated EV2 Agent (not a wrapper)

	    Directly manages OpenHands agent lifecycle and state
	    """

	    def __init__(self,
	                 results_dir: str,
	                 primary_evaluator_path: str,
	                 config: Dict[str, Any]):

	        self.results_dir = Path(results_dir).resolve()
	        self.primary_evaluator_path = Path(primary_evaluator_path).resolve()
	        self.config = config

	        # Memory directory (persistent)
	        self.memory_dir = self.results_dir / "eval_agent_memory"
	        self.memory_dir.mkdir(parents=True, exist_ok=True)

	        # Initialize OpenHands agent (persistent!)
	        self.agent = self._create_agent()

	        # Conversation history (accumulates across generations)
	        self.conversation_history = []

	        logger.info("✅ IntegratedEV2Agent initialized")
	        logger.info(f" Memory dir: {self.memory_dir}")
	        logger.info(f" Primary evaluator: {self.primary_evaluator_path}")

	    def _create_agent(self) -> Agent:
	        """
	        Create OpenHands agent

	        Migrated from ev2.py:evolution_evaluation_agent()
	        """
	        # LLM setup
	        llm = LLM(model="anthropic/claude-sonnet-4-20250514")

	        # System prompt
	        prompt_path = Path(__file__).parent / "ev2_prompt.j2"
	        if not prompt_path.exists():
	            raise FileNotFoundError(f"Prompt template not found: {prompt_path}")

	        # Create agent with tools
	        agent = Agent(
	            llm=llm,
	            tools=[
	                Tool(name=TerminalTool.name),
	                Tool(name=FileEditorTool.name),
	                Tool(name=TaskTrackerTool.name),
	            ],
	            system_prompt_filename=str(prompt_path),
	        )

	        return agent

	    async def analyze_generation(self, generation: int) -> Dict[str, Any]:
	        """
	        Analyze a generation

	        This is the main entry point, replacing evolution_evaluation_agent()
	        """
	        logger.info(f"🧠 Analyzing generation {generation}...")

	        # Build task message
	        task = self._build_task_message(generation)

	        # Run agent
	        result = await self._run_agent(task)

	        # Extract results from the files the agent wrote to its workspace
	        insights = self._extract_insights()
	        metrics = self._extract_metrics()

	        return {
	            "success": True,
	            "insights": insights,
	            "auxiliary_metrics": metrics,
	            "generation": generation
	        }

	    def _build_task_message(self, generation: int) -> str:
	        """
	        Build task message for agent

	        Migrated from ev2.py:_build_default_task()
	        """
	        # Read primary evaluator code
	        primary_code = ""
	        if self.primary_evaluator_path.exists():
	            primary_code = self.primary_evaluator_path.read_text()

	        # Check for generation directory
	        gen_dir = self._find_generation_dir(generation)

	        task = f"""# Evolution Evaluation Task - Generation {generation}

	## Your Mission

	You are analyzing the evolution process for a code optimization task. Your workspace is:
	`{self.memory_dir}`

	## Current Generation

	Generation: {generation}
	Results directory: {gen_dir if gen_dir else 'Not found'}

	## Primary Evaluator (Fixed, DO NOT MODIFY)

	The ground truth evaluation is defined in:
	`{self.primary_evaluator_path}`

	CRITICAL: You MUST NOT modify this file. Read it to understand the primary objective.

	## Your Tasks

	1. READ the primary evaluator to understand the ground truth objective
	2. ANALYZE the current generation's performance and strategy
	3. CREATE auxiliary evaluation metrics that provide insights beyond the primary score
	4. UPDATE EVAL_AGENTS.md with your findings and recommendations

	## Workspace Structure

	Your workspace (`{self.memory_dir}`) should contain:
	- `EVAL_AGENTS.md`: Your accumulated insights and analysis
	- `auxiliary_metrics.py`: Python code for auxiliary metrics
	- Any other analysis files you create

	## Constraints

	- Primary metric is FIXED - you cannot change it
	- Auxiliary metrics should complement, not replace, the primary metric
	- Focus on actionable insights that can guide the evolution process

	## Output

	Update EVAL_AGENTS.md with:
	- Analysis of generation {generation}
	- Auxiliary metric definitions and values
	- Insights and recommendations for future generations

	Begin your analysis!
	"""

	        return task

	    def _find_generation_dir(self, generation: int) -> Optional[Path]:
	        """Find the generation directory"""
	        # Try common locations
	        candidates = [
	            self.results_dir / f"gen_{generation}",
	            self.results_dir.parent / f"gen_{generation}",
	        ]

	        for candidate in candidates:
	            if candidate.exists():
	                return candidate

	        return None

	    async def _run_agent(self, task: str) -> Dict[str, Any]:
	        """
	        Run the agent with a task

	        This is where we'd integrate async execution if needed
	        """
	        # For now, call synchronously (OpenHands is sync)
	        # Could wrap in asyncio.to_thread() for true async

	        # NOTE: This is simplified - actual OpenHands integration
	        # would involve message passing, observation handling, etc.
	        # We'll keep it simple for migration

	        logger.info(f"📝 Task length: {len(task)} chars")

	        # In ev2.py, the agent is run via Agent's API
	        # We'll need to properly integrate this

	        return {"status": "completed"}

	    def _extract_insights(self) -> List[str]:
	        """Extract insights from EVAL_AGENTS.md"""
	        eval_agents_md = self.memory_dir / "EVAL_AGENTS.md"

	        if not eval_agents_md.exists():
	            return []

	        insights = []
	        content = eval_agents_md.read_text()

	        # Simple extraction - look for bullet points
	        for line in content.split('\n'):
	            stripped = line.strip()
	            if stripped.startswith(('*', '-')):
	                insights.append(stripped)

	        return insights[-10:]  # Last 10 insights

	    def _extract_metrics(self) -> Dict[str, Any]:
	        """Extract auxiliary metrics"""
	        auxiliary_py = self.memory_dir / "auxiliary_metrics.py"

	        if not auxiliary_py.exists():
	            return {}

	        # Could dynamically import and execute
	        # For now, just check existence
	        return {
	            "auxiliary_metrics_file_exists": True,
	            "file_path": str(auxiliary_py)
	        }
	```
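
The comment in `_extract_metrics` notes that the metrics file could be imported and executed dynamically. A minimal sketch of that idea follows, using only the standard library. The helper name `load_auxiliary_metrics` and the convention that the file exposes a `compute_metrics()` function returning a dict are assumptions made for illustration; the plan does not define the interface of `auxiliary_metrics.py`.

```python
import importlib.util
from pathlib import Path
from typing import Any, Dict


def load_auxiliary_metrics(path: Path) -> Dict[str, Any]:
    """Import a metrics file by path and call its compute_metrics() (assumed hook)."""
    if not path.exists():
        return {}

    spec = importlib.util.spec_from_file_location("auxiliary_metrics", path)
    if spec is None or spec.loader is None:
        return {"error": f"cannot load {path}"}

    module = importlib.util.module_from_spec(spec)
    try:
        # This executes agent-written code in-process; isolate it in production.
        spec.loader.exec_module(module)
        compute = getattr(module, "compute_metrics", None)  # hypothetical entry point
        if not callable(compute):
            return {"error": "compute_metrics() not found"}
        result = compute()
    except Exception as exc:  # generated code can fail in arbitrary ways
        return {"error": f"{type(exc).__name__}: {exc}"}

    if not isinstance(result, dict):
        return {"error": "compute_metrics() did not return a dict"}
    return result
```

Returning an `error` entry instead of raising keeps a broken metrics file from failing the whole analysis, which matches the plan's intent that the service keeps running.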

	#### 1.2 Integrate into the service

	```python
	# ServiceConfig, ServiceState, and GenerationCompleteRequest are assumed to be
	# defined elsewhere in ev2_service_standalone.py.
	class EV2ServiceStandalone:
	    """
	    Standalone EV2 Service (no dependency on ev2.py)
	    """

	    def __init__(self, config: ServiceConfig):
	        self.config = config
	        self.state = ServiceState(config)

	        # Create integrated agent (PERSISTENT)
	        self.agent = IntegratedEV2Agent(
	            results_dir=config.results_dir,
	            primary_evaluator_path=config.primary_evaluator_path,
	            config=config.__dict__
	        )

	    async def handle_generation_notification(self, request: GenerationCompleteRequest):
	        """Handle generation notification"""
	        # Decision logic (same as before)
	        should_trigger, reason = self.state.should_trigger_agent(...)

	        if should_trigger:
	            # Call integrated agent (not ev2.py!)
	            result = await self.agent.analyze_generation(request.generation)
	            return result

	        return {"status": "skipped"}
	```

	---

	### Phase 2: Complete the Agent Integration (medium priority)

	Goal: fully implement the interaction logic with the OpenHands agent

	#### 2.1 Message handling

	Migrate the agent run logic from `ev2.py`:

	```python
	async def _run_agent(self, task: str) -> Dict[str, Any]:
	    """
	    Run agent with proper message handling

	    Migrated from ev2.py (simplified for now)
	    """
	    # This is where ev2.py uses Agent API
	    # We need to properly integrate:
	    # 1. Send task as message
	    # 2. Handle agent observations
	    # 3. Collect agent actions
	    # 4. Wait for completion

	    # For MVP, we can use the same approach as ev2.py
	    # but with the persistent agent instance

	    pass  # TODO: Implement based on OpenHands API
	```
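
One way to reconcile the `async` signature of `_run_agent` with a synchronous agent call is `asyncio.to_thread` (Python 3.9+), which the Phase 1 code comments already mention. The sketch below only illustrates that pattern: `run_agent_blocking` is a hypothetical stand-in for whatever blocking call `ev2.py` makes into OpenHands, and the `timeout_s` parameter is likewise an assumption, not part of the plan.

```python
import asyncio
import time
from typing import Any, Dict


def run_agent_blocking(task: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the synchronous agent run used in ev2.py."""
    time.sleep(0.05)  # simulates a long-running agent session
    return {"status": "completed", "task_chars": len(task)}


async def run_agent(task: str, timeout_s: float = 600.0) -> Dict[str, Any]:
    """Run the blocking call in a worker thread so the event loop stays responsive."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(run_agent_blocking, task), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        # Only the await is abandoned; the worker thread itself keeps running.
        return {"status": "timeout"}
```

With this shape, FastAPI can keep serving status requests while an analysis is in progress. Because a thread cannot be cancelled from outside, a real implementation would still need its own way to stop a stuck agent.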

	#### 2.2 Workspace management

	```python
	def _setup_workspace(self):
	    """Setup agent workspace"""
	    # Ensure directories exist
	    self.memory_dir.mkdir(parents=True, exist_ok=True)

	    # Initialize EVAL_AGENTS.md if needed
	    eval_md = self.memory_dir / "EVAL_AGENTS.md"
	    if not eval_md.exists():
	        eval_md.write_text("""# Evaluation Agent Memory

	This document tracks insights and metrics across generations.
	""")
	```

	---

	### Phase 3: Enhanced State Management (low priority)

	Goal: prepare for future features such as MetricUnit

	#### 3.1 MetricRegistry (skeleton)

	```python
	from pathlib import Path
	from typing import Any, Dict, List


	class MetricRegistry:
	    """
	    Registry for managing metrics

	    Prepared for future MetricUnit integration
	    """

	    def __init__(self, memory_dir: Path):
	        self.memory_dir = memory_dir
	        self.metrics = {}  # id -> metadata

	    def register_metric(self, metric_id: str, metadata: Dict[str, Any]):
	        """Register a metric"""
	        self.metrics[metric_id] = metadata

	    def list_metrics(self) -> List[Dict[str, Any]]:
	        """List all metrics"""
	        return list(self.metrics.values())
	```
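
Because the skeleton already takes a `memory_dir`, a natural next step would be to persist the registry there so it survives service restarts. The following is a sketch under assumptions, not part of the plan: the class name `PersistentMetricRegistry` and the file name `metric_registry.json` are hypothetical, and metadata is assumed to be JSON-serializable.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


class PersistentMetricRegistry:
    """Registry variant that mirrors its contents to a JSON file (hypothetical)."""

    def __init__(self, memory_dir: Path):
        self.memory_dir = Path(memory_dir)
        self.path = self.memory_dir / "metric_registry.json"  # assumed file name
        self.metrics: Dict[str, Dict[str, Any]] = {}
        if self.path.exists():
            # Reload metrics registered by a previous service run
            self.metrics = json.loads(self.path.read_text(encoding="utf-8"))

    def register_metric(self, metric_id: str, metadata: Dict[str, Any]) -> None:
        """Register (or overwrite) a metric and write the registry to disk."""
        self.metrics[metric_id] = metadata
        self.memory_dir.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.metrics, indent=2), encoding="utf-8")

    def list_metrics(self) -> List[Dict[str, Any]]:
        """List metadata for all registered metrics."""
        return list(self.metrics.values())
```

Rewriting the whole file on every registration is simple and adequate for a handful of metrics; a larger registry would likely want atomic writes or a small database instead.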

	---

	## 📁 File Structure

	```
	eval_agent/
	├── ev2_service_standalone.py   # NEW: complete standalone service
	├── ev2_service.py              # OLD: kept for reference
	├── ev2.py                      # OLD: kept as a standalone tool
	├── ev2_prompt.j2               # SHARED: system prompt
	├── ev2_service_config.yaml     # SHARED: configuration file
	└── test_ev2_service.py         # SHARED: test script
	```

	After the migration:
	- `ev2_service_standalone.py`: production use
	- `ev2.py`: kept as a standalone command-line tool (optional)
	- `ev2_service.py`: delete, or rename to `ev2_service_wrapper.py` (archived)

	---

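	## 🚀 Implementation Timeline

	### Day 1: Core migration (4-6 hours)

	Morning:
	- [ ] Create the basic structure of `ev2_service_standalone.py`
	- [ ] Implement `IntegratedEV2Agent.__init__` and `_create_agent`
	- [ ] Implement `_build_task_message`

	Afternoon:
	- [ ] Implement the `analyze_generation` method
	- [ ] Integrate into the FastAPI service
	- [ ] Fix import path issues

	Acceptance: the service starts, receives notifications, and can invoke the agent (even in simplified form)

	---

	### Day 2: Refinement and testing (4-6 hours)

	Morning:
	- [ ] Flesh out the `_run_agent` method (if needed)
	- [ ] Implement result extraction (`_extract_insights`, `_extract_metrics`)
	- [ ] Add error handling

	Afternoon:
	- [ ] Run the full test suite (using `test_ev2_service.py`)
	- [ ] Fix any issues found
	- [ ] Optimize performance

	Acceptance: a full evolution simulation runs end to end and the agent produces correct output

	---

	### Day 3: Cleanup and documentation (2-4 hours)

	Morning:
	- [ ] Clean up and refactor the code
	- [ ] Add detailed comments
	- [ ] Update the configuration file

	Afternoon:
	- [ ] Update the documentation
	- [ ] Create usage examples
	- [ ] Prepare for integration into ShinkaEvolve

	Acceptance: high code quality, complete documentation, ready for production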

	---

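	## 📋 Migration Checklist

	### Items migrated from ev2.py

	- [ ] Agent creation logic
	  - [x] LLM configuration
	  - [x] Tools configuration
	  - [x] System prompt loading
	  - [ ] Agent initialization parameters

	- [ ] Task construction logic
	  - [x] Primary evaluator path handling
	  - [x] Generation information
	  - [ ] Additional context (if needed)

	- [ ] Agent run logic
	  - [ ] Sending messages
	  - [ ] Handling observations
	  - [ ] Waiting for results

	- [ ] Result extraction logic
	  - [x] EVAL_AGENTS.md parsing
	  - [x] auxiliary_metrics.py detection
	  - [ ] More sophisticated result parsing (optional)

	- [ ] Workspace management
	  - [x] Memory directory creation
	  - [ ] Initial file creation
	  - [ ] Cleanup logic (optional)

	### New features

	- [x] HTTP API
	  - [x] Generation notification endpoint
	  - [x] Status endpoint
	  - [x] Manual trigger endpoint

	- [x] State management
	  - [x] Generation history
	  - [x] Trigger decision logic
	  - [x] Persistence

	- [ ] Agent persistence
	  - [ ] Agent instance reuse
	  - [ ] Conversation history accumulation
	  - [ ] Memory shared across generations

	### Configuration and deployment

	- [x] Configuration file
	  - [x] Service configuration
	  - [x] Trigger strategy configuration
	  - [ ] Agent parameter configuration

	- [ ] Testing
	  - [x] Basic functionality tests
	  - [ ] Integration tests
	  - [ ] Performance tests

	- [ ] Documentation
	  - [x] API documentation
	  - [ ] Migration documentation
	  - [ ] Usage guide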

	---

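	## 🎯 Key Migration Challenges

	### Challenge 1: OpenHands agent interaction

	Problem: `ev2.py` uses specific OpenHands APIs, and we need to understand how they work

	Solution:
	- Start with a simplified version (invoke the agent, wait for completion)
	- Refine it step by step (if finer-grained control is needed)
	- Use the implementation in `ev2.py` as a reference

	### Challenge 2: Agent state persistence

	Problem: does the agent's context need to be preserved between calls?

	Solution:
	- Short-term: create a new agent on every call (as `ev2.py` does)
	- Long-term: reuse the agent instance and accumulate conversation history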
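
The short-term and long-term options differ only in where the agent instance is created, so both can sit behind one small holder. The sketch below assumes a hypothetical `create_agent` factory (the role played by `_create_agent` in the plan) and a `reuse` flag that is not part of the plan.

```python
from typing import Any, Callable, Optional


class AgentHolder:
    """Hands out agents, either fresh per call or one cached instance (hypothetical)."""

    def __init__(self, create_agent: Callable[[], Any], reuse: bool = False):
        self._create_agent = create_agent
        self._reuse = reuse
        self._cached: Optional[Any] = None

    def get(self) -> Any:
        if not self._reuse:
            # Short-term behavior: a new agent for every analysis
            return self._create_agent()
        if self._cached is None:
            # Long-term behavior: create once, then reuse
            self._cached = self._create_agent()
        return self._cached

    def reset(self) -> None:
        """Drop the cached agent, e.g. after a failure that may have corrupted its state."""
        self._cached = None
```

Switching strategies then becomes a configuration change rather than a code change, which makes it easier to compare the two during testing.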

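	### Challenge 3: Error handling

	Problem: the agent may fail; how do we handle that gracefully?

	Solution:
	- Wrap agent calls in try/except
	- Log detailed error information
	- Return meaningful error messages
	- Keep the service running (no crash)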
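
The four points above can be sketched as a thin wrapper around the agent call. This is a minimal illustration under assumptions: `safe_analyze` and the logger name `ev2_service` are hypothetical, and `analyze` stands for any coroutine function with the shape of `analyze_generation`.

```python
import logging
from typing import Any, Awaitable, Callable, Dict

logger = logging.getLogger("ev2_service")  # assumed logger name


async def safe_analyze(
    analyze: Callable[[int], Awaitable[Dict[str, Any]]],
    generation: int,
) -> Dict[str, Any]:
    """Call the agent and convert any failure into an error response."""
    try:
        return await analyze(generation)
    except Exception as exc:
        # logger.exception records the full traceback for later debugging
        logger.exception("Agent failed on generation %d", generation)
        return {
            "success": False,
            "generation": generation,
            "error": f"{type(exc).__name__}: {exc}",
        }
```

Catching `Exception` rather than `BaseException` lets task cancellation and interpreter shutdown propagate normally, while ordinary agent failures are turned into a response the caller can inspect.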

	---

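	## 💡 Simplification Strategy

	To complete the migration quickly, an incremental approach is recommended:

	### MVP version (minimum viable)

	Goal: get the service working with the fewest possible changes

	Simplifications:
	1. Agent execution: call it directly, without aiming for optimal performance
	2. Result extraction: simple parsing (as it is now)
	3. State management: the basic version is enough

	Time: 1 day

	### Enhanced version (production-ready)

	Goal: improve performance and user experience

	Enhancements:
	1. Agent persistence: reuse the agent instance
	2. Better result parsing: extract more information
	3. Error recovery: robust error handling

	Time: +1 day

	### Full version (future extensions)

	Goal: prepare for advanced features

	Extensions:
	1. MetricUnit integration
	2. Lifecycle management
	3. Asynchronous meta-cognition

	Time: +1-2 weeks (as needed)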

	---

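	## 📊 Comparison: Before vs. After Migration

	| Aspect | Before (wrapper) | After (standalone) |
	|--------|------------------|--------------------|
	| Dependencies | Depends on `ev2.py` | Fully standalone |
	| Architecture | Two layers | Single layer |
	| State | Scattered | Centralized |
	| Agent | Created on every call | Can be persisted |
	| Extensibility | Limited | High |
	| Maintainability | Medium | High |
	| Performance | Has overhead | Optimized |
	| Lines of code | ~700 | ~800-1000 |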

	---

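	## ✅ Acceptance Criteria

	The migration is complete when:

	1. Functional completeness
	   - [ ] All functionality of `ev2.py` is preserved
	   - [ ] The HTTP API works correctly
	   - [ ] State persistence works correctly
	   - [ ] The agent runs correctly

	2. Tests pass
	   - [ ] `test_ev2_service.py` passes in full
	   - [ ] A simulated 25-generation evolution run succeeds
	   - [ ] The agent generates `EVAL_AGENTS.md` and `auxiliary_metrics.py`

	3. Code quality
	   - [ ] No linter errors
	   - [ ] Adequate comments
	   - [ ] Clear structure

	4. Complete documentation
	   - [ ] API documentation updated
	   - [ ] Usage guide updated
	   - [ ] Migration notes are clear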

	---

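	## 🚀 Getting Started

	### Step 1 (today, 30 minutes)

	1. Create the skeleton of `ev2_service_standalone.py`
	2. Copy the HTTP portion of `ev2_service.py`
	3. Create the skeleton of the `IntegratedEV2Agent` class

	### Step 2 (tomorrow morning, 2-3 hours)

	1. Migrate the core logic from `ev2.py` into `IntegratedEV2Agent`
	2. Implement `_create_agent` and `_build_task_message`
	3. Implement a simplified `analyze_generation`

	### Step 3 (tomorrow afternoon, 2-3 hours)

	1. Run the full test suite
	2. Fix issues
	3. Update the documentation

	---

	Ready to start? I can help you create the skeleton of `ev2_service_standalone.py`! 🚀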