# Eval Service Unified API Documentation

## 📋 Overview

`ev2_service_standalone.py` now exposes a **unified interface** that selects its operating mode automatically:

1. **Evaluation mode (new)**: provide `code_path` + `evaluator_module` → the Service runs the evaluation
2. **Notification mode (legacy)**: provide only `primary_score` → backward compatible with older code
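The automatic mode selection can be sketched as a small helper (hypothetical; `detect_mode` is illustrative and not part of the actual service code — only the request field names come from the API below):

```python
# Hypothetical sketch of how the service distinguishes its two modes.
# Field names match the API payloads; the function itself is illustrative.
from typing import Any, Dict

def detect_mode(payload: Dict[str, Any]) -> str:
    """Return "evaluation" when the request carries enough information
    for the service to run the evaluator itself, else "notification"."""
    if payload.get("code_path") and payload.get("evaluator_module"):
        return "evaluation"
    if "primary_score" in payload:
        return "notification"
    raise ValueError("payload matches neither mode")
```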
|
|
## 🚀 Starting the Service

```bash
python eval_agent/ev2_service_standalone.py \
    --results-dir /path/to/experiment \
    --primary-evaluator examples/circle_packing/evaluate.py \
    --trigger-mode periodic \
    --trigger-interval 10 \
    --port 8765
```

Or use a configuration file:

```bash
python eval_agent/ev2_service_standalone.py --config config.yaml
```
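A plausible shape for `config.yaml`, mirroring the CLI flags above (the key names are assumptions; check the script's argument parser for the authoritative ones):

```yaml
# Hypothetical config.yaml; key names mirror the CLI flags and are assumptions.
results_dir: /path/to/experiment
primary_evaluator: examples/circle_packing/evaluate.py
trigger_mode: periodic
trigger_interval: 10
port: 8765
```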
|
|
## 📡 API Endpoints

### 1. Generation Complete (unified entry point)

`POST /api/v1/notify/generation_complete`

#### Mode 1: Evaluation Mode (NEW)

**Request**:
```json
{
  "generation": 10,
  "code_path": "/path/to/gen_10/main.py",
  "results_dir": "/path/to/gen_10/results",
  "evaluator_module": "examples.circle_packing.evaluate_ori",
  "evaluator_function": "main",
  "evaluator_kwargs": {}
}
```
|
|
**Response** (returns immediately, < 100 ms):
```json
{
  "status": "accepted",
  "generation": 10,
  "job_id": "eval_10_1738512345",
  "estimated_time": 15.0,
  "agent_triggered": false,
  "trigger_reason": "Will be determined after evaluation",
  "processing_time_ms": 50.2
}
```

**Background execution**:
1. Run the primary evaluator
2. Run any auxiliary evaluators (if present)
3. Save `metrics.json`
4. Decide whether to trigger the Agent
5. If triggered: run the EV2 Agent analysis
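The five background steps can be sketched as one orchestration function (illustrative only; every callable is an injected stub, and the real service wires in its own evaluator runner, metrics writer, and trigger policy):

```python
# Illustrative sketch of the background pipeline described above.
# All callables are injected stubs, not the service's real components.
from typing import Any, Callable, Dict, List

def run_background_job(
    code_path: str,
    run_primary: Callable[[str], Dict[str, Any]],
    run_auxiliary: List[Callable[[str], Dict[str, Any]]],
    save_metrics: Callable[[Dict[str, Any]], None],
    should_trigger_agent: Callable[[Dict[str, Any]], bool],
    run_agent: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    primary = run_primary(code_path)             # step 1: primary evaluator
    auxiliary: Dict[str, Any] = {}
    for aux in run_auxiliary:                    # step 2: auxiliary evaluators
        auxiliary.update(aux(code_path))
    metrics = {"primary": primary, "auxiliary": auxiliary}
    save_metrics(metrics)                        # step 3: persist metrics.json
    if should_trigger_agent(metrics):            # step 4: trigger decision
        run_agent(metrics)                       # step 5: EV2 Agent analysis
    return metrics
```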
|
|
#### Mode 2: Notification Mode (LEGACY, backward compatible)

**Request**:
```json
{
  "generation": 10,
  "results_dir": "/path/to/gen_10/results",
  "primary_score": 0.85
}
```

**Response** (synchronous):
```json
{
  "status": "completed",
  "generation": 10,
  "job_id": null,
  "agent_triggered": false,
  "trigger_reason": "Not yet (last trigger at gen 0)",
  "processing_time_ms": 5.1
}
```
|
|
### 2. Query Generation Status

`GET /api/v1/generation/{generation}/status`

**Response**:
```json
{
  "generation": 10,
  "job_id": "eval_10_1738512345",
  "status": "running",  // "pending" | "running" | "completed" | "failed"
  "created_at": 1738512345.0,
  "elapsed_time": 5.2
}
```

If the evaluation has completed:
```json
{
  "generation": 10,
  "job_id": "eval_10_1738512345",
  "status": "completed",
  "created_at": 1738512345.0,
  "completed_at": 1738512360.0,
  "elapsed_time": 15.0,
  "result": {
    "combined_score": 0.85,
    "primary": {...},
    "auxiliary": {...},
    "timestamp": 1738512360.0
  }
}
```
|
|
### 3. Query Job Status

`GET /api/v1/evaluate/{job_id}`

**Response**: same as above.

### 4. Service Status

`GET /api/v1/status`

**Response**:
```json
{
  "status": "running",
  "uptime_seconds": 12345.6,
  "version": "2.0.0-standalone",
  "experiment": {
    "name": "circle-packing",
    "results_dir": "/path/to/experiment",
    "primary_evaluator": "examples/circle_packing/evaluate.py"
  },
  "statistics": {
    "total_notifications": 20,
    "total_agent_runs": 2,
    "generations_tracked": 20,
    "last_agent_trigger_gen": 10
  },
  "config": {
    "trigger_mode": "periodic",
    "trigger_interval": 10,
    "agent_enabled": true,
    "agent_initialized": true
  }
}
```
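For monitoring scripts, the payload above can be condensed into a one-line summary. The helper below is hypothetical (not part of the service); only the field names come from the sample response:

```python
# Hypothetical summary helper for the /api/v1/status payload shown above.
from typing import Any, Dict

def summarize_status(payload: Dict[str, Any]) -> str:
    stats = payload["statistics"]
    cfg = payload["config"]
    return (
        f"{payload['experiment']['name']}: "
        f"{stats['generations_tracked']} generations tracked, "
        f"{stats['total_agent_runs']} agent runs "
        f"(trigger every {cfg['trigger_interval']} gens)"
    )
```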
|
|
## 📊 Usage Examples

### Python Client

```python
import requests
import time

SERVICE_URL = "http://localhost:8765"

# Evaluation mode
def submit_evaluation(generation, code_path):
    response = requests.post(
        f"{SERVICE_URL}/api/v1/notify/generation_complete",
        json={
            "generation": generation,
            "code_path": code_path,
            "results_dir": f"/path/to/gen_{generation}/results",
            "evaluator_module": "examples.circle_packing.evaluate_ori",
            "evaluator_function": "main"
        },
        timeout=5.0
    )

    data = response.json()
    job_id = data['job_id']

    print(f"Submitted: job_id={job_id}")

    # Poll for status
    while True:
        status_response = requests.get(
            f"{SERVICE_URL}/api/v1/generation/{generation}/status"
        )

        status_data = status_response.json()

        if status_data['status'] == 'completed':
            result = status_data['result']
            return result['combined_score']
        elif status_data['status'] == 'failed':
            raise RuntimeError(f"Evaluation failed: {status_data.get('error')}")

        time.sleep(2)
```
|
|
### ShinkaEvolve Integration

Modify `shinka/core/runner.py`:

```python
class EvolutionConfig:
    eval_service_url: Optional[str] = None
    use_eval_service: bool = False
    evaluator_module: Optional[str] = None

class EvolutionRunner:
    def _submit_new_job(self):
        # ... generate code ...

        if self.eval_service_url and self.evo_config.use_eval_service:
            # Use the Eval Service
            job_id = self._submit_to_eval_service(
                generation=current_gen,
                code_path=str(exec_fname),
                results_dir=str(results_dir)
            )
        else:
            # Legacy path
            job_id = self.scheduler.submit_async(exec_fname, results_dir)

        running_job = RunningJob(
            job_id=job_id,
            use_eval_service=self.evo_config.use_eval_service,
            ...
        )

    def _submit_to_eval_service(self, generation, code_path, results_dir):
        response = requests.post(
            f"{self.eval_service_url}/api/v1/notify/generation_complete",
            json={
                "generation": generation,
                "code_path": code_path,
                "results_dir": results_dir,
                "evaluator_module": self.evo_config.evaluator_module
            },
            timeout=5.0
        )

        return response.json()['job_id']

    def _check_completed_jobs(self):
        completed = []
        for job in self.running_jobs:
            if job.use_eval_service:
                # Query the eval service
                response = requests.get(
                    f"{self.eval_service_url}/api/v1/generation/{job.generation}/status"
                )

                if response.json()['status'] == 'completed':
                    completed.append(job)
```
|
|
## 🧪 Testing

Run the test script:

```bash
# 1. Start the service
python eval_agent/ev2_service_standalone.py \
    --results-dir /tmp/test \
    --primary-evaluator examples/circle_packing/evaluate.py

# 2. Run the tests
python test_eval_service_unified.py
```

Test coverage:
- ✅ Service health check
- ✅ Notification mode (backward compatibility)
- ✅ Evaluation mode (async)
- ✅ Status queries (by generation and by job_id)
| |
## 🔧 Workflow

### Full Evaluation-Mode Flow

```
1. ShinkaEvolve generates code
   ↓
2. POST /api/v1/notify/generation_complete
   {
     generation: 10,
     code_path: "gen_10/main.py",
     evaluator_module: "examples.circle_packing.evaluate"
   }
   ↓
3. Immediate response (< 100 ms)
   {
     status: "accepted",
     job_id: "eval_10_..."
   }
   ↓
4. Eval Service runs in the background:
   - runs the primary evaluator → combined_score
   - runs auxiliary evaluators → {diversity, ...}
   - saves metrics.json
   - decides whether to trigger the Agent
   - if triggered: runs the EV2 Agent analysis
   ↓
5. ShinkaEvolve polls:
   GET /api/v1/generation/10/status
   → status: "running"
   → status: "running"
   → status: "completed", result: {...}
   ↓
6. ShinkaEvolve reads combined_score and moves on to the next generation
```
| |
## ⚙️ Configuration

### Evaluator Contract

Every task's evaluator must satisfy this contract:

```python
def evaluate(code_path: str, **kwargs) -> Dict[str, Any]:
    """
    Evaluator contract.

    Args:
        code_path: path to the generated code
        **kwargs: extra arguments

    Returns:
        {
            "combined_score": float,    # required
            "metrics": Dict[str, Any],  # optional
            "metadata": Dict[str, Any]  # optional
        }
    """
    # Run the code (run_code is a task-specific placeholder)
    result = run_code(code_path)

    # Compute the score (compute_score is a task-specific placeholder)
    score = compute_score(result)

    return {
        "combined_score": score,
        "metrics": {"coverage": 0.8},
        "metadata": {"num_items": 100}
    }
```
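As a concrete illustration, here is a minimal self-contained evaluator that satisfies the contract (a toy example not tied to any real task: it scores a Python file simply by whether it compiles):

```python
# Toy evaluator following the contract above: it "scores" a Python file
# by checking that it compiles, plus a trivial line-count metric.
from typing import Any, Dict

def evaluate(code_path: str, **kwargs) -> Dict[str, Any]:
    with open(code_path) as f:
        source = f.read()
    try:
        compile(source, code_path, "exec")
        score = 1.0
    except SyntaxError:
        score = 0.0
    return {
        "combined_score": score,
        "metrics": {"num_lines": len(source.splitlines())},
        "metadata": {"code_path": code_path},
    }
```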
|
|
### Auxiliary Metrics

Agent-generated auxiliary metrics live at:
```
experiment_root/
└── eval_agent_memory/
    └── auxiliary_metrics.py   # generated by the Agent
```

Convention: every function whose name starts with `evaluate_` is called automatically:

```python
def evaluate_diversity(code_path: str, primary_result: Dict) -> Dict[str, Any]:
    """Diversity metric."""
    return {"diversity_score": 0.7}

def evaluate_robustness(code_path: str, primary_result: Dict) -> Dict[str, Any]:
    """Robustness metric."""
    return {"robustness_score": 0.8}
```
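The auto-discovery convention could be implemented roughly as follows (a sketch; `collect_auxiliary_metrics` is illustrative, and the real service may load `auxiliary_metrics.py` from its file path via `importlib` rather than inspecting an already-imported module):

```python
# Sketch of the evaluate_* auto-discovery convention. `module` is any
# imported module object holding the auxiliary metric functions.
import inspect
from typing import Any, Dict

def collect_auxiliary_metrics(module, code_path: str,
                              primary_result: Dict[str, Any]) -> Dict[str, Any]:
    merged: Dict[str, Any] = {}
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("evaluate_"):
            # Each metric function returns a dict; merge them all.
            merged.update(fn(code_path, primary_result))
    return merged
```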
|
|
## 📝 Migration Guide

### Migrating from the Legacy Mode

**Old code** (ShinkaEvolve evaluates by itself):
```python
# 1. Generate code
# 2. Run the evaluation
combined_score = evaluate(code_path)
# 3. Notify the service
requests.post(url, json={
    "generation": gen,
    "primary_score": combined_score
})
```

**New code** (the Eval Service owns evaluation):
```python
# url is the service base URL, e.g. "http://localhost:8765"
# 1. Generate code
# 2. Submit to the service (non-blocking)
response = requests.post(f"{url}/api/v1/notify/generation_complete", json={
    "generation": gen,
    "code_path": code_path,
    "evaluator_module": "examples.task.evaluate"
})
job_id = response.json()['job_id']

# 3. Poll for status
while True:
    status = requests.get(f"{url}/api/v1/generation/{gen}/status")
    if status.json()['status'] == 'completed':
        combined_score = status.json()['result']['combined_score']
        break
    time.sleep(2)  # avoid busy-waiting
```
|
|
## 🎯 Advantages

1. **Unified interface**: one endpoint handles every case
2. **Automatic mode selection**: the mode is chosen from the request parameters
3. **Backward compatible**: existing code needs no changes
4. **Clear ownership**: evaluation is managed centrally by the Service
5. **Asynchronous**: requests return immediately and never block
6. **Concurrent**: multiple evaluations can run at the same time

## 📊 Performance

- Submit request: < 100 ms
- Evaluation run: 10-30 s (depends on the evaluator)
- Status query: < 10 ms
- Concurrency: evaluations for multiple generations can run simultaneously
|