# EV2 Migration Verification

## Migration Complete!

Successfully migrated from `ev2_service.py` (wrapper) to `ev2_service_standalone.py` (integrated).

## Migration Summary
| Component | ev2.py Location | ev2_service_standalone.py Location | Status |
|---|---|---|---|
| LLM Creation | Lines 54-58 | `IntegratedEV2Agent._create_llm()` | Exact replica |
| Agent Creation | Lines 60-73 | `IntegratedEV2Agent._create_agent()` | Exact replica |
| Task Building | Lines 104-204 | `IntegratedEV2Agent._build_task_message()` | Exact replica |
| Conversation | Line 76 | `analyze_generation()` | Same API usage |
| Send/Run | Lines 85-91 | `analyze_generation()` | Same API usage |
| Workspace | Line 41 | `__init__()` | Same path logic |
| Error Handling | Lines 130-136 | `_build_task_message()` | Same try-except |
| Print Logs | Lines 44-100 | Converted to logging | More professional |
## Key Differences (Improvements)
- Agent Lifecycle: The agent instance can be reused across runs instead of being recreated each time
- State Management: Integrated with service state
- Logging: Uses Python `logging` instead of `print`
- Error Handling: More robust; a failed agent run does not crash the service
- Configuration: Unified config system
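
The lifecycle, logging, and error-handling points above can be illustrated with a minimal sketch. This is not the code in `ev2_service_standalone.py`: `AgentHolder`, `run_safely`, and the `factory` and `run` callables are hypothetical names chosen for illustration. The sketch assumes the agent is built once at startup and that a failed run should be logged rather than propagated.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("ev2_service_sketch")  # hypothetical logger name


class AgentHolder:
    """Hypothetical holder that builds the agent once and keeps it for reuse."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        # Assumption: the agent is created eagerly when the service starts.
        self.agent = factory()
        logger.info("Agent created")


def run_safely(holder: AgentHolder, run: Callable[[Any, int], None], generation: int) -> bool:
    """Run the agent for one generation; log failures instead of raising."""
    try:
        run(holder.agent, generation)
        return True
    except Exception:
        # logging.exception records the traceback; the caller keeps running.
        logger.exception("Agent run failed for generation %d", generation)
        return False
```

Calling `run_safely` repeatedly with the same holder reuses one agent object, and a raised exception results in a `False` return value plus a logged traceback.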
## What Was Preserved (100% Compatibility)
- Exact same LLM configuration (model, api_key, base_url from env vars)
- Exact same tools (Terminal, FileEditor, TaskTracker)
- Exact same prompt template (`ev2_prompt.j2`)
- Exact same task message format (all text, structure preserved)
- Exact same workspace path (`results_dir/eval_agent_memory`)
- Exact same file generation (`EVAL_AGENTS.md`, `auxiliary_metrics.py`)
- Exact same Conversation API usage
## Testing Checklist
- Service starts without errors
- Agent initialization successful
- Generation notifications work
- Agent triggers at correct intervals
- Agent generates EVAL_AGENTS.md
- Agent generates auxiliary_metrics.py
- Service state persists correctly
- Manual trigger works
- Error handling works (graceful failures)
## Testing Instructions

### Step 1: Start the Standalone Service

```bash
cd /home/tengxiao/pj/ShinkaEvolve
# Make sure old service is stopped
pkill -f "ev2_service"
# Start new standalone service
python eval_agent/ev2_service_standalone.py \
  --config eval_agent/ev2_service_config.yaml
```
Expected output:

```
IntegratedEV2Agent Initialized
Results Dir: /path/to/results
Workspace: /path/to/results/eval_agent_memory
Primary Evaluator: /path/to/evaluate_ori.py
Creating LLM: vertex_ai/gemini-2.5-flash
Loading prompt: /path/to/ev2_prompt.j2
Agent created
Integrated EV2 Agent ready
Service Started
Experiment: circle_packing_NO_vision
Results dir: ...
Trigger mode: periodic
Trigger interval: 10
INFO: Uvicorn running on http://0.0.0.0:8765
```

### Step 2: Test Service Status

```bash
# In another terminal
cd /home/tengxiao/pj/ShinkaEvolve
python eval_agent/test_ev2_service.py \
  --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
  --test-mode status
```
Expected output:

```
Testing service status...
Service is running!
Uptime: X.Xs
Trigger mode: periodic
Trigger interval: 10
```
### Step 3: Simulate Evolution (Small Test)

```bash
# Test with just 12 generations (will trigger once at gen 10)
python eval_agent/test_ev2_service.py \
  --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
  --test-mode simulate \
  --num-gens 12
```
Expected behavior:

```
Gen 0-9: SKIP (fast, ~0.1s each)
Gen 10:  TRIGGER (slow, ~60-240s, agent runs)
Gen 11:  SKIP (fast)
```
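
The skip/trigger pattern above is consistent with a simple periodic rule. The sketch below is an assumption about how such a rule could look, not the service's actual trigger code; `should_trigger` is a hypothetical helper, and the real service may track the last triggered generation instead.

```python
def should_trigger(generation: int, interval: int) -> bool:
    """Return True when a periodic trigger is due (assumed rule).

    Assumption: generation 0 never triggers, and every later generation
    that is a multiple of the interval does.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return generation > 0 and generation % interval == 0


# Simulate 12 generations (0..11) with the interval from the test config.
triggered = [gen for gen in range(12) if should_trigger(gen, 10)]
print(triggered)  # [10]
```

With an interval of 10 and generations 0 through 11, only generation 10 triggers, which matches the single agent run expected in this step.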
Check outputs:

```bash
# Check service state
cat examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/service_state.json
# Should show:
# - total_notifications: 12
# - total_agent_runs: 1
# - last_agent_trigger_gen: 10

# Check agent outputs
ls -la examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/
# Should have:
# - EVAL_AGENTS.md (updated)
# - auxiliary_metrics.py (created/updated)
# - service_state.json (new)
```
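
The state check can also be scripted instead of read by eye. The following is a sketch under one assumption: that `service_state.json` is a flat JSON object with the three counters as top-level keys. `find_mismatches` and `check_state_file` are hypothetical helper names.

```python
import json
from pathlib import Path

# Expected values after the 12-generation simulation described above.
EXPECTED = {
    "total_notifications": 12,
    "total_agent_runs": 1,
    "last_agent_trigger_gen": 10,
}


def find_mismatches(state: dict, expected: dict) -> dict:
    """Map each mismatching key to an (expected, actual) pair."""
    return {
        key: (want, state.get(key))
        for key, want in expected.items()
        if state.get(key) != want
    }


def check_state_file(path: str) -> dict:
    """Load the state file and compare it with EXPECTED."""
    # Assumption: the counters live at the top level of the JSON object.
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    return find_mismatches(state, EXPECTED)
```

An empty dictionary from `check_state_file` means all three counters match; any other result lists what differs.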
### Step 4: Verify Agent Output Quality

```bash
# Check that EVAL_AGENTS.md has new content
tail -50 examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/EVAL_AGENTS.md
# Check that auxiliary_metrics.py is valid Python
python -m py_compile examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/auxiliary_metrics.py
```
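
If the syntax check needs to run from a test script rather than the shell, a rough Python equivalent is sketched below. It uses the built-in `compile()` instead of `py_compile`, so no `.pyc` file is written. `is_valid_python_source` and `is_valid_python_file` are hypothetical helper names, and the check covers syntax only, not whether the metrics behave correctly.

```python
from pathlib import Path


def is_valid_python_source(source: str, filename: str = "<string>") -> bool:
    """Return True if the source compiles; this checks syntax only."""
    try:
        compile(source, filename, "exec")
        return True
    except (SyntaxError, ValueError):
        # ValueError covers sources containing null bytes on older Pythons.
        return False


def is_valid_python_file(path: str) -> bool:
    """Read a file and check that its contents are syntactically valid."""
    source = Path(path).read_text(encoding="utf-8")
    return is_valid_python_source(source, filename=path)
```

Passing the path of the generated `auxiliary_metrics.py` to `is_valid_python_file` gives a boolean that a test script can assert on.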
### Step 5: Test Manual Trigger

```bash
curl -X POST "http://localhost:8765/api/v1/trigger/manual?generation=5"
```

This should trigger the agent for generation 5, provided that generation exists in the service's history.
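
The same request can be sent from Python with the standard library. This sketch reuses the endpoint and port from the `curl` command; everything else is an assumption. `build_manual_trigger_url` and `trigger_manual` are hypothetical helpers, the response format is not documented here so the body is returned as raw text, and the 300-second timeout is a guess based on the 60-240s run time noted for triggered generations.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_manual_trigger_url(generation: int, base_url: str = "http://localhost:8765") -> str:
    """Build the manual-trigger URL used by the curl command."""
    query = urlencode({"generation": generation})
    return f"{base_url}/api/v1/trigger/manual?{query}"


def trigger_manual(generation: int, timeout: float = 300.0) -> str:
    """POST to the manual-trigger endpoint and return the raw response body."""
    # Assumption: the endpoint takes no request body, as in the curl call.
    request = Request(build_manual_trigger_url(generation), method="POST")
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

`trigger_manual(5)` requires the service to be running; `urlopen` raises `urllib.error.HTTPError` on a non-2xx status, which a test can catch to report the failure.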
## Troubleshooting
### Issue: "Agent not initialized"

Symptom: Service starts but agent triggers fail

Check:

```bash
# Look for this in startup logs:
# Failed to initialize agent: ...
```

Common causes:
- Primary evaluator path wrong: check `primary_evaluator` in the config
- LLM config wrong: check env vars `LLM_MODEL`, `LLM_API_KEY`
- `ev2_prompt.j2` missing: check that the file exists in `eval_agent/`

Fix:

```bash
# Verify primary evaluator exists
ls -la examples/circle_packing/evaluate_ori.py
# Verify prompt exists
ls -la eval_agent/ev2_prompt.j2
# Check LLM env vars
echo $LLM_MODEL
echo $LLM_API_KEY
```
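
The same preflight checks can be collected into one script. The sketch below uses the two environment variable names and the two file paths mentioned above; `missing_env_vars` and `missing_files` are hypothetical helpers, and the script only reports whether a variable is set, so the API key itself is never printed.

```python
import os
from pathlib import Path

REQUIRED_ENV_VARS = ("LLM_MODEL", "LLM_API_KEY")
REQUIRED_FILES = (
    "examples/circle_packing/evaluate_ori.py",
    "eval_agent/ev2_prompt.j2",
)


def missing_env_vars(env=None):
    """Return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def missing_files(paths=REQUIRED_FILES):
    """Return the paths that do not point to an existing file."""
    return [str(path) for path in paths if not Path(path).is_file()]


if __name__ == "__main__":
    # Paths are relative, so run this from the repository root.
    print("Missing env vars:", missing_env_vars())
    print("Missing files:", missing_files())
```

Two empty lists mean the checks in this section pass; it says nothing about whether the values themselves are correct.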
### Issue: Agent runs but produces no output
Symptom: Agent completes but EVAL_AGENTS.md is empty or not updated
Check:
- Workspace permissions
- Agent logs (look for errors during run)
- LLM API connectivity
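
The workspace part of this check can be sketched in a few lines. `workspace_problems` is a hypothetical helper; it assumes the workspace is the `eval_agent_memory` directory and that an empty or missing `EVAL_AGENTS.md` is the symptom to detect. It does not cover agent logs or LLM connectivity.

```python
import os
from pathlib import Path


def workspace_problems(workspace: str, report_name: str = "EVAL_AGENTS.md") -> list:
    """Return human-readable problems found in the agent workspace."""
    root = Path(workspace)
    if not root.is_dir():
        return [f"workspace directory does not exist: {root}"]

    problems = []
    # os.access checks permissions for the user running this script.
    if not os.access(root, os.W_OK):
        problems.append(f"workspace is not writable: {root}")

    report = root / report_name
    if not report.is_file():
        problems.append(f"{report_name} is missing")
    elif report.stat().st_size == 0:
        problems.append(f"{report_name} is empty")
    return problems
```

Run it as the same user that runs the service, since write permission is evaluated for the calling process.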
### Issue: Service crashes on agent trigger
Symptom: Service stops when trying to run agent
Check:
- Look at full error traceback
- Check if OpenHands SDK version is compatible
- Verify all dependencies installed
## Success Criteria
The migration is successful if:
- Service starts without errors
- Agent initializes (no "Agent not initialized" errors)
- Agent triggers at correct generations (10, 20, 30...)
- Agent generates `EVAL_AGENTS.md` with meaningful content
- Agent generates `auxiliary_metrics.py` with valid Python code
- Service state persists across notifications
- No crashes or fatal errors during agent runs
## Next Steps After Verification
Once all tests pass:
- Update documentation to point to standalone version
- Archive old version: Rename `ev2_service.py` to `ev2_service_wrapper_old.py`
- Update test scripts to use standalone by default
- Integrate with ShinkaEvolve: Add notification code to EvolutionRunner
- Production deployment: Add systemd service, monitoring, etc.
## Migration Benefits

### Performance
- Agent can be reused (no recreation overhead)
- Faster startup (agent pre-initialized)

### Maintainability
- Single codebase (no wrapper layer)
- Clearer architecture
- Easier to debug

### Extensibility
- Ready for MetricUnit integration
- Ready for Lifecycle management
- Ready for async meta-cognition

### Reliability
- Better error handling
- Doesn't depend on subprocess calls
- Unified state management