# EV2 Migration Verification

## ✅ Migration Complete!

Successfully migrated from `ev2_service.py` (wrapper) to `ev2_service_standalone.py` (integrated).

### Migration Summary

| Component | Location in `ev2.py` | Location in `ev2_service_standalone.py` | Status |
|-----------|----------------------|-----------------------------------------|--------|
| **LLM Creation** | Lines 54-58 | `IntegratedEV2Agent._create_llm()` | ✅ Exact replica |
| **Agent Creation** | Lines 60-73 | `IntegratedEV2Agent._create_agent()` | ✅ Exact replica |
| **Task Building** | Lines 104-204 | `IntegratedEV2Agent._build_task_message()` | ✅ Exact replica |
| **Conversation** | Line 76 | `analyze_generation()` | ✅ Same API usage |
| **Send/Run** | Lines 85-91 | `analyze_generation()` | ✅ Same API usage |
| **Workspace** | Line 41 | `__init__()` | ✅ Same path logic |
| **Error Handling** | Lines 130-136 | `_build_task_message()` | ✅ Same try-except |
| **Print Logs** | Lines 44-100 | Converted to `logging` | ✅ Standard log levels instead of bare `print` |

### Key Differences (Improvements)

1. **Agent Lifecycle**: The agent instance is created once and reused instead of being recreated on every trigger.
2. **State Management**: Agent runs are tracked in the service state (`service_state.json`).
3. **Logging**: Uses Python `logging` instead of `print`.
4. **Error Handling**: Agent failures are caught, so the service keeps running instead of crashing.
5. **Configuration**: Unified config system.
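
The lifecycle and error-handling points can be illustrated with a small pattern. The agent is built once at startup and reused on every trigger, and failures are logged rather than propagated so the service loop survives. This is a hypothetical sketch. `ReusableAgentRunner`, `agent_factory`, and `run()` are illustrative names, not the actual `IntegratedEV2Agent` API.

```python
import logging
from typing import Callable, Optional, Protocol

logger = logging.getLogger("ev2_service")


class Agent(Protocol):
    """Anything with a run(generation) method; stands in for the real agent."""

    def run(self, generation: int) -> None: ...


class ReusableAgentRunner:
    """Hypothetical sketch: build the agent once at startup and reuse it."""

    def __init__(self, agent_factory: Callable[[], Agent]) -> None:
        self.agent: Optional[Agent] = None
        self.total_agent_runs = 0
        try:
            # Created once here; every later trigger reuses the same instance.
            self.agent = agent_factory()
        except Exception:
            # Startup continues without an agent instead of crashing the service.
            logger.exception("Failed to initialize agent")

    def trigger(self, generation: int) -> bool:
        """Run the agent for one generation; return False instead of raising."""
        if self.agent is None:
            logger.error("Agent not initialized; skipping generation %d", generation)
            return False
        try:
            self.agent.run(generation)
        except Exception:
            # Log the full traceback, then keep the service loop alive.
            logger.exception("Agent run failed for generation %d", generation)
            return False
        self.total_agent_runs += 1
        return True
```

In the standalone service, the role of `agent_factory` would presumably be played by `_create_llm()` and `_create_agent()`. The "Agent not initialized" path corresponds to the symptom described under Troubleshooting.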

### 🎯 What Was Preserved (100% Compatibility)

1. ✅ **Exact same LLM configuration** (model, api_key, base_url from env vars)
2. ✅ **Exact same tools** (Terminal, FileEditor, TaskTracker)
3. ✅ **Exact same prompt template** (ev2_prompt.j2)
4. ✅ **Exact same task message format** (all text, structure preserved)
5. ✅ **Exact same workspace path** (results_dir/eval_agent_memory)
6. ✅ **Exact same file generation** (EVAL_AGENTS.md, auxiliary_metrics.py)
7. ✅ **Exact same Conversation API usage**

### 🧪 Testing Checklist

- [ ] Service starts without errors
- [ ] Agent initialization successful
- [ ] Generation notifications work
- [ ] Agent triggers at correct intervals
- [ ] Agent generates EVAL_AGENTS.md
- [ ] Agent generates auxiliary_metrics.py
- [ ] Service state persists correctly
- [ ] Manual trigger works
- [ ] Error handling works (graceful failures)

---

## Testing Instructions

### Step 1: Start the Standalone Service

```bash
cd /home/tengxiao/pj/ShinkaEvolve

# Stop any running service (this pattern also matches ev2_service_standalone.py)
pkill -f "ev2_service"

# Start new standalone service
python eval_agent/ev2_service_standalone.py \
    --config eval_agent/ev2_service_config.yaml
```

**Expected output**:
```
================================================================================
✅ IntegratedEV2Agent Initialized
================================================================================
Results Dir: /path/to/results
Workspace: /path/to/results/eval_agent_memory
Primary Evaluator: /path/to/evaluate_ori.py
================================================================================
🤖 Creating LLM: vertex_ai/gemini-2.5-flash
Loading prompt: /path/to/ev2_prompt.j2
✅ Agent created
✅ Integrated EV2 Agent ready
================================================================================
✅ Service Started
Experiment: circle_packing_NO_vision
Results dir: ...
Trigger mode: periodic
Trigger interval: 10
================================================================================
INFO: Uvicorn running on http://0.0.0.0:8765
```
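
Before moving on to Step 2, you can confirm that Uvicorn is actually accepting connections on port 8765. Below is a minimal standard-library sketch that assumes the host and port shown in the expected output. `wait_for_port` is an illustrative helper, not part of the service.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8765, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)  # brief pause before retrying
```

Call `wait_for_port()` right after launching the service. It returns `False` if nothing is listening before the timeout expires.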

### Step 2: Test Service Status

```bash
# In another terminal
cd /home/tengxiao/pj/ShinkaEvolve

python eval_agent/test_ev2_service.py \
    --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
    --test-mode status
```

**Expected output**:
```
Testing service status...
✅ Service is running!
Uptime: X.Xs
Trigger mode: periodic
Trigger interval: 10
```

### Step 3: Simulate Evolution (Small Test)

```bash
# Test with just 12 generations (will trigger once at gen 10)
python eval_agent/test_ev2_service.py \
    --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
    --test-mode simulate \
    --num-gens 12
```

**Expected behavior**:
```
Gen 0-9: → SKIP (fast, ~0.1s each)
Gen 10:  → TRIGGER (slow, ~60-240s, agent runs)
Gen 11:  → SKIP (fast)
```
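
The expected pattern is: skip generations 0-9, trigger at 10, skip 11. The success criterion of triggering at 10, 20, 30... points the same way. Both are consistent with a simple periodic rule. The sketch below is an assumption about how `periodic` mode with `Trigger interval: 10` behaves, not the service's actual code. `should_trigger` and `expected_triggers` are hypothetical names.

```python
from typing import List


def should_trigger(generation: int, interval: int = 10) -> bool:
    """Assumed periodic rule: fire on every positive multiple of `interval`."""
    # Generation 0 is excluded, matching the expected SKIP for gens 0-9.
    return generation > 0 and generation % interval == 0


def expected_triggers(num_gens: int, interval: int = 10) -> List[int]:
    """Generations that should trigger the agent in a run of `num_gens` generations."""
    # Simulated generations are numbered 0 .. num_gens - 1.
    return [g for g in range(num_gens) if should_trigger(g, interval)]


print(expected_triggers(12))  # [10]
print(expected_triggers(31))  # [10, 20, 30]
```

With `--num-gens 12`, this rule yields exactly one trigger, at generation 10. That is what `total_agent_runs: 1` and `last_agent_trigger_gen: 10` in the state file should reflect.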

**Check outputs**:
```bash
# Check service state
cat examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/service_state.json

# Should show:
# - total_notifications: 12
# - total_agent_runs: 1
# - last_agent_trigger_gen: 10

# Check agent outputs
ls -la examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/

# Should have:
# - EVAL_AGENTS.md (updated)
# - auxiliary_metrics.py (created/updated)
# - service_state.json (new)
```
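
The state check can also be scripted. This sketch reads `service_state.json` and compares the three fields listed in the comments against the values expected for a 12-generation run. The field names come from this document. The rest of the file's schema is not assumed. `check_state` and `load_and_check` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Dict, List

# Values expected after `--num-gens 12` with a trigger interval of 10.
EXPECTED: Dict[str, int] = {
    "total_notifications": 12,
    "total_agent_runs": 1,
    "last_agent_trigger_gen": 10,
}


def check_state(state: dict, expected: Dict[str, int] = EXPECTED) -> List[str]:
    """Return one message for every expected field that is missing or different."""
    problems = []
    for key, want in expected.items():
        got = state.get(key, "<missing>")
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems


def load_and_check(state_path: str) -> List[str]:
    """Load service_state.json from disk and check it against EXPECTED."""
    state = json.loads(Path(state_path).read_text())
    return check_state(state)
```

Pass `load_and_check` the `service_state.json` path used in the `cat` command. An empty list means the simulated run matched.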

### Step 4: Verify Agent Output Quality

```bash
# Check that EVAL_AGENTS.md has new content
tail -50 examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/EVAL_AGENTS.md

# Check that auxiliary_metrics.py is valid Python
python -m py_compile examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/auxiliary_metrics.py
```
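
`py_compile` only proves that the file parses. To also see which functions the agent wrote, you can parse the file with the standard `ast` module. This is a minimal sketch. It assumes metrics are exposed as top-level functions, which this document does not specify. `list_metric_functions` and `inspect_metrics_file` are hypothetical names.

```python
import ast
from pathlib import Path
from typing import List


def list_metric_functions(source: str) -> List[str]:
    """Parse source (raises SyntaxError if invalid) and return top-level function names."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def inspect_metrics_file(path: str) -> List[str]:
    """Read auxiliary_metrics.py from disk and list its top-level functions."""
    return list_metric_functions(Path(path).read_text())
```

If the file compiles but the result is empty, read the file manually.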

### Step 5: Test Manual Trigger

```bash
curl -X POST "http://localhost:8765/api/v1/trigger/manual?generation=5"
```

This should trigger the agent for generation 5, provided generation 5 already exists in the service's history.
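
For scripted tests, the same request can be sent from Python using only the standard library. The endpoint, port, and `generation` query parameter are taken from the `curl` command. The response format is not documented here, so the sketch just returns the HTTP status and raw body. `build_manual_trigger` and `send_manual_trigger` are hypothetical helper names.

```python
import urllib.parse
import urllib.request
from typing import Tuple


def build_manual_trigger(generation: int, host: str = "localhost", port: int = 8765) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command."""
    query = urllib.parse.urlencode({"generation": generation})
    url = f"http://{host}:{port}/api/v1/trigger/manual?{query}"
    # method="POST" mirrors `curl -X POST`; no request body is sent.
    return urllib.request.Request(url, method="POST")


def send_manual_trigger(generation: int, timeout: float = 300.0) -> Tuple[int, str]:
    """Send the request and return (status code, response text)."""
    # Generous timeout in case the endpoint waits for the agent run to finish.
    req = build_manual_trigger(generation)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

A non-2xx response makes `urlopen` raise `urllib.error.HTTPError`, which surfaces server-side failures instead of silently returning.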

---

## Troubleshooting

### Issue: "Agent not initialized"

**Symptom**: Service starts but agent triggers fail

**Check**:
```bash
# Look for this in startup logs:
# Failed to initialize agent: ...
```

**Common causes**:
1. Primary evaluator path wrong → Check `primary_evaluator` in config
2. LLM config wrong → Check env vars: `LLM_MODEL`, `LLM_API_KEY`
3. `ev2_prompt.j2` missing → Check that the file exists in `eval_agent/`

**Fix**:
```bash
# Verify primary evaluator exists
ls -la examples/circle_packing/evaluate_ori.py

# Verify prompt exists
ls -la eval_agent/ev2_prompt.j2

# Check LLM env vars
echo $LLM_MODEL
echo $LLM_API_KEY
```
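
These checks can be combined into one preflight script to run from the repository root before starting the service. This sketch hardcodes the example paths from this section rather than reading `primary_evaluator` from the config. It checks only the two environment variables named in the common causes. It also reports whether `LLM_API_KEY` is set without printing its value. `preflight` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import List, Mapping

# Paths and variable names as they appear in this troubleshooting section.
REQUIRED_FILES = [
    "examples/circle_packing/evaluate_ori.py",
    "eval_agent/ev2_prompt.j2",
]
REQUIRED_ENV = ["LLM_MODEL", "LLM_API_KEY"]


def preflight(root: str = ".", env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems that would prevent agent initialization."""
    problems = []
    for rel in REQUIRED_FILES:
        if not (Path(root) / rel).is_file():
            problems.append(f"missing file: {rel}")
    for name in REQUIRED_ENV:
        # Only check presence; never echo the value (LLM_API_KEY is a secret).
        if not env.get(name):
            problems.append(f"environment variable not set: {name}")
    return problems
```

An empty list from `preflight()` rules out the three common causes above. Any remaining failure should show up in the startup log.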

### Issue: Agent runs but produces no output

**Symptom**: Agent completes but EVAL_AGENTS.md is empty or not updated

**Check**:
1. Workspace permissions (the service must be able to write to `eval_agent_memory/`)
2. Agent logs (look for errors during the run)
3. LLM API connectivity
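
The first check can be scripted, together with a quick look at whether `EVAL_AGENTS.md` actually changed. This sketch assumes the workspace is `results_dir/eval_agent_memory`, as stated in the compatibility list. It does not inspect agent logs or API connectivity. `diagnose_workspace` is a hypothetical helper, and the one-hour staleness threshold is an arbitrary default.

```python
import os
import time
from pathlib import Path
from typing import List


def diagnose_workspace(workspace: str, max_age_s: float = 3600.0) -> List[str]:
    """Report permission problems and a missing, empty, or stale EVAL_AGENTS.md."""
    ws = Path(workspace)
    if not ws.is_dir():
        return [f"workspace does not exist: {ws}"]
    issues = []
    # Creating or modifying files in a directory needs write + execute permission.
    if not os.access(ws, os.W_OK | os.X_OK):
        issues.append(f"workspace not writable: {ws}")
    md = ws / "EVAL_AGENTS.md"
    if not md.is_file():
        issues.append("EVAL_AGENTS.md not found")
    else:
        st = md.stat()
        if st.st_size == 0:
            issues.append("EVAL_AGENTS.md is empty")
        age = time.time() - st.st_mtime
        if age > max_age_s:
            issues.append(f"EVAL_AGENTS.md last modified {age / 60:.0f} min ago")
    return issues
```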

### Issue: Service crashes on agent trigger

**Symptom**: Service stops when trying to run agent

**Check**:
1. Read the full error traceback
2. Check whether the installed OpenHands SDK version is compatible
3. Verify that all dependencies are installed
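
When reporting a crash, it helps to record which package versions are installed. You can query these directly with `importlib.metadata`. The distribution names below are placeholders. This document does not say which name the OpenHands SDK is installed under, so replace them with the names `pip list` shows in your environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# Placeholder distribution names: "openhands-sdk" is a guess;
# uvicorn appears in the startup log.
PACKAGES = ("openhands-sdk", "uvicorn")


def installed_versions(packages: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for pkg, ver in installed_versions().items():
    print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```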

---

## ✅ Success Criteria

The migration is successful if:

1. ✅ Service starts without errors
2. ✅ Agent initializes (no "Agent not initialized" errors)
3. ✅ Agent triggers at correct generations (10, 20, 30...)
4. ✅ Agent generates EVAL_AGENTS.md with meaningful content
5. ✅ Agent generates auxiliary_metrics.py with valid Python code
6. ✅ Service state persists across notifications
7. ✅ No crashes or fatal errors during agent runs

---

## Next Steps After Verification

Once all tests pass:

1. **Update documentation** to point to standalone version
2. **Archive old version**: Rename `ev2_service.py` to `ev2_service_wrapper_old.py`
3. **Update test scripts** to use standalone by default
4. **Integrate with ShinkaEvolve**: Add notification code to EvolutionRunner
5. **Production deployment**: Add systemd service, monitoring, etc.

---

## Migration Benefits

### Performance
- ✅ Agent is reused (no recreation overhead per trigger)
- ✅ Faster first trigger (agent pre-initialized at startup)

### Maintainability
- ✅ Single codebase (no wrapper layer)
- ✅ Clearer architecture
- ✅ Easier to debug

### Extensibility
- ✅ Ready for MetricUnit integration
- ✅ Ready for Lifecycle management
- ✅ Ready for async meta-cognition

### Reliability
- ✅ Better error handling
- ✅ Doesn't depend on subprocess calls
- ✅ Unified state management