# Step 1: Testing Guide for EV2 Service

## 🎯 What We Built

A **minimal HTTP service wrapper** around `ev2.py` that:
- ✅ Receives generation completion notifications
- ✅ Autonomously decides when to trigger the EV2 agent
- ✅ Maintains persistent state across generations
- ✅ Requires minimal changes to ShinkaEvolve
|
## File Overview

```
eval_agent/
├── ev2_service.py            # The HTTP service (NEW)
├── ev2_service_config.yaml   # Configuration file (NEW)
├── test_ev2_service.py       # Test script (NEW)
├── ev2.py                    # Original agent logic (UNCHANGED)
└── ev2_prompt.j2             # Agent prompt (UNCHANGED)
```
|
## Step-by-Step Testing

### Step 1: Install Dependencies

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Install FastAPI, Uvicorn, and PyYAML
pip install fastapi uvicorn pyyaml
```
|
### Step 2: Configure the Service

Edit `eval_agent/ev2_service_config.yaml` if needed:

```yaml
experiment:
  results_dir: "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215"
  primary_evaluator: "examples/circle_packing/evaluate_ori.py"

strategy:
  trigger_mode: "periodic"  # Options: always, periodic, plateau, mixed
  trigger_interval: 10      # Run agent every 10 generations
```

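
A typo in `trigger_mode` would otherwise only show up once the service is running, so it can help to check the `strategy` section up front. The following is a minimal sketch, not the validation that `ev2_service.py` actually performs: `validate_strategy` and its default values are hypothetical, and it expects an already-parsed dict (for example, the result of `yaml.safe_load`).

```python
# Hypothetical helper: validates the `strategy` section of the config.
VALID_TRIGGER_MODES = {"always", "periodic", "plateau", "mixed"}


def validate_strategy(strategy: dict) -> dict:
    """Return a normalized copy of the strategy section or raise ValueError."""
    mode = strategy.get("trigger_mode", "periodic")  # default is an assumption
    if mode not in VALID_TRIGGER_MODES:
        raise ValueError(f"unknown trigger_mode: {mode!r}")

    interval = strategy.get("trigger_interval", 10)  # default is an assumption
    # bool is a subclass of int, so reject it explicitly
    if isinstance(interval, bool) or not isinstance(interval, int) or interval < 1:
        raise ValueError(
            f"trigger_interval must be a positive integer, got {interval!r}"
        )

    return {"trigger_mode": mode, "trigger_interval": interval}
```

Calling `validate_strategy(config["strategy"])` right after loading the YAML file would turn a misspelled mode into an immediate, readable error.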
|
**Trigger Modes:**
- `always`: Run agent every generation (for testing)
- `periodic`: Run every N generations
- `plateau`: Run when score plateaus
- `mixed`: Run on periodic OR plateau (whichever comes first)
|
### Step 3: Start the Service

**Terminal 1** (Service):

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Start the service
python eval_agent/ev2_service.py --config eval_agent/ev2_service_config.yaml
```

Expected output:
```
INFO: Started server process [12345]
INFO: Waiting for application startup.
2026-02-02 15:30:00 - __main__ - INFO - Starting EV2 Evaluation Service...
2026-02-02 15:30:00 - __main__ - INFO - ✅ Service started
2026-02-02 15:30:00 - __main__ - INFO - Experiment: circle_packing_NO_vision
2026-02-02 15:30:00 - __main__ - INFO - Trigger mode: periodic
2026-02-02 15:30:00 - __main__ - INFO - Trigger interval: 10
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)
```
|
### Step 4: Test the Service

**Terminal 2** (Test):

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Test 1: Check service status
python eval_agent/test_ev2_service.py \
  --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
  --test-mode status
```

Expected output:
```
Testing service status...
✅ Service is running!
Uptime: 12.3s
Trigger mode: periodic
Trigger interval: 10
```
|
```bash
# Test 2: Simulate evolution (25 generations)
python eval_agent/test_ev2_service.py \
  --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
  --test-mode simulate \
  --num-gens 25
```

Expected output:
```
🧬 Simulating evolution with 25 generations...
======================================================================

📤 Sending notification: gen=0, score=2.4000
Status: skipped
Agent triggered: False
Reason: Not yet (last trigger at gen -1)
Processing time: 5.2ms

📤 Sending notification: gen=1, score=2.4050
Status: skipped
Agent triggered: False
Reason: Not yet (last trigger at gen -1)
Processing time: 3.1ms

...

📤 Sending notification: gen=10, score=2.4500
Status: success
Agent triggered: True
Reason: Periodic trigger (interval=10)
Processing time: 15234.5ms
Insights: 3 found

...

📤 Sending notification: gen=20, score=2.4950
Status: success
Agent triggered: True
Reason: Periodic trigger (interval=10)
Processing time: 12456.7ms
Insights: 3 found

======================================================================
✅ Simulation complete!
```
|
### Step 5: Check Results

The service creates/updates:

```
examples/circle_packing/results/.../
└── eval_agent_memory/
    ├── EVAL_AGENTS.md          # Updated by agent
    ├── auxiliary_metrics.py    # Created by agent
    └── service_state.json      # Service state (NEW)
```

Check service state:
```bash
cat examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/service_state.json
```

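
To inspect the state from Python instead of `cat`, a small loader is enough. This is a sketch under assumptions: `load_service_state` and `summarize_state` are hypothetical helpers, and because the schema of `service_state.json` is not described here, the code treats it as an arbitrary JSON object and only lists its top-level keys.

```python
import json
from pathlib import Path


def load_service_state(results_dir: str) -> dict:
    """Load <results_dir>/eval_agent_memory/service_state.json, or {} if absent."""
    state_path = Path(results_dir) / "eval_agent_memory" / "service_state.json"
    if not state_path.exists():
        return {}
    with state_path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize_state(state: dict) -> str:
    """Render one line per top-level key; no particular schema is assumed."""
    return "\n".join(f"{key}: {value!r}" for key, value in sorted(state.items()))
```

Running `print(summarize_state(load_service_state(results_dir)))` before and after a service restart is a quick way to confirm that the state actually persisted.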
|
### Step 6: Test Manual Trigger (Optional)

```bash
# Manually trigger agent for generation 5
python eval_agent/test_ev2_service.py \
  --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
  --test-mode manual \
  --generation 5
```
|
## API Documentation

The service provides these endpoints:

### 1. Generation Notification (Main)

```bash
curl -X POST http://localhost:8765/api/v1/notify/generation_complete \
  -H "Content-Type: application/json" \
  -d '{
    "generation": 42,
    "results_dir": "/path/to/results",
    "primary_score": 2.5407
  }'
```

Response:
```json
{
  "status": "success",
  "message": "Periodic trigger (interval=10)",
  "generation": 42,
  "agent_triggered": true,
  "trigger_reason": "Periodic trigger (interval=10)",
  "insights": ["..."],
  "auxiliary_metrics": {...},
  "processing_time_ms": 15234.5
}
```

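
The same notification can be sent from Python with only the standard library. This is a minimal sketch: `build_notification` and `send_notification` are hypothetical names, the endpoint path and payload fields are taken from the `curl` example, and the 30-second default timeout is an assumption chosen because a triggered agent run can take well over ten seconds.

```python
import json
import urllib.request


def build_notification(base_url: str, generation: int, results_dir: str,
                       primary_score: float) -> urllib.request.Request:
    """Build the POST request for the generation_complete endpoint."""
    payload = {
        "generation": generation,
        "results_dir": results_dir,
        "primary_score": primary_score,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/notify/generation_complete",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_notification(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `send_notification(build_notification("http://localhost:8765", 42, "/path/to/results", 2.5407))` should return a dict with the fields shown in the response above.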
|
### 2. Service Status

```bash
curl http://localhost:8765/api/v1/status
```

### 3. Manual Trigger

```bash
curl -X POST "http://localhost:8765/api/v1/trigger/manual?generation=10"
```

### 4. Interactive Docs

Open in browser: http://localhost:8765/docs
|
## 🔧 Integration with ShinkaEvolve

To integrate with ShinkaEvolve, add this to `EvolutionRunner`:

```python
# shinka/core/runner.py

class EvolutionRunner:
    def __init__(self, config: EvolutionConfig):
        self.config = config

        # Eval service URL (optional); None disables notifications
        self.eval_service_url = getattr(config, "eval_service_url", None)

    def _evaluate_generation(self, generation: int, code_path: str, results_dir: str):
        # Run normal evaluation (unchanged)
        results, score = self.scheduler.run(code_path, results_dir)

        # Notify eval service (NEW); waits no longer than the short timeout
        if self.eval_service_url:
            try:
                import requests  # imported here so the dependency stays optional
                requests.post(
                    f"{self.eval_service_url}/api/v1/notify/generation_complete",
                    json={
                        "generation": generation,
                        "results_dir": results_dir,
                        "primary_score": score,
                    },
                    timeout=1,  # Short timeout, fire-and-forget
                )
            except Exception as e:
                # Covers timeouts while the agent is still running; continue regardless
                self.logger.warning(f"Eval service notification failed: {e}")

        return results, score
```

**Changes required**: ~10 lines of code!

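
A synchronous `requests.post` with `timeout=1` still makes the evolution loop wait for up to that timeout, and it will log a timeout warning whenever the agent run takes longer than a second. If that matters, the call can be moved to a background thread. The following is a sketch of that alternative, not part of ShinkaEvolve: `notify_in_background` is a hypothetical helper, and the caller supplies both the `send` callable (wrapping the actual HTTP request) and the `on_error` handler.

```python
import threading
from typing import Callable


def notify_in_background(send: Callable[[], None],
                         on_error: Callable[[Exception], None]) -> threading.Thread:
    """Run `send` on a daemon thread so the caller returns immediately."""

    def worker() -> None:
        try:
            send()
        except Exception as exc:  # never let a failed notification propagate
            on_error(exc)

    # daemon=True: a pending notification does not keep the process alive at exit
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Inside `_evaluate_generation`, `send` would be a small closure around the `requests.post(...)` call (now free to use a longer timeout), and `on_error` would log the warning.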
|
## Service Decision Logic

The service decides autonomously when to trigger the agent:

```
Generation 0:  score=2.40 → SKIP (not yet, interval=10)
Generation 1:  score=2.41 → SKIP
...
Generation 10: score=2.45 → TRIGGER (periodic, interval=10) ✅
Generation 11: score=2.46 → SKIP
...
Generation 20: score=2.49 → TRIGGER (periodic, interval=10) ✅
...
```

With `trigger_mode: "mixed"`:
```
Generation 0:  score=2.40 → SKIP
Generation 5:  score=2.40 → TRIGGER (plateau detected!) ✅
Generation 10: score=2.45 → TRIGGER (periodic) ✅
...
```

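
The traces above can be reproduced with a small decision function. This is an illustrative sketch, not the logic in `ev2_service.py`: `is_plateau` and `should_trigger` are hypothetical names, the periodic rule (every multiple of the interval after generation 0) is inferred from the traces, and the plateau rule (best score improved by less than `min_delta` over the last `window` generations) is an assumption.

```python
from typing import List, Tuple


def is_plateau(scores: List[float], window: int = 5, min_delta: float = 1e-3) -> bool:
    """True if the best score improved by less than min_delta over the last `window` scores."""
    if len(scores) <= window:
        return False  # not enough history to compare against
    best_before = max(scores[:-window])
    best_recent = max(scores[-window:])
    return best_recent - best_before < min_delta


def should_trigger(mode: str, generation: int, scores: List[float],
                   interval: int = 10) -> Tuple[bool, str]:
    """Return (trigger?, reason) for one generation; `scores` holds one score per generation so far."""
    if mode == "always":
        return True, "Always trigger"
    periodic = generation > 0 and generation % interval == 0
    if mode in ("periodic", "mixed") and periodic:
        return True, f"Periodic trigger (interval={interval})"
    if mode in ("plateau", "mixed") and is_plateau(scores):
        return True, "Plateau detected"
    return False, "Not yet"
```

With six identical scores of 2.40 (generations 0 through 5), `should_trigger("mixed", 5, scores)` fires on the plateau rule, matching the `mixed` trace; in `periodic` mode the same call is skipped until generation 10.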
|
## 🎯 What This Achieves

### Before (without service):
```python
# In ShinkaEvolve
for gen in range(num_generations):
    score = evaluate(gen)
    # No auxiliary metrics
    # No intelligent analysis
```

### After (with service):
```python
# In ShinkaEvolve (minimal change)
for gen in range(num_generations):
    score = evaluate(gen)
    notify_service(gen, score)  # ← Just one line!

# Service independently:
# - Decides when to analyze
# - Runs EV2 agent
# - Creates auxiliary metrics
# - Accumulates insights
```
|
## ✅ Success Criteria

You've successfully tested Step 1 if:

1. ✅ Service starts without errors
2. ✅ Service responds to notifications
3. ✅ Service correctly skips some generations (based on strategy)
4. ✅ Service triggers agent at the right times
5. ✅ Agent creates/updates `EVAL_AGENTS.md` and `auxiliary_metrics.py`
6. ✅ Service state persists (check `service_state.json`)

## Troubleshooting

### Service won't start

**Error**: `ModuleNotFoundError: No module named 'fastapi'`
**Fix**: `pip install fastapi uvicorn pyyaml`

### Service starts but test fails

**Error**: `Cannot connect to service`
**Fix**: Check that the service is running on port 8765. Try: `curl http://localhost:8765/api/v1/status`

### Agent doesn't trigger

**Check**:
1. Is `agent_enabled: true` in config?
2. Are you sending enough generations? (interval=10 means trigger at gen 10, 20, 30...)
3. Check service logs in Terminal 1

### Agent fails to run

**Error in service logs**: `Primary evaluator not found`
**Fix**: Check that the `primary_evaluator` path in the config is correct

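
For the `Cannot connect to service` case, it is useful to separate "nothing is listening on the port" from "the service is up but returned an error". The following is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and the defaults simply mirror the host and port used elsewhere in this guide.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8765, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False
```

If this returns `False`, the service is not running (or is bound to a different port); if it returns `True` but the test script still fails, the service logs in Terminal 1 are the next place to look.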
|
## Next Steps

After Step 1 works:

**Step 2**: Add intelligent decision-making
- More sophisticated trigger strategies
- Plateau detection improvements
- Alert levels

**Step 3**: Add persistent memory
- SQLite database for history
- Metric tracking
- Correlation analysis

**Step 4**: Add MetricUnit management
- Object-oriented metrics
- Lifecycle management
- Validation system
|
## Notes

- The service is **decoupled from ShinkaEvolve**: it does not affect the evolution process, and a notification delays it by at most the client's short timeout
- If the service crashes, ShinkaEvolve continues normally (fire-and-forget)
- Service state is saved to disk, so it survives restarts
- All agent logic from `ev2.py` is preserved and unchanged

---

Ready to test? Start the service and run the tests!
|