# Step 1: Testing Guide for EV2 Service

## 🎯 What We Built

A **minimal HTTP service wrapper** around `ev2.py` that:

- ✅ Receives generation completion notifications
- ✅ Autonomously decides when to trigger the EV2 agent
- ✅ Maintains persistent state across generations
- ✅ Requires minimal changes to ShinkaEvolve

## 📋 File Overview

```
eval_agent/
├── ev2_service.py              # The HTTP service (NEW)
├── ev2_service_config.yaml     # Configuration file (NEW)
├── test_ev2_service.py         # Test script (NEW)
├── ev2.py                      # Original agent logic (UNCHANGED)
└── ev2_prompt.j2               # Agent prompt (UNCHANGED)
```

## 🚀 Step-by-Step Testing

### Step 1: Install Dependencies

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Install FastAPI, Uvicorn, and PyYAML
pip install fastapi uvicorn pyyaml
```

### Step 2: Configure the Service

Edit `eval_agent/ev2_service_config.yaml` if needed:

```yaml
experiment:
  results_dir: "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215"
  primary_evaluator: "examples/circle_packing/evaluate_ori.py"

strategy:
  trigger_mode: "periodic"  # Options: always, periodic, plateau, mixed
  trigger_interval: 10      # Run agent every 10 generations
```

**Trigger Modes:**

- `always`: Run agent every generation (for testing)
- `periodic`: Run every N generations
- `plateau`: Run when score plateaus
- `mixed`: Run on periodic OR plateau (whichever comes first)

### Step 3: Start the Service

**Terminal 1** (Service):

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Start the service
python eval_agent/ev2_service.py --config eval_agent/ev2_service_config.yaml
```

Expected output:

```
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
2026-02-02 15:30:00 - __main__ - INFO - 🚀 Starting EV2 Evaluation Service...
2026-02-02 15:30:00 - __main__ - INFO - ✅ Service started
2026-02-02 15:30:00 - __main__ - INFO -    Experiment: circle_packing_NO_vision
2026-02-02 15:30:00 - __main__ - INFO -    Trigger mode: periodic
2026-02-02 15:30:00 - __main__ - INFO -    Trigger interval: 10
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)
```

### Step 4: Test the Service

**Terminal 2** (Test):

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Test 1: Check service status
python eval_agent/test_ev2_service.py \
    --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
    --test-mode status
```

Expected output:

```
🔍 Testing service status...
✅ Service is running!
   Uptime: 12.3s
   Trigger mode: periodic
   Trigger interval: 10
```

```bash
# Test 2: Simulate evolution (25 generations)
python eval_agent/test_ev2_service.py \
    --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
    --test-mode simulate \
    --num-gens 25
```

Expected output:

```
🧬 Simulating evolution with 25 generations...
======================================================================

📤 Sending notification: gen=0, score=2.4000
   Status: skipped
   Agent triggered: False
   Reason: Not yet (last trigger at gen -1)
   Processing time: 5.2ms

📤 Sending notification: gen=1, score=2.4050
   Status: skipped
   Agent triggered: False
   Reason: Not yet (last trigger at gen -1)
   Processing time: 3.1ms

...

📤 Sending notification: gen=10, score=2.4500
   Status: success
   Agent triggered: True
   Reason: Periodic trigger (interval=10)
   Processing time: 15234.5ms
   Insights: 3 found

...

📤 Sending notification: gen=20, score=2.4950
   Status: success
   Agent triggered: True
   Reason: Periodic trigger (interval=10)
   Processing time: 12456.7ms
   Insights: 3 found

======================================================================
✅ Simulation complete!
```

### Step 5: Check Results

The service creates/updates:

```
examples/circle_packing/results/.../
└── eval_agent_memory/
    ├── EVAL_AGENTS.md          # Updated by agent
    ├── auxiliary_metrics.py    # Created by agent
    └── service_state.json      # Service state (NEW)
```

Check service state:

```bash
cat examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/service_state.json
```

### Step 6: Test Manual Trigger (Optional)

```bash
# Manually trigger agent for generation 5
python eval_agent/test_ev2_service.py \
    --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
    --test-mode manual \
    --generation 5
```

## 🔌 API Documentation

The service provides these endpoints:

### 1. Generation Notification (Main)

```bash
curl -X POST http://localhost:8765/api/v1/notify/generation_complete \
  -H "Content-Type: application/json" \
  -d '{
    "generation": 42,
    "results_dir": "/path/to/results",
    "primary_score": 2.5407
  }'
```

Response:

```json
{
  "status": "success",
  "message": "Periodic trigger (interval=10)",
  "generation": 42,
  "agent_triggered": true,
  "trigger_reason": "Periodic trigger (interval=10)",
  "insights": ["..."],
  "auxiliary_metrics": {...},
  "processing_time_ms": 15234.5
}
```

### 2. Service Status

```bash
curl http://localhost:8765/api/v1/status
```

### 3. Manual Trigger

```bash
curl -X POST "http://localhost:8765/api/v1/trigger/manual?generation=10"
```

### 4. Interactive Docs

Open in browser: http://localhost:8765/docs

## 🔧 Integration with ShinkaEvolve

To integrate with ShinkaEvolve, add this to `EvolutionRunner`:

```python
# shinka/core/runner.py

class EvolutionRunner:
    def __init__(self, config: EvolutionConfig):
        self.config = config

        # Eval service URL is optional; None disables notifications
        self.eval_service_url = getattr(config, "eval_service_url", None)

    def _evaluate_generation(self, generation: int, code_path: str, results_dir: str):
        # Run normal evaluation (unchanged)
        results, score = self.scheduler.run(code_path, results_dir)

        # Notify eval service (NEW, best-effort)
        if self.eval_service_url:
            try:
                # Imported here so a missing `requests` install only logs a warning
                import requests

                requests.post(
                    f"{self.eval_service_url}/api/v1/notify/generation_complete",
                    json={
                        "generation": generation,
                        "results_dir": results_dir,
                        "primary_score": score,
                    },
                    timeout=1,  # Short timeout: give up quickly rather than stall evolution
                )
            except Exception as e:
                # Timeouts and connection errors are logged, never raised
                self.logger.warning(f"Eval service notification failed: {e}")

        # Continue regardless of whether the notification succeeded
        return results, score
```

**Changes required**: ~10 lines of code!

The call is best-effort rather than truly asynchronous: `requests.post` blocks until the service responds or the timeout expires. If the service runs the agent before responding (the simulation output above shows processing times above 10 seconds on trigger generations), the 1-second timeout will likely expire on those generations and log a warning, while evolution continues.

## 📊 Service Decision Logic

The service decides autonomously when to trigger the agent:

```
Generation 0:  score=2.40 → SKIP (not yet, interval=10)
Generation 1:  score=2.41 → SKIP
...
Generation 10: score=2.45 → TRIGGER (periodic, interval=10) ✅
Generation 11: score=2.46 → SKIP
...
Generation 20: score=2.49 → TRIGGER (periodic, interval=10) ✅
...
```

With `trigger_mode: "mixed"`:

```
Generation 0:  score=2.40 → SKIP
Generation 5:  score=2.40 → TRIGGER (plateau detected!) ✅
Generation 10: score=2.45 → TRIGGER (periodic) ✅
...
```

## 🎯 What This Achieves

### Before (without service):

```python
# In ShinkaEvolve
for gen in range(num_generations):
    score = evaluate(gen)
    # No auxiliary metrics
    # No intelligent analysis
```

### After (with service):

```python
# In ShinkaEvolve (minimal change)
for gen in range(num_generations):
    score = evaluate(gen)
    notify_service(gen, score)  # ← Just one line!
# Service independently:
# - Decides when to analyze
# - Runs EV2 agent
# - Creates auxiliary metrics
# - Accumulates insights
```

## ✅ Success Criteria

You've successfully tested Step 1 if:

1. ✅ Service starts without errors
2. ✅ Service responds to notifications
3. ✅ Service correctly skips some generations (based on strategy)
4. ✅ Service triggers agent at the right times
5. ✅ Agent creates/updates EVAL_AGENTS.md and auxiliary_metrics.py
6. ✅ Service state persists (check service_state.json)

## 🐛 Troubleshooting

### Service won't start

**Error**: `ModuleNotFoundError: No module named 'fastapi'`

**Fix**: `pip install fastapi uvicorn pyyaml`

### Service starts but test fails

**Error**: `Cannot connect to service`

**Fix**: Check if service is running on port 8765. Try: `curl http://localhost:8765/`

### Agent doesn't trigger

**Check**:

1. Is `agent_enabled: true` in config?
2. Are you sending enough generations? (interval=10 means trigger at gen 10, 20, 30...)
3. Check service logs in Terminal 1

### Agent fails to run

**Error in service logs**: `Primary evaluator not found`

**Fix**: Check `primary_evaluator` path in config is correct

## 🚀 Next Steps

After Step 1 works:

**Step 2**: Add intelligent decision-making

- More sophisticated trigger strategies
- Plateau detection improvements
- Alert levels

**Step 3**: Add persistent memory

- SQLite database for history
- Metric tracking
- Correlation analysis

**Step 4**: Add MetricUnit management

- Object-oriented metrics
- Lifecycle management
- Validation system

## 📝 Notes

- The service is **stateless regarding ShinkaEvolve** - it doesn't block or affect the evolution process
- If the service crashes, ShinkaEvolve continues normally (fire-and-forget)
- Service state is saved to disk, so it survives restarts
- All agent logic from `ev2.py` is preserved and unchanged

---

Ready to test? Start the service and run the tests! 🚀
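
To reason about the trigger modes from the Service Decision Logic section before starting the service, the sketch below may help. It is a minimal illustration and is not the code in `ev2_service.py`. The class name `TriggerPolicy`, the `plateau_window` and `min_delta` parameters, and the plateau rule itself are assumptions made for this example, so the real service may detect plateaus differently.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TriggerPolicy:
    """Hypothetical trigger policy; names and plateau rule are assumptions."""

    mode: str = "periodic"      # always | periodic | plateau | mixed
    interval: int = 10          # mirrors trigger_interval in the config
    plateau_window: int = 5     # assumed: generations to look back
    min_delta: float = 1e-3     # assumed: smallest gain that counts as progress
    scores: List[float] = field(default_factory=list)

    def _periodic(self, generation: int) -> bool:
        # Fire on every multiple of the interval, but not at generation 0.
        return generation > 0 and generation % self.interval == 0

    def _plateau(self) -> bool:
        # Plateau: the best recent score barely beats the best earlier score.
        if len(self.scores) <= self.plateau_window:
            return False
        recent = self.scores[-self.plateau_window:]
        earlier_best = max(self.scores[:-self.plateau_window])
        return max(recent) - earlier_best < self.min_delta

    def decide(self, generation: int, score: float) -> Tuple[bool, str]:
        """Record the score and return (triggered, reason)."""
        self.scores.append(score)
        if self.mode == "always":
            return True, "Always trigger"
        if self.mode in ("periodic", "mixed") and self._periodic(generation):
            return True, f"Periodic trigger (interval={self.interval})"
        if self.mode in ("plateau", "mixed") and self._plateau():
            return True, "Plateau detected"
        return False, "Not yet"


# Usage: steadily improving scores with the periodic strategy.
policy = TriggerPolicy(mode="periodic", interval=10)
triggered = [g for g in range(25) if policy.decide(g, 2.40 + 0.005 * g)[0]]
print(triggered)  # → [10, 20]
```

In this sketch, a flat score keeps firing the plateau rule on every later generation until the score improves. A real service would probably add a cooldown after each trigger.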