# Step 1: Testing Guide for EV2 Service
## What We Built
A minimal HTTP service wrapper around ev2.py that:
- Receives generation completion notifications
- Autonomously decides when to trigger the EV2 agent
- Maintains persistent state across generations
- Requires minimal changes to ShinkaEvolve
## File Overview
```
eval_agent/
├── ev2_service.py            # The HTTP service (NEW)
├── ev2_service_config.yaml   # Configuration file (NEW)
├── test_ev2_service.py       # Test script (NEW)
├── ev2.py                    # Original agent logic (UNCHANGED)
└── ev2_prompt.j2             # Agent prompt (UNCHANGED)
```
## Step-by-Step Testing
### Step 1: Install Dependencies
```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate
# Install FastAPI, Uvicorn, and PyYAML
pip install fastapi uvicorn pyyaml
```
### Step 2: Configure the Service
Edit `eval_agent/ev2_service_config.yaml` if needed:
```yaml
experiment:
  results_dir: "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215"
  primary_evaluator: "examples/circle_packing/evaluate_ori.py"
strategy:
  trigger_mode: "periodic"   # Options: always, periodic, plateau, mixed
  trigger_interval: 10       # Run agent every 10 generations
```
Trigger Modes:
- `always`: Run the agent every generation (for testing)
- `periodic`: Run every N generations
- `plateau`: Run when the score plateaus
- `mixed`: Run on periodic OR plateau (whichever comes first)
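The four modes can be read as a small decision function. The sketch below is only an illustration of that idea, not the code in `ev2_service.py`: the name `should_trigger`, the `plateaued` flag, and the modulo-based periodic rule are assumptions, chosen so that periodic mode fires at generations 10, 20, ... as in the sample output in this guide.

```python
from typing import Tuple

VALID_MODES = {"always", "periodic", "plateau", "mixed"}


def should_trigger(mode: str, generation: int, interval: int,
                   plateaued: bool = False) -> Tuple[bool, str]:
    """Return (trigger?, reason) for one completed generation.

    Hypothetical helper: the real service may instead compare against
    the generation of its last trigger.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown trigger_mode: {mode!r}")
    if interval <= 0:
        raise ValueError("interval must be positive")

    # Assumed periodic rule: fire on positive multiples of the interval.
    periodic_hit = generation > 0 and generation % interval == 0

    if mode == "always":
        return True, "Always trigger"
    if mode in ("periodic", "mixed") and periodic_hit:
        return True, f"Periodic trigger (interval={interval})"
    if mode in ("plateau", "mixed") and plateaued:
        return True, "Plateau detected"
    return False, "Not yet"
```

In `mixed` mode the periodic check runs first, so a generation that satisfies both conditions is reported as a periodic trigger.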
### Step 3: Start the Service
Terminal 1 (Service):
```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate
# Start the service
python eval_agent/ev2_service.py --config eval_agent/ev2_service_config.yaml
```
Expected output:
```
INFO: Started server process [12345]
INFO: Waiting for application startup.
2026-02-02 15:30:00 - __main__ - INFO - Starting EV2 Evaluation Service...
2026-02-02 15:30:00 - __main__ - INFO - Service started
2026-02-02 15:30:00 - __main__ - INFO - Experiment: circle_packing_NO_vision
2026-02-02 15:30:00 - __main__ - INFO - Trigger mode: periodic
2026-02-02 15:30:00 - __main__ - INFO - Trigger interval: 10
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)
```
### Step 4: Test the Service
Terminal 2 (Test):
```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate
# Test 1: Check service status
python eval_agent/test_ev2_service.py \
  --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
  --test-mode status
```
Expected output:
```
Testing service status...
Service is running!
   Uptime: 12.3s
   Trigger mode: periodic
   Trigger interval: 10
```
```bash
# Test 2: Simulate evolution (25 generations)
python eval_agent/test_ev2_service.py \
  --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
  --test-mode simulate \
  --num-gens 25
```
Expected output:
```
Simulating evolution with 25 generations...

Sending notification: gen=0, score=2.4000
   Status: skipped
   Agent triggered: False
   Reason: Not yet (last trigger at gen -1)
   Processing time: 5.2ms

Sending notification: gen=1, score=2.4050
   Status: skipped
   Agent triggered: False
   Reason: Not yet (last trigger at gen -1)
   Processing time: 3.1ms
...
Sending notification: gen=10, score=2.4500
   Status: success
   Agent triggered: True
   Reason: Periodic trigger (interval=10)
   Processing time: 15234.5ms
   Insights: 3 found
...
Sending notification: gen=20, score=2.4950
   Status: success
   Agent triggered: True
   Reason: Periodic trigger (interval=10)
   Processing time: 12456.7ms
   Insights: 3 found

======================================================================
Simulation complete!
```
### Step 5: Check Results
The service creates/updates:
```
examples/circle_packing/results/.../
└── eval_agent_memory/
    ├── EVAL_AGENTS.md         # Updated by agent
    ├── auxiliary_metrics.py   # Created by agent
    └── service_state.json     # Service state (NEW)
```
Check service state:
```bash
cat examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/service_state.json
```
### Step 6: Test Manual Trigger (Optional)
```bash
# Manually trigger agent for generation 5
python eval_agent/test_ev2_service.py \
  --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
  --test-mode manual \
  --generation 5
```
## API Documentation
The service provides these endpoints:
### 1. Generation Notification (Main)
```bash
curl -X POST http://localhost:8765/api/v1/notify/generation_complete \
  -H "Content-Type: application/json" \
  -d '{
    "generation": 42,
    "results_dir": "/path/to/results",
    "primary_score": 2.5407
  }'
```
Response:
```json
{
  "status": "success",
  "message": "Periodic trigger (interval=10)",
  "generation": 42,
  "agent_triggered": true,
  "trigger_reason": "Periodic trigger (interval=10)",
  "insights": ["..."],
  "auxiliary_metrics": {...},
  "processing_time_ms": 15234.5
}
```
### 2. Service Status
```bash
curl http://localhost:8765/api/v1/status
```
### 3. Manual Trigger
```bash
curl -X POST "http://localhost:8765/api/v1/trigger/manual?generation=10"
```
### 4. Interactive Docs
Open in browser: http://localhost:8765/docs
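The notification endpoint can also be called from Python without extra dependencies. This is a minimal sketch using only the standard library; the URL path and field names are taken from the curl example above, while the function names `build_notification` and `send_notification` are hypothetical and not part of the service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8765"  # default address shown in this guide


def build_notification(generation: int, results_dir: str, primary_score: float,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for a completed generation."""
    payload = {
        "generation": generation,
        "results_dir": results_dir,
        "primary_score": primary_score,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/notify/generation_complete",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_notification(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The default timeout is generous on purpose: the sample output in this guide shows agent runs taking 12 to 15 seconds before the response arrives.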
## Integration with ShinkaEvolve
To integrate with ShinkaEvolve, add this to EvolutionRunner:
```python
# shinka/core/runner.py
class EvolutionRunner:
    def __init__(self, config: EvolutionConfig):
        self.config = config
        # Eval service URL is optional; None disables notifications
        self.eval_service_url = getattr(config, "eval_service_url", None)

    def _evaluate_generation(self, generation: int, code_path: str, results_dir: str):
        # Run normal evaluation (unchanged)
        results, score = self.scheduler.run(code_path, results_dir)

        # Notify eval service (NEW, best-effort; waits at most the timeout below)
        if self.eval_service_url:
            try:
                # Imported lazily so requests stays an optional dependency
                import requests
                requests.post(
                    f"{self.eval_service_url}/api/v1/notify/generation_complete",
                    json={
                        "generation": generation,
                        "results_dir": results_dir,
                        "primary_score": score,
                    },
                    timeout=1,  # Short timeout so a slow or missing service cannot stall evolution
                )
            except Exception as e:
                # A timeout during a long agent run also lands here
                self.logger.warning(f"Eval service notification failed: {e}")
                # Continue regardless

        return results, score
```
Changes required: ~10 lines of code!
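A `requests.post` call with `timeout=1` still waits on the service, and the sample output in this guide shows agent runs of 12 to 15 seconds, so the call is likely to time out whenever the agent triggers. One way to avoid waiting at all is to post from a daemon thread. The sketch below uses only the standard library; `notify_async` and `_post_json` are hypothetical names, not part of ShinkaEvolve or the service.

```python
import json
import logging
import threading
import urllib.request
from typing import Callable

logger = logging.getLogger(__name__)


def _post_json(url: str, payload: dict, timeout: float = 60.0) -> None:
    """POST a JSON payload and discard the response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def notify_async(url: str, payload: dict,
                 send: Callable[[str, dict], None] = _post_json) -> threading.Thread:
    """Send the notification from a daemon thread so the caller never waits."""

    def worker() -> None:
        try:
            send(url, payload)
        except Exception as exc:  # evolution must continue regardless
            logger.warning("Eval service notification failed: %s", exc)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The `send` parameter exists so the transport can be swapped out in tests. Because the thread is a daemon, a notification still in flight when the main process exits is dropped.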
## Service Decision Logic
The service decides autonomously when to trigger the agent:
```
Generation 0:  score=2.40 → SKIP (not yet, interval=10)
Generation 1:  score=2.41 → SKIP
...
Generation 10: score=2.45 → TRIGGER (periodic, interval=10)
Generation 11: score=2.46 → SKIP
...
Generation 20: score=2.49 → TRIGGER (periodic, interval=10)
...
```
With `trigger_mode: "mixed"`:
```
Generation 0:  score=2.40 → SKIP
Generation 5:  score=2.40 → TRIGGER (plateau detected!)
Generation 10: score=2.45 → TRIGGER (periodic)
...
```
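This guide does not define what counts as a plateau. As a rough sketch, one possible rule is "the best score has not improved by more than a small margin over the last few generations". The function name `is_plateau` and the parameters `window` and `min_delta` are assumptions for illustration; the detection in `ev2_service.py` may work differently.

```python
from typing import Sequence


def is_plateau(scores: Sequence[float], window: int = 5,
               min_delta: float = 1e-3) -> bool:
    """Return True if the best score of the last `window` generations is
    no more than `min_delta` above the best score seen before them."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(scores) <= window:
        return False  # not enough history to compare against
    best_before = max(scores[:-window])
    best_recent = max(scores[-window:])
    return best_recent - best_before <= min_delta
```

With `window=5`, a score that stays at 2.40 for generations 0 through 5 is first flagged at generation 5, which matches the mixed-mode trace above under that assumption.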
## What This Achieves
Before (without service):
```python
# In ShinkaEvolve
for gen in range(num_generations):
    score = evaluate(gen)
    # No auxiliary metrics
    # No intelligent analysis
```
After (with service):
```python
# In ShinkaEvolve (minimal change)
for gen in range(num_generations):
    score = evaluate(gen)
    notify_service(gen, score)  # ← Just one line!

# Service independently:
# - Decides when to analyze
# - Runs EV2 agent
# - Creates auxiliary metrics
# - Accumulates insights
```
## Success Criteria
You've successfully tested Step 1 if:
- Service starts without errors
- Service responds to notifications
- Service correctly skips some generations (based on strategy)
- Service triggers agent at the right times
- Agent creates/updates `EVAL_AGENTS.md` and `auxiliary_metrics.py`
- Service state persists (check `service_state.json`)
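The file-related criteria can be checked with a few lines of Python. This is a small sketch that assumes all three files live under `eval_agent_memory/` inside the results directory, as in the layout shown in Step 5; `missing_outputs` is a hypothetical helper, not part of the test script.

```python
from pathlib import Path
from typing import List

EXPECTED_FILES = ("EVAL_AGENTS.md", "auxiliary_metrics.py", "service_state.json")


def missing_outputs(results_dir: str) -> List[str]:
    """Return the expected output files that are not present yet."""
    memory_dir = Path(results_dir) / "eval_agent_memory"
    return [name for name in EXPECTED_FILES if not (memory_dir / name).is_file()]
```

An empty list means every expected file exists; it says nothing about whether the contents are correct.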
## Troubleshooting
### Service won't start
Error: `ModuleNotFoundError: No module named 'fastapi'`
Fix: `pip install fastapi uvicorn pyyaml`
### Service starts but test fails
Error: Cannot connect to service
Fix: Check that the service is running on port 8765. Try: `curl http://localhost:8765/api/v1/status`
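If `curl` is not available, a plain TCP connection attempt gives the same basic answer. This is a minimal sketch; `is_port_open` is a hypothetical helper, and it only shows that something is listening on the port, not that it is the EV2 service.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8765,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```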
### Agent doesn't trigger
Check:
- Is `agent_enabled: true` in config?
- Are you sending enough generations? (`interval=10` means the agent triggers at gen 10, 20, 30...)
- Check the service logs in Terminal 1
### Agent fails to run
Error in service logs: `Primary evaluator not found`
Fix: Check that the `primary_evaluator` path in the config is correct
## Next Steps
After Step 1 works:
Step 2: Add intelligent decision-making
- More sophisticated trigger strategies
- Plateau detection improvements
- Alert levels
Step 3: Add persistent memory
- SQLite database for history
- Metric tracking
- Correlation analysis
Step 4: Add MetricUnit management
- Object-oriented metrics
- Lifecycle management
- Validation system
## Notes
- The service is decoupled from ShinkaEvolve: it doesn't block or affect the evolution process
- If the service crashes, ShinkaEvolve continues normally (fire-and-forget)
- Service state is saved to disk, so it survives restarts
- All agent logic from `ev2.py` is preserved and unchanged
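State that must survive restarts is usually written atomically, so a crash mid-write cannot leave a truncated file behind. The sketch below shows one common way to do that for a JSON file such as `service_state.json`. The helper names and the `last_trigger_generation` field are assumptions for illustration (the default of -1 mirrors the "last trigger at gen -1" message in the sample output); the actual schema used by the service is not documented here.

```python
import json
import os
import tempfile
from pathlib import Path

DEFAULT_STATE = {"last_trigger_generation": -1}  # hypothetical schema


def save_state(path: Path, state: dict) -> None:
    """Write state as JSON via a temp file, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
        os.replace(tmp_name, path)  # atomic when both are on one filesystem
    except Exception:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave a stray temp file behind
        raise


def load_state(path: Path) -> dict:
    """Load saved state, or return a copy of the defaults if none exists."""
    if not path.is_file():
        return dict(DEFAULT_STATE)
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```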
Ready to test? Start the service and run the tests!