# Step 1: Testing Guide for EV2 Service

## 🎯 What We Built

A **minimal HTTP service wrapper** around `ev2.py` that:
- ✅ Receives generation completion notifications
- ✅ Autonomously decides when to trigger the EV2 agent
- ✅ Maintains persistent state across generations
- ✅ Requires minimal changes to ShinkaEvolve

## 📋 File Overview

```
eval_agent/
├── ev2_service.py              # The HTTP service (NEW)
├── ev2_service_config.yaml     # Configuration file (NEW)
├── test_ev2_service.py         # Test script (NEW)
├── ev2.py                      # Original agent logic (UNCHANGED)
└── ev2_prompt.j2               # Agent prompt (UNCHANGED)
```

## 🚀 Step-by-Step Testing

### Step 1: Install Dependencies

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Install FastAPI, Uvicorn, and PyYAML
pip install fastapi uvicorn pyyaml
```
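
To confirm the install worked before starting the service, a quick import check can help. The sketch below is not part of the repository; `missing_modules` is a hypothetical helper that only uses the standard library.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # PyYAML is installed as `pyyaml` but imported as `yaml`.
    missing = missing_modules(["fastapi", "uvicorn", "yaml"])
    print("missing:", ", ".join(missing) if missing else "none")
```

If anything is listed as missing, re-run the `pip install` command inside the activated virtual environment.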

### Step 2: Configure the Service

Edit `eval_agent/ev2_service_config.yaml` if needed:

```yaml
experiment:
  results_dir: "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215"
  primary_evaluator: "examples/circle_packing/evaluate_ori.py"

strategy:
  trigger_mode: "periodic"  # Options: always, periodic, plateau, mixed
  trigger_interval: 10       # Run agent every 10 generations
```

**Trigger Modes:**
- `always`: Run agent every generation (for testing)
- `periodic`: Run every N generations
- `plateau`: Run when score plateaus
- `mixed`: Run on periodic OR plateau (whichever comes first)
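
The four modes can be read as a small decision function. The sketch below is illustrative only and is not copied from `ev2_service.py`: the name `should_trigger`, the rule "fire on positive multiples of the interval", and the `plateau` flag (assumed to be computed elsewhere) are all assumptions.

```python
def should_trigger(mode, generation, interval=10, plateau=False):
    """Return (triggered, reason) for one completed generation.

    Assumptions: a periodic trigger fires when `generation` is a positive
    multiple of `interval`; plateau detection happens elsewhere and is
    passed in as the boolean `plateau`.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")

    periodic = generation > 0 and generation % interval == 0

    if mode == "always":
        return True, "Always trigger"
    if mode == "periodic":
        if periodic:
            return True, f"Periodic trigger (interval={interval})"
        return False, "Not yet"
    if mode == "plateau":
        if plateau:
            return True, "Plateau detected"
        return False, "Not yet"
    if mode == "mixed":
        # Either condition is enough; plateau is checked first.
        if plateau:
            return True, "Plateau detected"
        if periodic:
            return True, f"Periodic trigger (interval={interval})"
        return False, "Not yet"
    raise ValueError(f"unknown trigger_mode: {mode!r}")
```

For example, `should_trigger("periodic", 10)` fires while `should_trigger("periodic", 11)` does not, and `should_trigger("mixed", 5, plateau=True)` fires before the next periodic boundary.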

### Step 3: Start the Service

**Terminal 1** (Service):

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Start the service
python eval_agent/ev2_service.py --config eval_agent/ev2_service_config.yaml
```

Expected output:
```
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
2026-02-02 15:30:00 - __main__ - INFO - 🚀 Starting EV2 Evaluation Service...
2026-02-02 15:30:00 - __main__ - INFO - ✅ Service started
2026-02-02 15:30:00 - __main__ - INFO -    Experiment: circle_packing_NO_vision
2026-02-02 15:30:00 - __main__ - INFO -    Trigger mode: periodic
2026-02-02 15:30:00 - __main__ - INFO -    Trigger interval: 10
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)
```

### Step 4: Test the Service

**Terminal 2** (Test):

```bash
cd /home/tengxiao/pj/ShinkaEvolve
source venv/bin/activate

# Test 1: Check service status
python eval_agent/test_ev2_service.py \
    --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
    --test-mode status
```

Expected output:
```
🔍 Testing service status...
✅ Service is running!
   Uptime: 12.3s
   Trigger mode: periodic
   Trigger interval: 10
```

```bash
# Test 2: Simulate evolution (25 generations)
python eval_agent/test_ev2_service.py \
    --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
    --test-mode simulate \
    --num-gens 25
```

Expected output:
```
🧬 Simulating evolution with 25 generations...
======================================================================

📤 Sending notification: gen=0, score=2.4000
   Status: skipped
   Agent triggered: False
   Reason: Not yet (last trigger at gen -1)
   Processing time: 5.2ms

📤 Sending notification: gen=1, score=2.4050
   Status: skipped
   Agent triggered: False
   Reason: Not yet (last trigger at gen -1)
   Processing time: 3.1ms

...

📤 Sending notification: gen=10, score=2.4500
   Status: success
   Agent triggered: True
   Reason: Periodic trigger (interval=10)
   Processing time: 15234.5ms
   Insights: 3 found

...

📤 Sending notification: gen=20, score=2.4950
   Status: success
   Agent triggered: True
   Reason: Periodic trigger (interval=10)
   Processing time: 12456.7ms
   Insights: 3 found

======================================================================
✅ Simulation complete!
```

### Step 5: Check Results

The service creates/updates:

```
examples/circle_packing/results/.../
└── eval_agent_memory/
    ├── EVAL_AGENTS.md              # Updated by agent
    ├── auxiliary_metrics.py        # Created by agent
    └── service_state.json          # Service state (NEW)
```

Check service state:
```bash
cat examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215/eval_agent_memory/service_state.json
```
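
The same check can be done from Python, which is handy inside a test. This is a minimal sketch: `load_service_state` is a hypothetical helper, and it makes no assumption about which keys the service writes to the file.

```python
import json
from pathlib import Path


def load_service_state(results_dir):
    """Load eval_agent_memory/service_state.json, or return None if absent."""
    state_path = Path(results_dir) / "eval_agent_memory" / "service_state.json"
    if not state_path.is_file():
        return None
    with state_path.open(encoding="utf-8") as fh:
        return json.load(fh)


def format_state(state):
    """Pretty-print the state dictionary with stable key order."""
    return json.dumps(state, indent=2, sort_keys=True)
```

Calling `format_state(load_service_state(results_dir))` after a simulation run prints whatever the service persisted.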

### Step 6: Test Manual Trigger (Optional)

```bash
# Manually trigger agent for generation 5
python eval_agent/test_ev2_service.py \
    --results-dir "examples/circle_packing/results/results_circle_packing_NO_vision_WITH_refined_aux_20260118_205215" \
    --test-mode manual \
    --generation 5
```

## 🔌 API Documentation

The service provides these endpoints:

### 1. Generation Notification (Main)

```bash
curl -X POST http://localhost:8765/api/v1/notify/generation_complete \
  -H "Content-Type: application/json" \
  -d '{
    "generation": 42,
    "results_dir": "/path/to/results",
    "primary_score": 2.5407
  }'
```

Response:
```json
{
  "status": "success",
  "message": "Periodic trigger (interval=10)",
  "generation": 42,
  "agent_triggered": true,
  "trigger_reason": "Periodic trigger (interval=10)",
  "insights": ["..."],
  "auxiliary_metrics": {...},
  "processing_time_ms": 15234.5
}
```
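
The same request can be sent from Python using only the standard library. The sketch below assumes the endpoint and response fields shown above; the helper names (`build_notification`, `summarize_response`, `notify`) and the 60-second default timeout are illustrative choices, not part of the service.

```python
import json
import urllib.request

SERVICE_URL = "http://localhost:8765"  # default address used in this guide


def build_notification(generation, results_dir, primary_score, base_url=SERVICE_URL):
    """Build the POST request for /api/v1/notify/generation_complete."""
    payload = {
        "generation": generation,
        "results_dir": results_dir,
        "primary_score": primary_score,
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/notify/generation_complete",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_response(body):
    """Reduce a JSON response body to the fields most useful in logs."""
    data = json.loads(body)
    return {
        "status": data.get("status"),
        "agent_triggered": bool(data.get("agent_triggered", False)),
        "reason": data.get("trigger_reason"),
        "num_insights": len(data.get("insights") or []),
    }


def notify(generation, results_dir, primary_score, timeout=60.0):
    """Send the notification and return a summary; needs a running service."""
    request = build_notification(generation, results_dir, primary_score)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize_response(response.read().decode("utf-8"))
```

A generous timeout is used here because, per the sample output earlier in this guide, a triggered agent run can take more than ten seconds to respond.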

### 2. Service Status

```bash
curl http://localhost:8765/api/v1/status
```

### 3. Manual Trigger

```bash
curl -X POST "http://localhost:8765/api/v1/trigger/manual?generation=10"
```

### 4. Interactive Docs

Open in browser: http://localhost:8765/docs

## 🔧 Integration with ShinkaEvolve

To integrate with ShinkaEvolve, add this to `EvolutionRunner`:

```python
# shinka/core/runner.py

class EvolutionRunner:
    def __init__(self, config: EvolutionConfig):
        self.config = config

        # URL of the eval service (optional); None disables notifications
        self.eval_service_url = getattr(config, "eval_service_url", None)

    def _evaluate_generation(self, generation: int, code_path: str, results_dir: str):
        # Run normal evaluation (unchanged)
        results, score = self.scheduler.run(code_path, results_dir)

        # Notify eval service (NEW, best-effort). The call blocks for at most
        # `timeout` seconds; if the service takes longer to answer (for
        # example while the agent runs), the request times out on this side
        # and only a warning is logged.
        if self.eval_service_url:
            try:
                import requests  # lazy import: only needed when the service is enabled
                requests.post(
                    f"{self.eval_service_url}/api/v1/notify/generation_complete",
                    json={
                        "generation": generation,
                        "results_dir": results_dir,
                        "primary_score": score,
                    },
                    timeout=1,  # Short timeout so evolution is never held up for long
                )
            except Exception as e:
                # Never let a notification failure interrupt evolution
                self.logger.warning(f"Eval service notification failed: {e}")

        return results, score
```

**Changes required**: ~10 lines of code!
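
If even a one-second wait per generation is too much, the notification can be moved to a background thread so the evolution loop returns immediately. This is a sketch under stated assumptions rather than code from ShinkaEvolve: `notify_in_background` and its injectable `post` argument are hypothetical, and only the standard library is used.

```python
import json
import logging
import threading
import urllib.request

logger = logging.getLogger(__name__)


def _post_json(url, payload, timeout):
    """POST `payload` as JSON and discard the response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass


def notify_in_background(base_url, generation, results_dir, score,
                         timeout=120.0, post=_post_json):
    """Send the notification on a daemon thread and return the thread."""
    url = f"{base_url}/api/v1/notify/generation_complete"
    payload = {
        "generation": generation,
        "results_dir": results_dir,
        "primary_score": score,
    }

    def worker():
        try:
            post(url, payload, timeout)
        except Exception as exc:  # a failed notification must not stop evolution
            logger.warning("Eval service notification failed: %s", exc)

    thread = threading.Thread(target=worker, name="eval-notify", daemon=True)
    thread.start()
    return thread
```

Because the thread is a daemon, a notification still in flight when the interpreter exits may be lost; call `join()` on the returned thread after the last generation if that matters.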

## 📊 Service Decision Logic

The service decides autonomously when to trigger the agent:

```
Generation 0:  score=2.40  → SKIP (not yet, interval=10)
Generation 1:  score=2.41  → SKIP
...
Generation 10: score=2.45  → TRIGGER (periodic, interval=10) ✅
Generation 11: score=2.46  → SKIP
...
Generation 20: score=2.49  → TRIGGER (periodic, interval=10) ✅
...
```

With `trigger_mode: "mixed"`:
```
Generation 0:  score=2.40  → SKIP
Generation 5:  score=2.40  → TRIGGER (plateau detected!) ✅
Generation 10: score=2.45  → TRIGGER (periodic) ✅
...
```
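
One simple way to detect a plateau like the one at generation 5 is to compare the best recent score with the best earlier score. The sketch below is an assumption about how such a check could work, not the service's actual rule; `is_plateau`, `window`, and `tolerance` are illustrative names and defaults.

```python
def is_plateau(scores, window=5, tolerance=1e-3):
    """Return True if the best score did not improve over the last `window` generations.

    `scores` holds one primary score per generation, oldest first.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(scores) <= window:
        return False  # not enough history to compare against

    best_before = max(scores[:-window])
    best_recent = max(scores[-window:])
    return best_recent - best_before < tolerance
```

With six identical scores of 2.40 (generations 0 to 5) the function reports a plateau, while a steadily rising sequence such as 2.40, 2.41, ..., 2.45 does not.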

## 🎯 What This Achieves

### Before (without service):
```python
# In ShinkaEvolve
for gen in range(num_generations):
    score = evaluate(gen)
    # No auxiliary metrics
    # No intelligent analysis
```

### After (with service):
```python
# In ShinkaEvolve (minimal change)
for gen in range(num_generations):
    score = evaluate(gen)
    notify_service(gen, score)  # ← Just one line!
    
# Service independently:
# - Decides when to analyze
# - Runs EV2 agent
# - Creates auxiliary metrics
# - Accumulates insights
```

## ✅ Success Criteria

You've successfully tested Step 1 if:

1. ✅ Service starts without errors
2. ✅ Service responds to notifications
3. ✅ Service correctly skips some generations (based on strategy)
4. ✅ Service triggers agent at the right times
5. ✅ Agent creates/updates EVAL_AGENTS.md and auxiliary_metrics.py
6. ✅ Service state persists (check service_state.json)
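
Criteria 5 and 6 can be checked mechanically. The sketch below is a hypothetical helper (`check_artifacts`) that looks for the three files listed under "Check Results"; treating an empty file as a failure is an assumption.

```python
from pathlib import Path

EXPECTED_FILES = ("EVAL_AGENTS.md", "auxiliary_metrics.py", "service_state.json")


def check_artifacts(results_dir):
    """Map each expected file name to True if it exists and is non-empty."""
    memory_dir = Path(results_dir) / "eval_agent_memory"
    report = {}
    for name in EXPECTED_FILES:
        path = memory_dir / name
        report[name] = path.is_file() and path.stat().st_size > 0
    return report
```

Running `all(check_artifacts(results_dir).values())` after the simulation gives a single pass/fail answer for the file-based criteria.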

## 🐛 Troubleshooting

### Service won't start

**Error**: `ModuleNotFoundError: No module named 'fastapi'`
**Fix**: `pip install fastapi uvicorn pyyaml`

### Service starts but test fails

**Error**: `Cannot connect to service`
**Fix**: Check that the service is running on port 8765. Try: `curl http://localhost:8765/api/v1/status`
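
To tell "nothing is listening" apart from "the service is up but returns an error", a raw TCP check is enough. This is a small standard-library sketch; `is_port_open` is a hypothetical helper, and the default port is the one used throughout this guide.

```python
import socket


def is_port_open(host="localhost", port=8765, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

If this returns `False`, the service process is not running or is bound to a different port; if it returns `True` but the test still fails, look at the service logs in Terminal 1.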

### Agent doesn't trigger

**Check**:
1. Is `agent_enabled: true` in config?
2. Are you sending enough generations? (interval=10 means trigger at gen 10, 20, 30...)
3. Check service logs in Terminal 1

### Agent fails to run

**Error in service logs**: `Primary evaluator not found`
**Fix**: Check `primary_evaluator` path in config is correct

## 🚀 Next Steps

After Step 1 works:

**Step 2**: Add intelligent decision-making
- More sophisticated trigger strategies
- Plateau detection improvements
- Alert levels

**Step 3**: Add persistent memory
- SQLite database for history
- Metric tracking
- Correlation analysis

**Step 4**: Add MetricUnit management
- Object-oriented metrics
- Lifecycle management
- Validation system

## 📝 Notes

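- The service is **decoupled from ShinkaEvolve**: it keeps its own state and does not alter the evolution process; the only coupling is the notification request
- If the service crashes or is unreachable, ShinkaEvolve logs a warning and continues normally (notifications are best-effort)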
- Service state is saved to disk, so it survives restarts
- All agent logic from `ev2.py` is preserved and unchanged
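
For state that must survive restarts, it helps to write the file atomically so a crash mid-write cannot leave a truncated `service_state.json`. The sketch below shows one common pattern (write to a temporary file, then rename); it is an illustration, not a description of how `ev2_service.py` saves its state, and `save_state_atomically` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state_atomically(state, path):
    """Write `state` as JSON so readers never see a half-written file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Create the temp file in the target directory so the rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, path)  # replaces any existing file in one step
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```
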

---

Ready to test? Start the service and run the tests! 🚀