**`evaluate.py` - Evaluation Script**
```python
import argparse

from shinka.core import run_shinka_eval


def get_kwargs(run_idx: int) -> dict:
    """Build the kwargs passed to run_experiment for each run."""
    return {"param1": "value", "param2": 42}


def aggregate_fn(results: list) -> dict:
    """Combine per-run results into a single metrics dict."""
    score = results[0]  # unpacking depends on run_experiment's return shape
    text = results[1]
    return {
        "combined_score": float(score),  # maximized by shinka
        "public": {...},        # shinka-visible metrics
        "private": {...},       # shinka-invisible metrics
        "extra_data": {...},    # stored as pickle
        "text_feedback": text,  # string feedback
    }


def main(program_path: str, results_dir: str):
    metrics, correct, err = run_shinka_eval(
        program_path=program_path,
        results_dir=results_dir,
        experiment_fn_name="run_experiment",
        num_runs=3,  # multiple evaluation runs to aggregate
        get_experiment_kwargs=get_kwargs,
        aggregate_metrics_fn=aggregate_fn,
        validate_fn=validate_fn,  # optional; see sketch below
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--program_path", type=str, required=True)
    parser.add_argument("--results_dir", type=str, required=True)
    args = parser.parse_args()
    main(args.program_path, args.results_dir)
```
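The `validate_fn` hook above is optional, and its exact signature is not shown here. The stub below is a minimal sketch, assuming the hook receives a single run's result and returns a validity flag plus an optional error message; if used, define it above `main` in `evaluate.py`:

```python
def validate_fn(result):
    # Assumed (result) -> (is_valid, error_msg) shape; this is not a
    # confirmed shinka signature, adapt it to your run_shinka_eval version.
    if result is None:
        return False, "run_experiment returned None"
    return True, None
```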
**`initial.py` - Starting Solution**
```python
# EVOLVE-BLOCK-START
def advanced_algo():
    """Candidate algorithm; this region will be evolved."""
    solution = ...  # placeholder initial solution
    return solution
# EVOLVE-BLOCK-END


def run_experiment(**kwargs):
    """Main entry point called by the evaluator."""
    result = solve_problem(kwargs)
    return result


def solve_problem(params):
    solution = advanced_algo()  # params unused in this starting solution
    return solution
```
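Conceptually, `run_shinka_eval` loads the program at `program_path` (initially `initial.py`, later its evolved variants) and drives it through the hooks from `evaluate.py`. The loop below is an illustrative sketch of that wiring, assuming the definitions from both files above are in scope; it is not the actual `shinka.core` implementation:

```python
# Illustrative sketch only, not the real shinka.core internals.
results = []
for run_idx in range(3):                       # num_runs=3
    kwargs = get_kwargs(run_idx)               # {"param1": "value", "param2": 42}
    results.append(run_experiment(**kwargs))   # entry point in initial.py
metrics = aggregate_fn(results)                # -> {"combined_score": ..., ...}
```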
**Key Points:**
- The entry-point name in `initial.py` must match `experiment_fn_name` in `evaluate.py`
- Use `EVOLVE-BLOCK-START` and `EVOLVE-BLOCK-END` markers to delimit the code regions that will be evolved
- The return value of `run_experiment` must match what `validate_fn` and `aggregate_fn` expect (see the filled-in example below)
- All dependencies of the program must be available in the evaluation environment
- Per-run results can be unpacked in `aggregate_fn` to compute metrics
- `run_shinka_eval` automatically stores several result artifacts in `results_dir`
- `text_feedback` injects string feedback into the `shinka` evolution loop
- Higher `combined_score` values indicate better performance (maximization)
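For concreteness, a hypothetical fully filled-in return value from `aggregate_fn` might look as follows; only the top-level keys come from the example above, all metric names and values are illustrative:

```python
example_metrics = {
    "combined_score": 0.87,                      # higher is better (maximized)
    "public": {"accuracy": 0.91},                # visible to shinka
    "private": {"runtime_sec": 12.4},            # hidden from shinka
    "extra_data": {"raw_outputs": [0.9, 0.85]},  # pickled into results_dir
    "text_feedback": "2/3 runs passed all checks.",
}
```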