# Eval Protocol FrozenLake Example
This example shows how to use **Eval Protocol**'s FrozenLake environment from within **rLLM** using the generic `EvalProtocolWorkflow`.
For a conceptual overview of how this integration works and how it generalizes to other benchmarks, see the core-concepts page on [Eval Protocol Integration](../../docs/core-concepts/eval-protocol.md).
---
## Quick Start
### Prepare the FrozenLake dataset
From the project root:
```bash
cd examples/eval_protocol
python prepare_frozen_lake_data.py
```
This script builds and registers the `frozen_lake_eval_protocol` train/test splits in the rLLM `DatasetRegistry`.
### Run the FrozenLake workflow (inference)
Once your Fireworks API credentials are configured, you can run a small batch of FrozenLake tasks through Eval Protocol and rLLM:
```bash
python run_frozen_lake_flow.py
```
This will:
- Load the `frozen_lake_eval_protocol` test split.
- Use `EvalProtocolWorkflow` (with `env_path="eval_protocol.benchmarks.test_frozen_lake"`) to run rollouts via Eval Protocol.
- Print per-task rewards/accuracy and save results to `logs/frozen_lake_results.json`.
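Once a run finishes, the saved results can be summarized offline. The following is a minimal sketch, assuming the JSON file holds a list of per-task objects, each with a numeric `reward` field. That layout and the helper names `summarize_rewards` and `summarize_file` are hypothetical and not taken from `run_frozen_lake_flow.py`, so adapt the keys to whatever the script actually writes. The sketch also assumes that a reward of 1.0 or higher means the task was solved.

```python
import json
from pathlib import Path


def summarize_rewards(records, success_threshold=1.0):
    """Return task count, mean reward, and accuracy for per-task records.

    Assumes each record is a dict with a numeric "reward" key
    (hypothetical schema; check the real results file first).
    """
    rewards = [float(r["reward"]) for r in records]
    if not rewards:
        return {"num_tasks": 0, "mean_reward": 0.0, "accuracy": 0.0}
    solved = sum(1 for x in rewards if x >= success_threshold)
    return {
        "num_tasks": len(rewards),
        "mean_reward": sum(rewards) / len(rewards),
        "accuracy": solved / len(rewards),
    }


def summarize_file(path="logs/frozen_lake_results.json"):
    """Load the results file written by the run step and summarize it."""
    records = json.loads(Path(path).read_text())
    return summarize_rewards(records)
```

Calling `summarize_file()` from `examples/eval_protocol` after a run would return a small dictionary that can be logged or compared across model checkpoints.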
### Train an RL agent
To train an agent against the same Eval Protocol FrozenLake environment:
```bash
bash train_frozen_lake_flow.sh
```
This uses `EvalProtocolWorkflow` inside `AgentTrainer` (via Hydra configs) to:
- Generate rollouts using Eval Protocol’s rollout processor and MCP server.
- Compute rewards via the Eval Protocol evaluation function.
- Optimize the underlying model with PPO/GRPO.
You can edit `train_frozen_lake_flow.sh` to customize the model path, the Fireworks deployment, and the training hyperparameters.
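For intuition about the GRPO option, the sketch below shows the group-relative advantage computation commonly associated with GRPO: rewards from several rollouts of the same task are normalized by the group's mean and standard deviation. It is an illustration only, not rLLM's implementation. The function name `group_relative_advantages` and the `eps` default are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

If the reward is binary, a group in which every rollout fails or every rollout succeeds yields all-zero advantages, so mixed-outcome groups are the ones that carry a learning signal in this scheme.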
---
## Code Reference
### Data preparation
Script that builds and registers the FrozenLake Eval Protocol dataset:
```python title="examples/eval_protocol/prepare_frozen_lake_data.py"
--8<-- "examples/eval_protocol/prepare_frozen_lake_data.py"
```
### Workflow runner
Main script for running the FrozenLake Eval Protocol workflow through rLLM:
```python title="examples/eval_protocol/run_frozen_lake_flow.py"
--8<-- "examples/eval_protocol/run_frozen_lake_flow.py"
```
### Training script
Agent training implementation using `EvalProtocolWorkflow` and `AgentTrainer`:
```python title="examples/eval_protocol/train_frozen_lake_flow.py"
--8<-- "examples/eval_protocol/train_frozen_lake_flow.py"
```