Eval Protocol FrozenLake Example
This example shows how to use Eval Protocol's FrozenLake environment from within rLLM using the generic EvalProtocolWorkflow.
For a conceptual overview of how this integration works and how it generalizes to other benchmarks, see the core-concepts page on Eval Protocol Integration.
Quick Start
Prepare FrozenLake dataset
From the project root:
cd examples/eval_protocol
python prepare_frozen_lake_data.py
This script builds and registers the frozen_lake_eval_protocol train/test splits in the rLLM DatasetRegistry.
Run FrozenLake workflow (inference)
Once your Fireworks API credentials are configured, you can run a small batch of FrozenLake tasks through Eval Protocol and rLLM:
python run_frozen_lake_flow.py
This will:
- Load the
frozen_lake_eval_protocoltest split. - Use
EvalProtocolWorkflow(withenv_path="eval_protocol.benchmarks.test_frozen_lake") to run rollouts via Eval Protocol. - Print per-task rewards/accuracy and save results to
logs/frozen_lake_results.json.
Train an RL agent
To train an agent against the same Eval Protocol FrozenLake environment:
bash train_frozen_lake_flow.sh
This uses EvalProtocolWorkflow inside AgentTrainer (via Hydra configs) to:
- Generate rollouts using Eval Protocol’s rollout processor and MCP server.
- Compute rewards via the Eval Protocol evaluation function.
- Optimize the underlying model with PPO/GRPO.
You can edit train_frozen_lake_flow.sh to customize model path, Fireworks deployment, and training hyperparameters.
Code Reference
Data preparation
Script that builds and registers the FrozenLake Eval Protocol dataset:
--8<-- "examples/eval_protocol/prepare_frozen_lake_data.py"
Workflow runner
Main script for running the FrozenLake Eval Protocol workflow through rLLM:
--8<-- "examples/eval_protocol/run_frozen_lake_flow.py"
Training script
Agent training implementation using EvalProtocolWorkflow and AgentTrainer:
--8<-- "examples/eval_protocol/train_frozen_lake_flow.py"