
Eval Protocol FrozenLake Example

This example shows how to use Eval Protocol's FrozenLake environment from within rLLM using the generic EvalProtocolWorkflow.

For a conceptual overview of how this integration works and how it generalizes to other benchmarks, see the core-concepts page on Eval Protocol Integration.


Quick Start

Prepare FrozenLake dataset

From the project root:

cd examples/eval_protocol
python prepare_frozen_lake_data.py

This script builds and registers the frozen_lake_eval_protocol train/test splits in the rLLM DatasetRegistry.

Run FrozenLake workflow (inference)

Once your Fireworks API credentials are configured, you can run a small batch of FrozenLake tasks through Eval Protocol and rLLM:

python run_frozen_lake_flow.py

This will:

  • Load the frozen_lake_eval_protocol test split.
  • Use EvalProtocolWorkflow (with env_path="eval_protocol.benchmarks.test_frozen_lake") to run rollouts via Eval Protocol.
  • Print per-task rewards/accuracy and save results to logs/frozen_lake_results.json.
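As a rough illustration of the last step, the saved results could be summarized offline along the following lines. This is only a sketch: the record layout (a JSON list of objects with a numeric `reward` field) and the helper names `load_records` and `summarize_results` are assumptions made for this example, not the documented output format of run_frozen_lake_flow.py, so adjust the field names to match the file you actually get.

```python
import json
from pathlib import Path
from statistics import mean


def load_records(path):
    """Read a JSON file and return its parsed content.

    Assumption: the file holds a list of per-task records.
    """
    return json.loads(Path(path).read_text())


def summarize_results(records):
    """Return task count, mean reward, and success rate.

    Assumption: each record is a dict with a numeric "reward" field,
    and a reward greater than zero counts as a success.
    """
    rewards = [float(r["reward"]) for r in records]
    if not rewards:
        return {"num_tasks": 0, "mean_reward": 0.0, "success_rate": 0.0}
    successes = sum(1 for x in rewards if x > 0)
    return {
        "num_tasks": len(rewards),
        "mean_reward": mean(rewards),
        "success_rate": successes / len(rewards),
    }


# In-memory example using the assumed record layout.
sample = [{"reward": 1.0}, {"reward": 0.0}, {"reward": 1.0}, {"reward": 0.0}]
print(summarize_results(sample))
```

With a real run, `summarize_results(load_records("logs/frozen_lake_results.json"))` would apply the same summary to the saved file, provided the schema matches the assumption above.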

Train an RL agent

To train an agent against the same Eval Protocol FrozenLake environment:

bash train_frozen_lake_flow.sh

This uses EvalProtocolWorkflow inside AgentTrainer (via Hydra configs) to:

  • Generate rollouts using Eval Protocol’s rollout processor and MCP server.
  • Compute rewards via the Eval Protocol evaluation function.
  • Optimize the underlying model with PPO/GRPO.
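To give a feel for the GRPO option in the last step, the sketch below shows the group-relative advantage idea in isolation: several rollouts of the same task are scored, and each reward is normalized against the group's mean and standard deviation. It is a simplified illustration, not rLLM's or AgentTrainer's implementation. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps).
    eps avoids division by zero when every rollout gets the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts of one FrozenLake task: two reach the goal.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Successful rollouts end up with positive advantages and failed ones with negative advantages, while a group in which every rollout receives the same reward yields all-zero advantages and therefore contributes no learning signal.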

You can edit train_frozen_lake_flow.sh to customize the model path, the Fireworks deployment, and the training hyperparameters.


Code Reference

Data preparation

Script that builds and registers the FrozenLake Eval Protocol dataset:

--8<-- "examples/eval_protocol/prepare_frozen_lake_data.py"

Workflow runner

Main script for running the FrozenLake Eval Protocol workflow through rLLM:

--8<-- "examples/eval_protocol/run_frozen_lake_flow.py"

Training script

Agent training implementation using EvalProtocolWorkflow and AgentTrainer:

--8<-- "examples/eval_protocol/train_frozen_lake_flow.py"