Eval Protocol FrozenLake Example

This example shows how to use Eval Protocol's FrozenLake environment from within rLLM using the generic EvalProtocolWorkflow.

For a conceptual overview of how this integration works and how it generalizes to other benchmarks, see the core-concepts page on Eval Protocol Integration.
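The sketch below makes the reward signal concrete with a minimal, deterministic FrozenLake. It assumes the standard 4x4 Gymnasium map and its sparse reward: 1.0 for reaching the goal, 0.0 otherwise, and the episode ends on a hole or the goal. It only illustrates the game rules. Eval Protocol's environment, its MCP server, and its action/observation format may differ, and `GRID`, `MOVES`, and `play` are hypothetical names.

```python
# Hypothetical, simplified FrozenLake on the standard 4x4 Gymnasium map.
# S = start, F = frozen (safe), H = hole (episode ends, reward 0),
# G = goal (episode ends, reward 1). Slipperiness is ignored here.
from typing import List, Tuple

GRID: List[str] = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]

# Action names mapped to (row, col) deltas.
MOVES = {"LEFT": (0, -1), "DOWN": (1, 0), "RIGHT": (0, 1), "UP": (-1, 0)}


def play(actions: List[str], grid: List[str] = GRID) -> Tuple[float, bool]:
    """Replay a sequence of actions and return (reward, terminated)."""
    row, col = 0, 0  # the agent starts on 'S' in the top-left corner
    size = len(grid)
    for action in actions:
        dr, dc = MOVES[action]
        # A move that would leave the grid keeps the agent in place.
        row = min(max(row + dr, 0), size - 1)
        col = min(max(col + dc, 0), size - 1)
        tile = grid[row][col]
        if tile == "H":
            return 0.0, True  # fell into a hole
        if tile == "G":
            return 1.0, True  # reached the goal
    return 0.0, False  # ran out of actions without terminating
```

For example, `play(["DOWN", "DOWN", "RIGHT", "RIGHT", "DOWN", "RIGHT"])` reaches the goal and returns `(1.0, True)`. Because the reward is only nonzero at the goal, per-task reward and per-task success carry the same information.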

Quick Start

Prepare FrozenLake dataset

From the project root:

```bash
cd examples/eval_protocol
python prepare_frozen_lake_data.py
```

This script builds and registers the frozen_lake_eval_protocol train/test splits in the rLLM DatasetRegistry.
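Conceptually, a dataset registry maps a dataset name and a split to a list of task records. The sketch below illustrates that idea with a hypothetical in-memory `ToyRegistry` and a made-up task schema (`id`, `seed`). It is not rLLM's `DatasetRegistry`. Take the real method names and record fields from prepare_frozen_lake_data.py.

```python
# Hypothetical in-memory stand-in for a dataset registry keyed by
# (dataset name, split). This is NOT rLLM's DatasetRegistry API.
from typing import Dict, List, Tuple

Row = Dict[str, object]


class ToyRegistry:
    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], List[Row]] = {}

    def register(self, name: str, split: str, rows: List[Row]) -> None:
        # Copy each row so later mutation by the caller does not leak in.
        self._store[(name, split)] = [dict(r) for r in rows]

    def load(self, name: str, split: str) -> List[Row]:
        if (name, split) not in self._store:
            raise KeyError(f"{name}/{split} is not registered")
        return self._store[(name, split)]


def make_tasks(seeds: List[int]) -> List[Row]:
    # Hypothetical task schema: one FrozenLake episode per seed.
    return [{"id": f"frozen_lake_{s}", "seed": s} for s in seeds]


registry = ToyRegistry()
registry.register("frozen_lake_eval_protocol", "train", make_tasks(list(range(8))))
registry.register("frozen_lake_eval_protocol", "test", make_tasks([100, 101]))
```

With this design, inference and training only need to know the dataset name and split, so the same registered data can serve both run_frozen_lake_flow.py and the training job.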

Run FrozenLake workflow (inference)

Once your Fireworks API credentials are configured, you can run a small batch of FrozenLake tasks through Eval Protocol and rLLM:

```bash
python run_frozen_lake_flow.py
```

This will:

- Load the frozen_lake_eval_protocol test split.
- Use EvalProtocolWorkflow (with env_path="eval_protocol.benchmarks.test_frozen_lake") to run rollouts via Eval Protocol.
- Print per-task rewards/accuracy and save results to logs/frozen_lake_results.json.
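The reporting step amounts to aggregating per-task rewards and writing them to a JSON file. The sketch below assumes each result is a dict with hypothetical `task_id` and `reward` keys, and that a task counts as solved when its reward is positive. The actual layout of logs/frozen_lake_results.json produced by run_frozen_lake_flow.py may differ.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def summarize(results: List[Dict[str, Any]], out_path: str) -> Dict[str, Any]:
    """Print per-task rewards, compute aggregate metrics, and save to JSON."""
    n = len(results)
    # Mean reward over all tasks (0.0 for an empty batch).
    mean_reward = sum(r["reward"] for r in results) / n if n else 0.0
    # Hypothetical success criterion: any positive reward counts as solved.
    accuracy = sum(1 for r in results if r["reward"] > 0) / n if n else 0.0
    summary = {"num_tasks": n, "mean_reward": mean_reward, "accuracy": accuracy}

    for r in results:
        print(f"{r['task_id']}: reward={r['reward']:.2f}")
    print(f"accuracy={accuracy:.2%} over {n} tasks")

    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create logs/
    path.write_text(json.dumps({"summary": summary, "results": results}, indent=2))
    return summary
```

Storing the raw per-task results next to the summary keeps individual failures inspectable after the run.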

Train an RL agent

To train an agent against the same Eval Protocol FrozenLake environment:

```bash
bash train_frozen_lake_flow.sh
```

This uses EvalProtocolWorkflow inside AgentTrainer (via Hydra configs) to:

- Generate rollouts using Eval Protocol’s rollout processor and MCP server.
- Compute rewards via the Eval Protocol evaluation function.
- Optimize the underlying model with PPO/GRPO.
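For intuition on the GRPO option, the sketch below computes the group-relative advantage that GRPO uses in place of a learned value baseline. Several rollouts are sampled for the same task, and each rollout's reward is normalized by the group's mean and standard deviation. This is a generic illustration, not rLLM's trainer code. Whether the population or sample standard deviation is used, and the epsilon value, are assumptions that vary across implementations.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(group_rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: (r_i - mean(r)) / (std(r) + eps)."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)  # population std over the group
    # eps avoids division by zero when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Four rollouts of one FrozenLake task: two reached the goal, two did not.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

Rollouts that reached the goal receive a positive advantage and failed ones a negative advantage. A group in which every rollout scored the same contributes no learning signal, which matters for a sparse-reward task like FrozenLake.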

You can edit train_frozen_lake_flow.sh to customize model path, Fireworks deployment, and training hyperparameters.

Code Reference

Data preparation

Script that builds and registers the FrozenLake Eval Protocol dataset:

--8<-- "examples/eval_protocol/prepare_frozen_lake_data.py"

Workflow runner

Main script for running the FrozenLake Eval Protocol workflow through rLLM:

--8<-- "examples/eval_protocol/run_frozen_lake_flow.py"

Training script

Agent training implementation using EvalProtocolWorkflow and AgentTrainer:

--8<-- "examples/eval_protocol/train_frozen_lake_flow.py"