Spaces:

JSCPPProgrammer
/

gensearcher-firered

Paused

App Files Files Community

gensearcher-firered / vendor /rllm /examples /eval_protocol /README.md

JSCPPProgrammer

Initial: GenSearcher workflow + FireRed /generate adapter + Gradio

80b7188 verified 2 months ago

preview code

raw

history blame contribute delete

2.45 kB

	# Eval Protocol FrozenLake Example

	This example shows how to use Eval Protocol's FrozenLake environment from within rLLM using the generic `EvalProtocolWorkflow`.

	For a conceptual overview of how this integration works and how it generalizes to other benchmarks, see the core-concepts page on [Eval Protocol Integration](../../docs/core-concepts/eval-protocol.md).

	---

	## Quick Start

	### Prepare FrozenLake dataset

	From the project root:

	```bash
	cd examples/eval_protocol
	python prepare_frozen_lake_data.py
	```

	This script builds and registers the `frozen_lake_eval_protocol` train/test splits in the rLLM `DatasetRegistry`.

	### Run FrozenLake workflow (inference)

	Once your Fireworks API credentials are configured, you can run a small batch of FrozenLake tasks through Eval Protocol and rLLM:

	```bash
	python run_frozen_lake_flow.py
	```

	This will:

	- Load the `frozen_lake_eval_protocol` test split.
	- Use `EvalProtocolWorkflow` (with `env_path="eval_protocol.benchmarks.test_frozen_lake"`) to run rollouts via Eval Protocol.
	- Print per-task rewards/accuracy and save results to `logs/frozen_lake_results.json`.

	### Train an RL agent

	To train an agent against the same Eval Protocol FrozenLake environment:

	```bash
	bash train_frozen_lake_flow.sh
	```

	This uses `EvalProtocolWorkflow` inside `AgentTrainer` (via Hydra configs) to:

	- Generate rollouts using Eval Protocol’s rollout processor and MCP server.
	- Compute rewards via the Eval Protocol evaluation function.
	- Optimize the underlying model with PPO/GRPO.

	You can edit `train_frozen_lake_flow.sh` to customize model path, Fireworks deployment, and training hyperparameters.

	---

	## Code Reference

	### Data preparation

	Script that builds and registers the FrozenLake Eval Protocol dataset:

	```python title="examples/eval_protocol/prepare_frozen_lake_data.py"
	--8<-- "examples/eval_protocol/prepare_frozen_lake_data.py"
	```

	### Workflow runner

	Main script for running the FrozenLake Eval Protocol workflow through rLLM:

	```python title="examples/eval_protocol/run_frozen_lake_flow.py"
	--8<-- "examples/eval_protocol/run_frozen_lake_flow.py"
	```

	### Training script

	Agent training implementation using `EvalProtocolWorkflow` and `AgentTrainer`:

	```python title="examples/eval_protocol/train_frozen_lake_flow.py"
	--8<-- "examples/eval_protocol/train_frozen_lake_flow.py"
	```