Spaces: AdithyaSK/rl-environments-guide

Update app/src/content/chapters/framework-inventory.mdx

by sergiopaniego HF Staff - opened 24 days ago

base: refs/heads/main ← from: refs/pr/3

Files changed (1)

app/src/content/chapters/framework-inventory.mdx +12 -12

@@ -16,18 +16,18 @@ We surveyed the space and picked these six to build the same environment across
 These are notable RL environment frameworks we evaluated but did not implement. They're excluded because they serve a different purpose or operate at a different level of abstraction.
-| Framework | Creator | Why excluded | Link |
-| --- | --- | --- | --- |
-| [**Atropos**](https://github.com/NousResearch/atropos) | Nous Research | Different paradigm, environments own inference and POST scored batches to a central API. Not compatible with TRL's turn-by-turn tool calling. | [GitHub](https://github.com/NousResearch/atropos) |
-| [**Harbor**](https://github.com/harbor-framework/harbor) | Stanford / Snorkel AI | Offline batch RL only, spins up Docker containers per trial, runs autonomous agents, collects trajectories. No live `environment_factory`. | [GitHub](https://github.com/harbor-framework/harbor) |
-| [**RLVE**](https://github.com/Zhiyuan-Zeng/RLVE) | Zhiyuan Zeng | Pure verifier library (445 tasks), `generate() → verify()` with no transport, no tools, no state. Not an environment framework, just problem oracles. | [GitHub](https://github.com/Zhiyuan-Zeng/RLVE) |
-| [**Reasoning Gym**](https://github.com/open-thought/reasoning-gym) | Open Thought | Procedural task generators + verifiers, same tier as RLVE. Stateless, no multi-turn, no tools. | [GitHub](https://github.com/open-thought/reasoning-gym) |
-| [**RAGEN**](https://github.com/ZihanWang314/RAGEN) | Zihan Wang | Full stack (env + StarPO + veRL), tightly coupled to its own training loop. Gym-compatible but not easily separable for TRL integration. | [GitHub](https://github.com/ZihanWang314/RAGEN) |
-| [**rLLM**](https://github.com/agentica-project/rllm) | Agentica | Decorator pattern, wraps existing agent code, intercepts LLM calls. No environment class to subclass. Different paradigm. | [GitHub](https://github.com/agentica-project/rllm) |
-| [**RL-Factory**](https://github.com/Simple-Efficient/RL-Factory) | Simple-Efficient | MCP config-based, any MCP server becomes an environment. Interesting but very early stage. | [GitHub](https://github.com/Simple-Efficient/RL-Factory) |
-| [**Open-Instruct**](https://github.com/allenai/open-instruct) | Allen AI | Full training framework with env hooks, environments are reward functions, not multi-turn interactive agents. | [GitHub](https://github.com/allenai/open-instruct) |
-| [**TextArena**](https://github.com/TextArena/TextArena) | Leon Guertler | Game-specific multi-agent environments, narrow domain, not a general framework. | [GitHub](https://github.com/TextArena/TextArena) |
-| [**LlamaGym**](https://github.com/KhoomeiK/LlamaGym) | KhoomeiK | Gymnasium wrapper for LLMs, early prototype, not actively maintained. | [GitHub](https://github.com/KhoomeiK/LlamaGym) |
+| Framework | Creator | Why excluded |
+| --- | --- | --- |
+| [**Atropos**](https://github.com/NousResearch/atropos) | Nous Research | Different paradigm, environments own inference and POST scored batches to a central API. Not compatible with TRL's turn-by-turn tool calling. |
+| [**Harbor**](https://github.com/harbor-framework/harbor) | Stanford / Snorkel AI | Offline batch RL only, spins up Docker containers per trial, runs autonomous agents, collects trajectories. No live `environment_factory`. |
+| [**RLVE**](https://github.com/Zhiyuan-Zeng/RLVE) | Zhiyuan Zeng | Pure verifier library (445 tasks), `generate() → verify()` with no transport, no tools, no state. Not an environment framework, just problem oracles. |
+| [**Reasoning Gym**](https://github.com/open-thought/reasoning-gym) | Open Thought | Procedural task generators + verifiers, same tier as RLVE. Stateless, no multi-turn, no tools. |
+| [**RAGEN**](https://github.com/ZihanWang314/RAGEN) | Zihan Wang | Full stack (env + StarPO + veRL), tightly coupled to its own training loop. Gym-compatible but not easily separable for TRL integration. |
+| [**rLLM**](https://github.com/agentica-project/rllm) | Agentica | Decorator pattern, wraps existing agent code, intercepts LLM calls. No environment class to subclass. Different paradigm. |
+| [**RL-Factory**](https://github.com/Simple-Efficient/RL-Factory) | Simple-Efficient | MCP config-based, any MCP server becomes an environment. Interesting but very early stage. |
+| [**Open-Instruct**](https://github.com/allenai/open-instruct) | Allen AI | Full training framework with env hooks, environments are reward functions, not multi-turn interactive agents. |
+| [**TextArena**](https://github.com/TextArena/TextArena) | Leon Guertler | Game-specific multi-agent environments, narrow domain, not a general framework. |
+| [**LlamaGym**](https://github.com/KhoomeiK/LlamaGym) | KhoomeiK | Gymnasium wrapper for LLMs, early prototype, not actively maintained. |
 ### How these relate