Update app/src/content/chapters/framework-inventory.mdx
#3
by sergiopaniego HF Staff - opened
app/src/content/chapters/framework-inventory.mdx
CHANGED
@@ -16,18 +16,18 @@ We surveyed the space and picked these six to build the same environment across
 
 These are notable RL environment frameworks we evaluated but did not implement. They're excluded because they serve a different purpose or operate at a different level of abstraction.
 
-| Framework | Creator | Why excluded |
-| --- | --- | --- |
-| [**Atropos**](https://github.com/NousResearch/atropos) | Nous Research | Different paradigm, environments own inference and POST scored batches to a central API. Not compatible with TRL's turn-by-turn tool calling. |
-| [**Harbor**](https://github.com/harbor-framework/harbor) | Stanford / Snorkel AI | Offline batch RL only, spins up Docker containers per trial, runs autonomous agents, collects trajectories. No live `environment_factory`. |
-| [**RLVE**](https://github.com/Zhiyuan-Zeng/RLVE) | Zhiyuan Zeng | Pure verifier library (445 tasks), `generate() → verify()` with no transport, no tools, no state. Not an environment framework, just problem oracles. |
-| [**Reasoning Gym**](https://github.com/open-thought/reasoning-gym) | Open Thought | Procedural task generators + verifiers, same tier as RLVE. Stateless, no multi-turn, no tools. |
-| [**RAGEN**](https://github.com/ZihanWang314/RAGEN) | Zihan Wang | Full stack (env + StarPO + veRL), tightly coupled to its own training loop. Gym-compatible but not easily separable for TRL integration. |
-| [**rLLM**](https://github.com/agentica-project/rllm) | Agentica | Decorator pattern, wraps existing agent code, intercepts LLM calls. No environment class to subclass. Different paradigm. |
-| [**RL-Factory**](https://github.com/Simple-Efficient/RL-Factory) | Simple-Efficient | MCP config-based, any MCP server becomes an environment. Interesting but very early stage. |
-| [**Open-Instruct**](https://github.com/allenai/open-instruct) | Allen AI | Full training framework with env hooks, environments are reward functions, not multi-turn interactive agents. |
-| [**TextArena**](https://github.com/TextArena/TextArena) | Leon Guertler | Game-specific multi-agent environments, narrow domain, not a general framework. |
-| [**LlamaGym**](https://github.com/KhoomeiK/LlamaGym) | KhoomeiK | Gymnasium wrapper for LLMs, early prototype, not actively maintained. |
+| Framework | Creator | Why excluded |
+| --- | --- | --- |
+| [**Atropos**](https://github.com/NousResearch/atropos) | Nous Research | Different paradigm, environments own inference and POST scored batches to a central API. Not compatible with TRL's turn-by-turn tool calling. |
+| [**Harbor**](https://github.com/harbor-framework/harbor) | Stanford / Snorkel AI | Offline batch RL only, spins up Docker containers per trial, runs autonomous agents, collects trajectories. No live `environment_factory`. |
+| [**RLVE**](https://github.com/Zhiyuan-Zeng/RLVE) | Zhiyuan Zeng | Pure verifier library (445 tasks), `generate() → verify()` with no transport, no tools, no state. Not an environment framework, just problem oracles. |
+| [**Reasoning Gym**](https://github.com/open-thought/reasoning-gym) | Open Thought | Procedural task generators + verifiers, same tier as RLVE. Stateless, no multi-turn, no tools. |
+| [**RAGEN**](https://github.com/ZihanWang314/RAGEN) | Zihan Wang | Full stack (env + StarPO + veRL), tightly coupled to its own training loop. Gym-compatible but not easily separable for TRL integration. |
+| [**rLLM**](https://github.com/agentica-project/rllm) | Agentica | Decorator pattern, wraps existing agent code, intercepts LLM calls. No environment class to subclass. Different paradigm. |
+| [**RL-Factory**](https://github.com/Simple-Efficient/RL-Factory) | Simple-Efficient | MCP config-based, any MCP server becomes an environment. Interesting but very early stage. |
+| [**Open-Instruct**](https://github.com/allenai/open-instruct) | Allen AI | Full training framework with env hooks, environments are reward functions, not multi-turn interactive agents. |
+| [**TextArena**](https://github.com/TextArena/TextArena) | Leon Guertler | Game-specific multi-agent environments, narrow domain, not a general framework. |
+| [**LlamaGym**](https://github.com/KhoomeiK/LlamaGym) | KhoomeiK | Gymnasium wrapper for LLMs, early prototype, not actively maintained. |
 
 ### How these relate
 