---
title: Chat Environment Server
emoji: 💬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv-main
  - openenv
---
## Hugging Face Space Deployment

This Space is built from the OpenEnv environment `chat_env`.

- Space URL: `https://huggingface.co/spaces/openenv/chat_env`
- OpenEnv pinned ref: `main`
- Hub tag: `openenv`

### Connecting from Code

```python
from envs.chat_env import ChatEnv

env = ChatEnv(base_url="https://huggingface.co/spaces/openenv/chat_env")
```
# Chat Environment

A chat-based environment for LLMs with built-in tokenization and message history management. This environment is designed to work directly with language models and provides a minimal, flexible foundation for conversation-based RL training.

## Overview

ChatEnvironment is a lightweight environment that:

- Manages conversation history in the Hugging Face chat format
- Handles tokenization internally using any compatible tokenizer
- Stores both messages and tokens for efficient model interaction
- Provides a clean interface for building chat-based RL agents

ChatEnvironment can be used in **two ways**:

1. **Direct usage**: Import and use ChatEnvironment directly in your Python code (best for local development)
2. **HTTP client**: Use the ChatEnv client to connect to a ChatEnvironment server (best for distributed/containerized deployments)
## Quick Start

### Option 1: Direct Usage (Local)

```python
from transformers import AutoTokenizer

from envs.chat_env import ChatAction, ChatObservation
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message

# Initialize with a tokenizer and optional system prompt
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(
    tokenizer=tokenizer,
    system_prompt="You are a helpful assistant.",
    system_role="system"
)

# Reset the environment
obs = env.reset()
print(f"Messages: {obs.messages}")
print(f"Tokens shape: {obs.tokens.shape}")

# Create an action from a message
user_message: Message = {"role": "user", "content": "Hello!"}
action = env.message_to_action(user_message)

# Step the environment
obs = env.step(action)
print(f"Updated messages: {obs.messages}")
print(f"Updated tokens shape: {obs.tokens.shape}")
```
### Option 2: HTTP Client (Distributed)

```python
from transformers import AutoTokenizer

from envs.chat_env import ChatEnv, ChatAction

# Create environment from Docker image
client = ChatEnv.from_docker_image("chat-env:latest")

# Or connect to existing server
# client = ChatEnv(base_url="http://localhost:8000")

# Reset
result = client.reset()
print(f"Initial messages: {result.observation.messages}")

# Send an action with tokens
tokenizer = AutoTokenizer.from_pretrained("gpt2")
message = {"role": "user", "content": "Hello!"}
action = client.message_to_action(message, tokenizer)
result = client.step(action)
print(f"Messages: {result.observation.messages}")
print(f"Reward: {result.reward}")

# Cleanup
client.close()
```
### Building the Docker Image

Before using the HTTP client, build the Docker image:

```bash
# From project root
docker build -t chat-env:latest -f envs/chat_env/server/Dockerfile .

# Optionally specify a different tokenizer
docker build -t chat-env:latest \
    --build-arg TOKENIZER_NAME=meta-llama/Llama-2-7b-chat-hf \
    -f envs/chat_env/server/Dockerfile .
```
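Once the image is built, you can also run the server yourself instead of letting `from_docker_image` manage the container. A minimal sketch, assuming the server listens on port 8000 inside the container (matching `app_port` in the Space config above):

```bash
# Run the server and expose it on localhost:8000
docker run --rm -p 8000:8000 chat-env:latest

# Then connect from Python with: ChatEnv(base_url="http://localhost:8000")
```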
## Architecture

### Data Models

#### ChatAction

Actions contain only tokens (PyTorch tensors) that interface directly with models:

```python
@dataclass
class ChatAction(Action):
    tokens: torch.Tensor  # Required, cannot be empty
```

#### ChatObservation

Observations contain both the message history and flattened tokens:

```python
@dataclass
class ChatObservation(Observation):
    messages: list[Message]  # List of {"role": str, "content": str}
    tokens: torch.Tensor     # Flattened tensor of all conversation tokens
    # Inherited: done, reward, metadata
```

#### ChatState

Internal state tracking message and token history:

```python
@dataclass
class ChatState(State):
    history_messages: list[Message]
    history_tokens: list[torch.Tensor]
    # Inherited: episode_id, step_count
```
### Key Methods

#### `reset() -> ChatObservation`

Resets the environment to its initial state, including the optional system prompt.

#### `step(action: ChatAction) -> ChatObservation`

Takes an action (tokens), decodes it to text, adds it to the history, and returns the updated observation.

#### `message_to_action(message: Message) -> ChatAction`

Convenience method to convert a message dict into a tokenized ChatAction.
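To make the `step` flow concrete, here is an illustrative sketch of what it does. This is a simplification, not the actual implementation in `server/chat_environment.py`; in particular, the role assigned to the decoded message is assumed here to be `"assistant"`:

```python
import torch

def step_sketch(state, tokenizer, action):
    # Decode the action tokens back into text
    text = tokenizer.decode(action.tokens.flatten(), skip_special_tokens=True)

    # Append the decoded message and its tokens to the history
    state.history_messages.append({"role": "assistant", "content": text})  # role assumed
    state.history_tokens.append(action.tokens.flatten())

    # The observation carries the full message history plus all tokens, flattened
    all_tokens = torch.cat(state.history_tokens)
    return state.history_messages, all_tokens
```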
## Usage Patterns

### Basic Conversation

```python
from transformers import AutoTokenizer

from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message

tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)

# Reset
obs = env.reset()

# User turn
user_msg: Message = {"role": "user", "content": "What is 2+2?"}
action = env.message_to_action(user_msg)
obs = env.step(action)

# Assistant turn
assistant_msg: Message = {"role": "assistant", "content": "2+2 equals 4."}
action = env.message_to_action(assistant_msg)
obs = env.step(action)

# Access conversation history
print(f"Full conversation: {obs.messages}")
print(f"All tokens: {obs.tokens}")
```
### With Transforms

You can add transforms to compute rewards or modify observations:

```python
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Observation, Transform

class LengthRewardTransform(Transform):
    """Reward based on response length."""

    def __call__(self, observation: Observation) -> Observation:
        if hasattr(observation, "messages") and observation.messages:
            last_message = observation.messages[-1]
            observation.reward = len(last_message["content"]) * 0.1
        return observation

env = ChatEnvironment(
    tokenizer=tokenizer,
    transform=LengthRewardTransform()
)
```
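With this transform installed, the reward on each observation tracks the character count of the most recent message. For example:

```python
obs = env.reset()
action = env.message_to_action({"role": "assistant", "content": "Hello there!"})
obs = env.step(action)
print(obs.reward)  # 1.2, since "Hello there!" is 12 characters * 0.1
```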
### Direct Token Usage

If you're generating tokens from a model, you can create actions directly:

```python
import torch

from envs.chat_env import ChatAction

# Assume you have tokens from your model
generated_tokens = torch.tensor([[1, 2, 3, 4, 5]])

# Create action directly
action = ChatAction(tokens=generated_tokens)

# Step environment
obs = env.step(action)
```
## Design Philosophy

ChatEnvironment is intentionally minimal and flexible:

1. **No HTTP overhead**: Works directly with Python objects and tensors
2. **Tokenizer ownership**: The environment handles tokenization consistently
3. **Dual representation**: Maintains both human-readable messages and model-ready tokens
4. **Transform support**: Extensible reward computation and observation modification
5. **Type-safe**: Uses typed Messages compatible with the Hugging Face format
## Integration with Models

ChatEnvironment pairs naturally with language models:

```python
# Pseudo-code for RL training loop
model = YourLanguageModel()
env = ChatEnvironment(tokenizer=model.tokenizer)

for episode in range(num_episodes):
    obs = env.reset()
    while not obs.done:
        # Model generates response tokens
        action_tokens = model.generate(obs.tokens)
        action = ChatAction(tokens=action_tokens)

        # Step environment
        obs = env.step(action)

        # Use obs.reward for RL updates
        model.update(obs.reward)
```
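As a more concrete (if toy) version of that loop, here is a minimal sketch using a Hugging Face GPT-2 model. It assumes `obs.tokens` is a 1-D tensor of the full conversation, as described above, and skips the RL update step:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from envs.chat_env import ChatAction
from envs.chat_env.server import ChatEnvironment

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)

obs = env.reset()
obs = env.step(env.message_to_action({"role": "user", "content": "Tell me a joke."}))

# Generate a continuation from the full conversation tokens
prompt = obs.tokens.unsqueeze(0)  # add a batch dimension
generated = model.generate(
    prompt, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
)
new_tokens = generated[:, prompt.shape[1]:]  # keep only the newly generated tokens

obs = env.step(ChatAction(tokens=new_tokens))
print(obs.messages[-1])
```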
## Project Structure

```
chat_env/
├── __init__.py              # Module exports (ChatEnv, ChatAction, etc.)
├── README.md                # This file
├── client.py                # ChatEnv HTTP client
├── models.py                # ChatAction, ChatObservation, ChatState
└── server/
    ├── __init__.py          # Server module exports
    ├── chat_environment.py  # Core ChatEnvironment implementation
    ├── app.py               # FastAPI server application
    ├── test_chat_env.py     # Unit tests
    └── Dockerfile           # Container image for HTTP server
```
## Requirements

- Python 3.10+
- PyTorch
- A tokenizer with an `apply_chat_template` method (e.g., from Hugging Face `transformers`); see the sketch below
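Any object exposing that interface should work. The hypothetical stub below illustrates the duck-typed surface ChatEnvironment relies on; the exact keyword arguments the environment passes are an assumption here, so treat this as a sketch, not a spec:

```python
import torch

class ToyChatTokenizer:
    """Hypothetical minimal tokenizer, for illustration only."""

    def apply_chat_template(self, messages, return_tensors=None, **kwargs):
        # Toy "template": one line per message, one token id per character
        text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        ids = [ord(c) for c in text]
        return torch.tensor([ids]) if return_tensors == "pt" else ids

    def decode(self, token_ids, **kwargs):
        # Inverse of the toy encoding above
        return "".join(chr(int(t)) for t in token_ids)
```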
## Notes

- ChatEnvironment does **not** generate responses; it only manages conversation state
- You need to provide tokens from your model or another source
- The environment is not thread-safe; use it from a single thread
- For multi-turn conversations, alternate between user and assistant messages