---
title: Chat Environment Server
emoji: 💬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv-main
- openenv
---
## Hugging Face Space Deployment
This Space is built from OpenEnv environment `chat_env`.
- Space URL: `https://huggingface.co/spaces/openenv/chat_env`
- OpenEnv pinned ref: `main`
- Hub tag: `openenv`
### Connecting from Code
```python
from envs.chat_env import ChatEnv
env = ChatEnv(base_url="https://huggingface.co/spaces/openenv/chat_env")
```
# Chat Environment
A chat-based environment for LLMs with built-in tokenization and message history management. This environment is designed to work directly with language models and provides a minimal, flexible foundation for conversation-based RL training.
## Overview
ChatEnvironment is a lightweight environment that:
- Manages conversation history in the Hugging Face chat format
- Handles tokenization internally using any compatible tokenizer
- Stores both messages and tokens for efficient model interaction
- Provides a clean interface for building chat-based RL agents
ChatEnvironment can be used in **two ways**:
1. **Direct usage**: Import and use ChatEnvironment directly in your Python code (best for local development)
2. **HTTP client**: Use ChatEnv client to connect to a ChatEnvironment server (best for distributed/containerized deployments)
## Quick Start
### Option 1: Direct Usage (Local)
```python
from transformers import AutoTokenizer
from envs.chat_env import ChatAction, ChatObservation
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message
# Initialize with a tokenizer and optional system prompt
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(
    tokenizer=tokenizer,
    system_prompt="You are a helpful assistant.",
    system_role="system",
)
# Reset the environment
obs = env.reset()
print(f"Messages: {obs.messages}")
print(f"Tokens shape: {obs.tokens.shape}")
# Create an action from a message
user_message: Message = {"role": "user", "content": "Hello!"}
action = env.message_to_action(user_message)
# Step the environment
obs = env.step(action)
print(f"Updated messages: {obs.messages}")
print(f"Updated tokens shape: {obs.tokens.shape}")
```
### Option 2: HTTP Client (Distributed)
```python
from transformers import AutoTokenizer
from envs.chat_env import ChatEnv, ChatAction
import torch
# Create environment from Docker image
client = ChatEnv.from_docker_image("chat-env:latest")
# Or connect to existing server
# client = ChatEnv(base_url="http://localhost:8000")
# Reset
result = client.reset()
print(f"Initial messages: {result.observation.messages}")
# Send an action with tokens
tokenizer = AutoTokenizer.from_pretrained("gpt2")
message = {"role": "user", "content": "Hello!"}
action = client.message_to_action(message, tokenizer)
result = client.step(action)
print(f"Messages: {result.observation.messages}")
print(f"Reward: {result.reward}")
# Cleanup
client.close()
```
### Building the Docker Image
Before using the HTTP client, build the Docker image:
```bash
# From project root
docker build -t chat-env:latest -f envs/chat_env/server/Dockerfile .
# Optionally specify a different tokenizer
docker build -t chat-env:latest \
    --build-arg TOKENIZER_NAME=meta-llama/Llama-2-7b-chat-hf \
    -f envs/chat_env/server/Dockerfile .
```
## Architecture
### Data Models
#### ChatAction
Actions contain only tokens (PyTorch tensors) that interface directly with models:
```python
@dataclass
class ChatAction(Action):
    tokens: torch.Tensor  # Required, cannot be empty
```
#### ChatObservation
Observations contain both the message history and flattened tokens:
```python
@dataclass
class ChatObservation(Observation):
    messages: list[Message]  # List of {"role": str, "content": str}
    tokens: torch.Tensor     # Flattened tensor of all conversation tokens
    # Inherited: done, reward, metadata
```
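The `tokens` field is the concatenation of every turn's token sequence, in order. A minimal sketch of the flattening, using plain Python lists as stand-ins for tensors (the token values are made up; with real tensors this would be a `torch.cat` over the per-turn history):

```python
# Illustrative sketch, not the actual implementation: plain lists stand in
# for torch.Tensor to show how per-turn token histories flatten into the
# single `tokens` field of ChatObservation.
history_tokens = [
    [101, 7592],        # tokens for the system prompt
    [101, 2129, 2024],  # tokens for the user turn
    [101, 2307],        # tokens for the assistant turn
]

# Flattening concatenates the per-turn sequences in order, which is what
# torch.cat would do for real tensors.
flat_tokens = [tok for turn in history_tokens for tok in turn]
print(flat_tokens)  # [101, 7592, 101, 2129, 2024, 101, 2307]
```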
#### ChatState
Internal state tracking message and token history:
```python
@dataclass
class ChatState(State):
    history_messages: list[Message]
    history_tokens: list[torch.Tensor]
    # Inherited: episode_id, step_count
```
### Key Methods
#### `reset() -> ChatObservation`
Resets the environment to its initial state, re-seeding the history with the optional system prompt.
#### `step(action: ChatAction) -> ChatObservation`
Takes an action (tokens), decodes to text, adds to history, returns updated observation.
#### `message_to_action(message: Message) -> ChatAction`
Convenience method to convert a message dict to a tokenized ChatAction.
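The interplay of these three methods can be sketched end to end with a stub tokenizer. Everything here is illustrative: `StubTokenizer` and `MiniChatEnv` are simplified stand-ins, not the real `ChatEnvironment` (which works with tensors and `ChatAction`/`ChatObservation` objects):

```python
from dataclasses import dataclass, field

class StubTokenizer:
    """Stand-in tokenizer: maps each character to its code point."""

    def encode(self, text):
        return [ord(c) for c in text]

    def decode(self, tokens):
        return "".join(chr(t) for t in tokens)

@dataclass
class MiniChatEnv:
    tokenizer: StubTokenizer
    messages: list = field(default_factory=list)

    def reset(self):
        # Start a fresh episode with an empty history
        self.messages = []
        return self.messages

    def message_to_action(self, message):
        # Tokenize the message content into an "action" (here, a plain dict)
        return {"role": message["role"],
                "tokens": self.tokenizer.encode(message["content"])}

    def step(self, action):
        # Decode tokens back to text and append to the history
        text = self.tokenizer.decode(action["tokens"])
        self.messages.append({"role": action["role"], "content": text})
        return self.messages

env = MiniChatEnv(tokenizer=StubTokenizer())
env.reset()
action = env.message_to_action({"role": "user", "content": "Hi"})
obs = env.step(action)
print(obs)  # [{'role': 'user', 'content': 'Hi'}]
```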
## Usage Patterns
### Basic Conversation
```python
from transformers import AutoTokenizer
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)
# Reset
obs = env.reset()
# User turn
user_msg: Message = {"role": "user", "content": "What is 2+2?"}
action = env.message_to_action(user_msg)
obs = env.step(action)
# Assistant turn
assistant_msg: Message = {"role": "assistant", "content": "2+2 equals 4."}
action = env.message_to_action(assistant_msg)
obs = env.step(action)
# Access conversation history
print(f"Full conversation: {obs.messages}")
print(f"All tokens: {obs.tokens}")
```
### With Transforms
You can add transforms to compute rewards or modify observations:
```python
from openenv.core.env_server import Transform, Observation
class LengthRewardTransform(Transform):
    """Reward based on response length."""

    def __call__(self, observation: Observation) -> Observation:
        if hasattr(observation, "messages") and observation.messages:
            last_message = observation.messages[-1]
            observation.reward = len(last_message["content"]) * 0.1
        return observation

env = ChatEnvironment(
    tokenizer=tokenizer,
    transform=LengthRewardTransform(),
)
```
### Direct Token Usage
If you're generating tokens from a model, you can create actions directly:
```python
import torch
from envs.chat_env import ChatAction
# Assume you have tokens from your model
generated_tokens = torch.tensor([[1, 2, 3, 4, 5]])
# Create action directly
action = ChatAction(tokens=generated_tokens)
# Step environment
obs = env.step(action)
```
## Design Philosophy
ChatEnvironment is intentionally minimal and flexible:
1. **No HTTP overhead**: Works directly with Python objects and tensors
2. **Tokenizer ownership**: Environment handles tokenization consistently
3. **Dual representation**: Maintains both human-readable messages and model-ready tokens
4. **Transform support**: Extensible reward computation and observation modification
5. **Type-safe**: Uses typed Messages compatible with the Hugging Face chat format
## Integration with Models
ChatEnvironment pairs naturally with language models:
```python
# Pseudo-code for RL training loop
model = YourLanguageModel()
env = ChatEnvironment(tokenizer=model.tokenizer)
for episode in range(num_episodes):
    obs = env.reset()

    while not obs.done:
        # Model generates response tokens
        action_tokens = model.generate(obs.tokens)
        action = ChatAction(tokens=action_tokens)

        # Step environment
        obs = env.step(action)

        # Use obs.reward for RL updates
        model.update(obs.reward)
```
## Project Structure
```
chat_env/
├── __init__.py              # Module exports (ChatEnv, ChatAction, etc.)
├── README.md                # This file
├── client.py                # ChatEnv HTTP client
├── models.py                # ChatAction, ChatObservation, ChatState
└── server/
    ├── __init__.py          # Server module exports
    ├── chat_environment.py  # Core ChatEnvironment implementation
    ├── app.py               # FastAPI server application
    ├── test_chat_env.py     # Unit tests
    └── Dockerfile           # Container image for HTTP server
```
## Requirements
- Python 3.10+
- PyTorch
- A tokenizer with an `apply_chat_template` method (e.g., from Hugging Face `transformers`)
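As a rough sketch, the tokenizer interface the environment relies on can be described with a `Protocol`. The method names follow the Hugging Face `transformers` API, but the exact set of methods the environment calls is an assumption here, and `FakeTokenizer` is a made-up test double:

```python
from typing import Protocol

class ChatTokenizer(Protocol):
    """Duck-typed tokenizer interface (illustrative, not the real contract)."""

    def apply_chat_template(self, messages: list[dict], **kwargs):
        """Render a list of {"role", "content"} dicts into token IDs."""
        ...

    def decode(self, token_ids, **kwargs) -> str:
        """Convert token IDs back into text."""
        ...

# Any object with these methods satisfies the protocol, e.g. a test double
# that "tokenizes" each message into its content length:
class FakeTokenizer:
    def apply_chat_template(self, messages, **kwargs):
        return [len(m["content"]) for m in messages]

    def decode(self, token_ids, **kwargs):
        return " ".join(str(t) for t in token_ids)

tok: ChatTokenizer = FakeTokenizer()
print(tok.apply_chat_template([{"role": "user", "content": "Hello!"}]))  # [6]
```

This duck typing is why any compatible tokenizer works: the environment never checks the tokenizer's class, only that the methods it needs exist.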
## Notes
- ChatEnvironment does **not** generate responses; it only manages conversation state
- You must supply tokens from your model or another source
- The environment is not thread-safe; use one environment instance per thread
- For multi-turn conversations, alternate between user and assistant messages