---
title: Chat Environment Server
emoji: πŸ’¬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv-main
  - openenv
---

## Hugging Face Space Deployment

This Space is built from OpenEnv environment `chat_env`.

- Space URL: `https://huggingface.co/spaces/openenv/chat_env`
- OpenEnv pinned ref: `main`
- Hub tag: `openenv`

### Connecting from Code

```python
from envs.chat_env import ChatEnv

env = ChatEnv(base_url="https://huggingface.co/spaces/openenv/chat_env")
```

# Chat Environment

A chat-based environment for LLMs with built-in tokenization and message history management. This environment is designed to work directly with language models and provides a minimal, flexible foundation for conversation-based RL training.

## Overview

ChatEnvironment is a lightweight environment that:
- Manages conversation history in the Hugging Face chat format
- Handles tokenization internally using any compatible tokenizer
- Stores both messages and tokens for efficient model interaction
- Provides a clean interface for building chat-based RL agents

ChatEnvironment can be used in **two ways**:
1. **Direct usage**: Import and use ChatEnvironment directly in your Python code (best for local development)
2. **HTTP client**: Use ChatEnv client to connect to a ChatEnvironment server (best for distributed/containerized deployments)

## Quick Start

### Option 1: Direct Usage (Local)

```python
from transformers import AutoTokenizer
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message

# Initialize with a tokenizer and optional system prompt
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(
    tokenizer=tokenizer,
    system_prompt="You are a helpful assistant.",
    system_role="system"
)

# Reset the environment
obs = env.reset()
print(f"Messages: {obs.messages}")
print(f"Tokens shape: {obs.tokens.shape}")

# Create an action from a message
user_message: Message = {"role": "user", "content": "Hello!"}
action = env.message_to_action(user_message)

# Step the environment
obs = env.step(action)
print(f"Updated messages: {obs.messages}")
print(f"Updated tokens shape: {obs.tokens.shape}")
```

### Option 2: HTTP Client (Distributed)

```python
from transformers import AutoTokenizer
from envs.chat_env import ChatEnv, ChatAction
import torch

# Create environment from Docker image
client = ChatEnv.from_docker_image("chat-env:latest")

# Or connect to existing server
# client = ChatEnv(base_url="http://localhost:8000")

# Reset
result = client.reset()
print(f"Initial messages: {result.observation.messages}")

# Send an action with tokens
tokenizer = AutoTokenizer.from_pretrained("gpt2")
message = {"role": "user", "content": "Hello!"}
action = client.message_to_action(message, tokenizer)

result = client.step(action)
print(f"Messages: {result.observation.messages}")
print(f"Reward: {result.reward}")

# Cleanup
client.close()
```

### Building the Docker Image

Before using the HTTP client, build the Docker image:

```bash
# From project root
docker build -t chat-env:latest -f envs/chat_env/server/Dockerfile .

# Optionally specify a different tokenizer
docker build -t chat-env:latest \
  --build-arg TOKENIZER_NAME=meta-llama/Llama-2-7b-chat-hf \
  -f envs/chat_env/server/Dockerfile .
```

## Architecture

### Data Models

#### ChatAction
Actions contain only tokens (PyTorch tensors) that interface directly with models:
```python
@dataclass
class ChatAction(Action):
    tokens: torch.Tensor  # Required, cannot be empty
```

#### ChatObservation
Observations contain both the message history and flattened tokens:
```python
@dataclass
class ChatObservation(Observation):
    messages: list[Message]  # List of {"role": str, "content": str}
    tokens: torch.Tensor     # Flattened tensor of all conversation tokens
    # Inherited: done, reward, metadata
```

#### ChatState
Internal state tracking message and token history:
```python
@dataclass
class ChatState(State):
    history_messages: list[Message]
    history_tokens: list[torch.Tensor]
    # Inherited: episode_id, step_count
```

### Key Methods

#### `reset() -> ChatObservation`
Resets the environment to its initial state, re-applying the optional system prompt.

#### `step(action: ChatAction) -> ChatObservation`
Takes an action (tokens), decodes the tokens to text, appends the resulting message to the history, and returns the updated observation.

#### `message_to_action(message: Message) -> ChatAction`
Convenience method that tokenizes a message dict and wraps the result in a `ChatAction`.

## Usage Patterns

### Basic Conversation

```python
from transformers import AutoTokenizer
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message

tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)

# Reset
obs = env.reset()

# User turn
user_msg: Message = {"role": "user", "content": "What is 2+2?"}
action = env.message_to_action(user_msg)
obs = env.step(action)

# Assistant turn
assistant_msg: Message = {"role": "assistant", "content": "2+2 equals 4."}
action = env.message_to_action(assistant_msg)
obs = env.step(action)

# Access conversation history
print(f"Full conversation: {obs.messages}")
print(f"All tokens: {obs.tokens}")
```

### With Transforms

You can add transforms to compute rewards or modify observations:

```python
from openenv.core.env_server import Transform, Observation

class LengthRewardTransform(Transform):
    """Reward based on response length."""

    def __call__(self, observation: Observation) -> Observation:
        if hasattr(observation, 'messages') and observation.messages:
            last_message = observation.messages[-1]
            observation.reward = len(last_message['content']) * 0.1
        return observation

env = ChatEnvironment(
    tokenizer=tokenizer,
    transform=LengthRewardTransform()
)
```

### Direct Token Usage

If you're generating tokens from a model, you can create actions directly:

```python
import torch
from envs.chat_env import ChatAction

# Assume you have tokens from your model
generated_tokens = torch.tensor([[1, 2, 3, 4, 5]])

# Create action directly
action = ChatAction(tokens=generated_tokens)

# Step environment
obs = env.step(action)
```

## Design Philosophy

ChatEnvironment is intentionally minimal and flexible:

1. **No HTTP overhead**: Works directly with Python objects and tensors
2. **Tokenizer ownership**: Environment handles tokenization consistently
3. **Dual representation**: Maintains both human-readable messages and model-ready tokens
4. **Transform support**: Extensible reward computation and observation modification
5. **Type-safe**: Uses typed Messages compatible with the Hugging Face chat format

## Integration with Models

ChatEnvironment pairs naturally with language models:

```python
# Pseudo-code for RL training loop
model = YourLanguageModel()
env = ChatEnvironment(tokenizer=model.tokenizer)

for episode in range(num_episodes):
    obs = env.reset()

    while not obs.done:
        # Model generates response tokens
        action_tokens = model.generate(obs.tokens)
        action = ChatAction(tokens=action_tokens)

        # Step environment
        obs = env.step(action)

        # Use obs.reward for RL updates
        model.update(obs.reward)
```
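To make the shape of this loop concrete, here is a self-contained toy sketch. `FakeEnv` and `FakeModel` are hypothetical stand-ins for `ChatEnvironment` and your model (not part of this package); they exist only so the reset/generate/step/reward cycle can run end to end without any dependencies:

```python
from dataclasses import dataclass

# Stand-in observation mirroring the fields the loop reads from
# ChatObservation (tokens, done, reward). Illustrative only.
@dataclass
class FakeObservation:
    tokens: list
    done: bool = False
    reward: float = 0.0

class FakeEnv:
    """Toy environment with reset/step, mimicking the loop structure above."""
    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self) -> FakeObservation:
        self.step_count = 0
        return FakeObservation(tokens=[0])

    def step(self, action_tokens: list) -> FakeObservation:
        self.step_count += 1
        done = self.step_count >= self.max_steps
        # Toy reward: one point per token in the action.
        return FakeObservation(tokens=action_tokens, done=done,
                               reward=float(len(action_tokens)))

class FakeModel:
    """Stand-in for a language model: echoes the context plus one new token."""
    def generate(self, tokens: list) -> list:
        return tokens + [len(tokens)]

env = FakeEnv()
model = FakeModel()

obs = env.reset()
total_reward = 0.0
while not obs.done:
    action_tokens = model.generate(obs.tokens)  # model "responds"
    obs = env.step(action_tokens)               # environment records it
    total_reward += obs.reward                  # use reward for RL updates

print(total_reward)  # 2 + 3 + 4 tokens over three steps -> 9.0
```

The real loop differs only in the classes involved: swap `FakeEnv` for `ChatEnvironment`, `FakeModel` for your model, and wrap the generated tokens in a `ChatAction`.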

## Project Structure

```
chat_env/
β”œβ”€β”€ __init__.py              # Module exports (ChatEnv, ChatAction, etc.)
β”œβ”€β”€ README.md                # This file
β”œβ”€β”€ client.py                # ChatEnv HTTP client
β”œβ”€β”€ models.py                # ChatAction, ChatObservation, ChatState
└── server/
    β”œβ”€β”€ __init__.py          # Server module exports
    β”œβ”€β”€ chat_environment.py  # Core ChatEnvironment implementation
    β”œβ”€β”€ app.py               # FastAPI server application
    β”œβ”€β”€ test_chat_env.py     # Unit tests
    └── Dockerfile           # Container image for HTTP server
```

## Requirements

- Python 3.10+
- PyTorch
- A tokenizer with an `apply_chat_template` method (e.g., from Hugging Face Transformers)
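The `apply_chat_template` requirement just means the environment can hand the tokenizer a list of `{"role", "content"}` dicts and get tokens (or formatted text) back. The mock below illustrates the expected call shape only; the template format and toy tokenization are made up for illustration, and real tokenizers use their model's own chat template:

```python
class MockChatTokenizer:
    """Illustrative stand-in for a tokenizer exposing apply_chat_template.

    Only the interface shape matters here: a list of message dicts in,
    token IDs out (or a formatted string when tokenize=False).
    """

    def apply_chat_template(self, messages, tokenize=True):
        # Made-up template: "<role>: <content>" lines joined by newlines.
        text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        if not tokenize:
            return text
        # Toy "tokenization": one ID per whitespace-separated word.
        return list(range(len(text.split())))

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
tok = MockChatTokenizer()
formatted = tok.apply_chat_template(msgs, tokenize=False)
ids = tok.apply_chat_template(msgs)
print(formatted)
print(len(ids))
```

Any Hugging Face tokenizer whose model card ships a chat template satisfies this contract; tokenizers without one (e.g., plain `gpt2`) may need a template assigned before use.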

## Notes

- ChatEnvironment does **not** generate responses; it only manages conversation state
- You need to provide tokens from your model or another source
- The environment is not thread-safe; use each instance from a single thread
- For multi-turn conversations, alternate between user and assistant messages