---
title: Chat Environment Server
emoji: 💬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv-main
- openenv
---
## Hugging Face Space Deployment
This Space is built from OpenEnv environment `chat_env`.
- Space URL: `https://huggingface.co/spaces/openenv/chat_env`
- OpenEnv pinned ref: `main`
- Hub tag: `openenv`
### Connecting from Code
```python
from envs.chat_env import ChatEnv
env = ChatEnv(base_url="https://huggingface.co/spaces/openenv/chat_env")
```
# Chat Environment
A chat-based environment for LLMs with built-in tokenization and message history management. This environment is designed to work directly with language models and provides a minimal, flexible foundation for conversation-based RL training.
## Overview
ChatEnvironment is a lightweight environment that:
- Manages conversation history in Huggingface chat format
- Handles tokenization internally using any compatible tokenizer
- Stores both messages and tokens for efficient model interaction
- Provides a clean interface for building chat-based RL agents
ChatEnvironment can be used in **two ways**:
1. **Direct usage**: Import and use ChatEnvironment directly in your Python code (best for local development)
2. **HTTP client**: Use ChatEnv client to connect to a ChatEnvironment server (best for distributed/containerized deployments)
## Quick Start
### Option 1: Direct Usage (Local)
```python
from transformers import AutoTokenizer
from envs.chat_env import ChatAction, ChatObservation
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message
# Initialize with a tokenizer and optional system prompt
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(
    tokenizer=tokenizer,
    system_prompt="You are a helpful assistant.",
    system_role="system"
)
# Reset the environment
obs = env.reset()
print(f"Messages: {obs.messages}")
print(f"Tokens shape: {obs.tokens.shape}")
# Create an action from a message
user_message: Message = {"role": "user", "content": "Hello!"}
action = env.message_to_action(user_message)
# Step the environment
obs = env.step(action)
print(f"Updated messages: {obs.messages}")
print(f"Updated tokens shape: {obs.tokens.shape}")
```
### Option 2: HTTP Client (Distributed)
```python
from transformers import AutoTokenizer
from envs.chat_env import ChatEnv, ChatAction
import torch
# Create environment from Docker image
client = ChatEnv.from_docker_image("chat-env:latest")
# Or connect to existing server
# client = ChatEnv(base_url="http://localhost:8000")
# Reset
result = client.reset()
print(f"Initial messages: {result.observation.messages}")
# Send an action with tokens
tokenizer = AutoTokenizer.from_pretrained("gpt2")
message = {"role": "user", "content": "Hello!"}
action = client.message_to_action(message, tokenizer)
result = client.step(action)
print(f"Messages: {result.observation.messages}")
print(f"Reward: {result.reward}")
# Cleanup
client.close()
```
### Building the Docker Image
Before using the HTTP client, build the Docker image:
```bash
# From project root
docker build -t chat-env:latest -f envs/chat_env/server/Dockerfile .
# Optionally specify a different tokenizer
docker build -t chat-env:latest \
    --build-arg TOKENIZER_NAME=meta-llama/Llama-2-7b-chat-hf \
    -f envs/chat_env/server/Dockerfile .
```
## Architecture
### Data Models
#### ChatAction
Actions contain only tokens (PyTorch tensors) that interface directly with models:
```python
@dataclass
class ChatAction(Action):
    tokens: torch.Tensor  # Required, cannot be empty
```
#### ChatObservation
Observations contain both the message history and flattened tokens:
```python
@dataclass
class ChatObservation(Observation):
    messages: list[Message]  # List of {"role": str, "content": str}
    tokens: torch.Tensor     # Flattened tensor of all conversation tokens
    # Inherited: done, reward, metadata
```
#### ChatState
Internal state tracking message and token history:
```python
@dataclass
class ChatState(State):
    history_messages: list[Message]
    history_tokens: list[torch.Tensor]
    # Inherited: episode_id, step_count
```
### Key Methods
#### `reset() -> ChatObservation`
Resets the environment to its initial state, seeding the history with the optional system prompt.
#### `step(action: ChatAction) -> ChatObservation`
Takes an action (tokens), decodes to text, adds to history, returns updated observation.
#### `message_to_action(message: Message) -> ChatAction`
Convenience method to convert a message dict to a tokenized ChatAction.
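To make the bookkeeping behind `step()` concrete, here is a self-contained sketch. `StubTokenizer` and `MiniChatState` are stand-ins invented for this example (the real implementation lives in `chat_environment.py` and uses your tokenizer's `decode`); only the decode-then-append flow is illustrated:

```python
from dataclasses import dataclass, field

# Stand-in tokenizer for illustration only: decode() maps token ids back to text.
class StubTokenizer:
    vocab = {1: "Hello", 2: "world"}

    def decode(self, ids):
        return " ".join(self.vocab[i] for i in ids)

@dataclass
class MiniChatState:
    history_messages: list = field(default_factory=list)
    history_tokens: list = field(default_factory=list)

def step(state, tokens, role="assistant", tokenizer=StubTokenizer()):
    """Sketch of step(): decode the tokens, append both the message
    and the raw tokens to the state's history."""
    text = tokenizer.decode(tokens)
    state.history_messages.append({"role": role, "content": text})
    state.history_tokens.append(tokens)
    return state

state = MiniChatState()
step(state, [1, 2])
print(state.history_messages)  # [{'role': 'assistant', 'content': 'Hello world'}]
```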
## Usage Patterns
### Basic Conversation
```python
from transformers import AutoTokenizer
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)
# Reset
obs = env.reset()
# User turn
user_msg: Message = {"role": "user", "content": "What is 2+2?"}
action = env.message_to_action(user_msg)
obs = env.step(action)
# Assistant turn
assistant_msg: Message = {"role": "assistant", "content": "2+2 equals 4."}
action = env.message_to_action(assistant_msg)
obs = env.step(action)
# Access conversation history
print(f"Full conversation: {obs.messages}")
print(f"All tokens: {obs.tokens}")
```
### With Transforms
You can add transforms to compute rewards or modify observations:
```python
from openenv.core.env_server import Transform, Observation
class LengthRewardTransform(Transform):
    """Reward based on response length."""

    def __call__(self, observation: Observation) -> Observation:
        if hasattr(observation, 'messages') and observation.messages:
            last_message = observation.messages[-1]
            observation.reward = len(last_message['content']) * 0.1
        return observation

env = ChatEnvironment(
    tokenizer=tokenizer,
    transform=LengthRewardTransform()
)
```
### Direct Token Usage
If you're generating tokens from a model, you can create actions directly:
```python
import torch
from envs.chat_env import ChatAction
# Assume you have tokens from your model
generated_tokens = torch.tensor([[1, 2, 3, 4, 5]])
# Create action directly
action = ChatAction(tokens=generated_tokens)
# Step environment
obs = env.step(action)
```
## Design Philosophy
ChatEnvironment is intentionally minimal and flexible:
1. **No HTTP overhead**: Works directly with Python objects and tensors
2. **Tokenizer ownership**: Environment handles tokenization consistently
3. **Dual representation**: Maintains both human-readable messages and model-ready tokens
4. **Transform support**: Extensible reward computation and observation modification
5. **Type-safe**: Uses typed Messages compatible with Huggingface format
## Integration with Models
ChatEnvironment pairs naturally with language models:
```python
# Pseudo-code for RL training loop
model = YourLanguageModel()
env = ChatEnvironment(tokenizer=model.tokenizer)
for episode in range(num_episodes):
    obs = env.reset()
    while not obs.done:
        # Model generates response tokens
        action_tokens = model.generate(obs.tokens)
        action = ChatAction(tokens=action_tokens)

        # Step environment
        obs = env.step(action)

        # Use obs.reward for RL updates
        model.update(obs.reward)
```
## Project Structure
```
chat_env/
├── __init__.py          # Module exports (ChatEnv, ChatAction, etc.)
├── README.md            # This file
├── client.py            # ChatEnv HTTP client
├── models.py            # ChatAction, ChatObservation, ChatState
└── server/
    ├── __init__.py          # Server module exports
    ├── chat_environment.py  # Core ChatEnvironment implementation
    ├── app.py               # FastAPI server application
    ├── test_chat_env.py     # Unit tests
    └── Dockerfile           # Container image for HTTP server
```
## Requirements
- Python 3.10+
- PyTorch
- A tokenizer with `apply_chat_template` method (e.g., Huggingface transformers)
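To illustrate what the `apply_chat_template` requirement provides, here is a rough stand-in. The `<|role|>` marker format below is invented for this sketch; real chat templates are model-specific and come from the tokenizer itself (Hugging Face transformers):

```python
# Illustrative stand-in for a chat template: mimics only the
# message-list -> prompt-string step that apply_chat_template performs.
def apply_chat_template(messages, add_generation_prompt=False):
    lines = [f"<|{m['role']}|>{m['content']}" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        lines.append("<|assistant|>")
    return "\n".join(lines)

msgs = [
    {"role": "system", "content": "Be helpful."},
    {"role": "user", "content": "Hi"},
]
prompt = apply_chat_template(msgs, add_generation_prompt=True)
print(prompt)
```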
## Notes
- ChatEnvironment does **not** generate responses; it only manages conversation state
- You must supply tokens from your model or another source
- The environment is not thread-safe; access each instance from a single thread
- For multi-turn conversations, alternate between user and assistant messages
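The alternation rule above can be checked with a small helper. `roles_alternate` is not part of the environment's API; it is a hypothetical convenience written for this example:

```python
def roles_alternate(messages, start_role="user"):
    """Check that non-system messages alternate user/assistant turns."""
    expected = start_role
    for m in messages:
        if m["role"] == "system":
            continue  # system prompt may appear anywhere without breaking turns
        if m["role"] != expected:
            return False
        expected = "assistant" if expected == "user" else "user"
    return True

conv = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(roles_alternate(conv))  # True
```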