---
title: Chat Environment Server
emoji: 💬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv-main
  - openenv
---

## Hugging Face Space Deployment

This Space is built from the OpenEnv environment `chat_env`.

- Space URL: `https://huggingface.co/spaces/openenv/chat_env`
- OpenEnv pinned ref: `main`
- Hub tag: `openenv`

### Connecting from Code

```python
from envs.chat_env import ChatEnv

env = ChatEnv(base_url="https://huggingface.co/spaces/openenv/chat_env")
```

# Chat Environment

A chat-based environment for LLMs with built-in tokenization and message history management. This environment is designed to work directly with language models and provides a minimal, flexible foundation for conversation-based RL training.

## Overview

ChatEnvironment is a lightweight environment that:

- Manages conversation history in the Hugging Face chat format
- Handles tokenization internally using any compatible tokenizer
- Stores both messages and tokens for efficient model interaction
- Provides a clean interface for building chat-based RL agents

ChatEnvironment can be used in **two ways**:

1. **Direct usage**: Import and use ChatEnvironment directly in your Python code (best for local development)
2. **HTTP client**: Use the ChatEnv client to connect to a ChatEnvironment server (best for distributed/containerized deployments)

## Quick Start

### Option 1: Direct Usage (Local)

```python
from transformers import AutoTokenizer

from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message

# Initialize with a tokenizer and an optional system prompt
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(
    tokenizer=tokenizer,
    system_prompt="You are a helpful assistant.",
    system_role="system"
)

# Reset the environment
obs = env.reset()
print(f"Messages: {obs.messages}")
print(f"Tokens shape: {obs.tokens.shape}")

# Create an action from a message
user_message: Message = {"role": "user", "content": "Hello!"}
action = env.message_to_action(user_message)

# Step the environment
obs = env.step(action)
print(f"Updated messages: {obs.messages}")
print(f"Updated tokens shape: {obs.tokens.shape}")
```

### Option 2: HTTP Client (Distributed)

```python
from transformers import AutoTokenizer

from envs.chat_env import ChatEnv

# Create the environment from a Docker image
client = ChatEnv.from_docker_image("chat-env:latest")

# Or connect to an existing server
# client = ChatEnv(base_url="http://localhost:8000")

# Reset
result = client.reset()
print(f"Initial messages: {result.observation.messages}")

# Send an action with tokens
tokenizer = AutoTokenizer.from_pretrained("gpt2")
message = {"role": "user", "content": "Hello!"}
action = client.message_to_action(message, tokenizer)
result = client.step(action)
print(f"Messages: {result.observation.messages}")
print(f"Reward: {result.reward}")

# Cleanup
client.close()
```

### Building the Docker Image

Before using the HTTP client, build the Docker image:

```bash
# From the project root
docker build -t chat-env:latest -f envs/chat_env/server/Dockerfile .

# Optionally specify a different tokenizer
docker build -t chat-env:latest \
  --build-arg TOKENIZER_NAME=meta-llama/Llama-2-7b-chat-hf \
  -f envs/chat_env/server/Dockerfile .
```
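Once the image is built, you can also run the server container yourself rather than letting `ChatEnv.from_docker_image` manage it. A minimal sketch, assuming the server listens on port 8000 inside the container (the `app_port` declared in the Space front matter):

```bash
# Start the server, mapping the container's port 8000 to the host
docker run -p 8000:8000 chat-env:latest
```

With the container running, connect from Python with `ChatEnv(base_url="http://localhost:8000")`, as shown in Option 2 above.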
## Architecture

### Data Models

#### ChatAction

Actions contain only tokens (PyTorch tensors) that interface directly with models:

```python
@dataclass
class ChatAction(Action):
    tokens: torch.Tensor  # Required, cannot be empty
```

#### ChatObservation

Observations contain both the message history and the flattened tokens:

```python
@dataclass
class ChatObservation(Observation):
    messages: list[Message]  # List of {"role": str, "content": str}
    tokens: torch.Tensor     # Flattened tensor of all conversation tokens
    # Inherited: done, reward, metadata
```

#### ChatState

Internal state tracking the message and token history:

```python
@dataclass
class ChatState(State):
    history_messages: list[Message]
    history_tokens: list[torch.Tensor]
    # Inherited: episode_id, step_count
```

### Key Methods

#### `reset() -> ChatObservation`

Resets the environment to its initial state, applying the optional system prompt.

#### `step(action: ChatAction) -> ChatObservation`

Takes an action (tokens), decodes it to text, appends it to the history, and returns the updated observation.

#### `message_to_action(message: Message) -> ChatAction`

Convenience method that converts a message dict into a tokenized ChatAction.

## Usage Patterns

### Basic Conversation

```python
from transformers import AutoTokenizer

from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message

tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)

# Reset
obs = env.reset()

# User turn
user_msg: Message = {"role": "user", "content": "What is 2+2?"}
action = env.message_to_action(user_msg)
obs = env.step(action)

# Assistant turn
assistant_msg: Message = {"role": "assistant", "content": "2+2 equals 4."}
action = env.message_to_action(assistant_msg)
obs = env.step(action)

# Access the conversation history
print(f"Full conversation: {obs.messages}")
print(f"All tokens: {obs.tokens}")
```

### With Transforms

You can add transforms to compute rewards or modify observations:

```python
from openenv.core.env_server import Transform, Observation

class LengthRewardTransform(Transform):
    """Reward based on response length."""

    def __call__(self, observation: Observation) -> Observation:
        if hasattr(observation, 'messages') and observation.messages:
            last_message = observation.messages[-1]
            observation.reward = len(last_message['content']) * 0.1
        return observation

env = ChatEnvironment(
    tokenizer=tokenizer,
    transform=LengthRewardTransform()
)
```
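Transforms can also control episode termination, not just rewards. As an illustrative sketch (a hypothetical transform, not part of the shipped environment) using the same `Transform` interface:

```python
from openenv.core.env_server import Transform, Observation

class MaxTurnsTransform(Transform):
    """Hypothetical transform: end the episode once the chat reaches a length cap."""

    def __init__(self, max_messages: int = 6):
        self.max_messages = max_messages

    def __call__(self, observation: Observation) -> Observation:
        # Mark the episode as done once enough messages have accumulated
        if hasattr(observation, 'messages') and len(observation.messages) >= self.max_messages:
            observation.done = True
        return observation

env = ChatEnvironment(
    tokenizer=tokenizer,
    transform=MaxTurnsTransform(max_messages=6)
)
```

Because `done` is an inherited observation field, a loop like the training sketch below can use it directly as its stopping condition.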
### Direct Token Usage

If you're generating tokens from a model, you can create actions directly:

```python
import torch

from envs.chat_env import ChatAction

# Assume you have tokens from your model
generated_tokens = torch.tensor([[1, 2, 3, 4, 5]])

# Create the action directly
action = ChatAction(tokens=generated_tokens)

# Step the environment
obs = env.step(action)
```

## Design Philosophy

ChatEnvironment is intentionally minimal and flexible:

1. **No HTTP overhead**: Works directly with Python objects and tensors
2. **Tokenizer ownership**: The environment handles tokenization consistently
3. **Dual representation**: Maintains both human-readable messages and model-ready tokens
4. **Transform support**: Extensible reward computation and observation modification
5. **Type-safe**: Uses typed Messages compatible with the Hugging Face chat format

## Integration with Models

ChatEnvironment pairs naturally with language models:

```python
# Pseudo-code for an RL training loop; YourLanguageModel and
# model.update are placeholders for your own model and RL update step.
model = YourLanguageModel()
env = ChatEnvironment(tokenizer=model.tokenizer)

for episode in range(num_episodes):
    obs = env.reset()

    while not obs.done:
        # Model generates response tokens
        action_tokens = model.generate(obs.tokens)
        action = ChatAction(tokens=action_tokens)

        # Step the environment
        obs = env.step(action)

        # Use obs.reward for RL updates
        model.update(obs.reward)
```

## Project Structure

```
chat_env/
├── __init__.py              # Module exports (ChatEnv, ChatAction, etc.)
├── README.md                # This file
├── client.py                # ChatEnv HTTP client
├── models.py                # ChatAction, ChatObservation, ChatState
└── server/
    ├── __init__.py          # Server module exports
    ├── chat_environment.py  # Core ChatEnvironment implementation
    ├── app.py               # FastAPI server application
    ├── test_chat_env.py     # Unit tests
    └── Dockerfile           # Container image for HTTP server
```

## Requirements

- Python 3.10+
- PyTorch
- A tokenizer with an `apply_chat_template` method (e.g., Hugging Face transformers)

## Notes

- ChatEnvironment does **not** generate responses - it only manages conversation state
- You need to provide tokens from your model or another source
- The environment is intended for single-threaded use only and is not thread-safe
- For multi-turn conversations, alternate between user and assistant messages (see the sketch below)
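To make that last note concrete, here is a minimal multi-turn sketch, reusing the `env` from the direct-usage Quick Start above:

```python
# Alternate user and assistant turns; the environment only records them.
# The assistant side must come from your model (or another source).
turns = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Tell me a joke."},
]

obs = env.reset()
for message in turns:
    obs = env.step(env.message_to_action(message))

print(obs.messages)  # the full alternating history
```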