---
title: Chat Environment Server
emoji: 💬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv-main
  - openenv
---
## Hugging Face Space Deployment

This Space is built from the OpenEnv environment `chat_env`.

- Space URL: `https://huggingface.co/spaces/openenv/chat_env`
- OpenEnv pinned ref: `main`
- Hub tag: `openenv`

### Connecting from Code

```python
from envs.chat_env import ChatEnv

env = ChatEnv(base_url="https://huggingface.co/spaces/openenv/chat_env")
```
# Chat Environment

A chat-based environment for LLMs with built-in tokenization and message history management. This environment is designed to work directly with language models and provides a minimal, flexible foundation for conversation-based RL training.

## Overview

ChatEnvironment is a lightweight environment that:

- Manages conversation history in the Hugging Face chat format
- Handles tokenization internally using any compatible tokenizer
- Stores both messages and tokens for efficient model interaction
- Provides a clean interface for building chat-based RL agents

ChatEnvironment can be used in **two ways**:

1. **Direct usage**: Import and use ChatEnvironment directly in your Python code (best for local development)
2. **HTTP client**: Use the ChatEnv client to connect to a ChatEnvironment server (best for distributed/containerized deployments)
## Quick Start

### Option 1: Direct Usage (Local)

```python
from transformers import AutoTokenizer

from envs.chat_env import ChatAction, ChatObservation
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message

# Initialize with a tokenizer and optional system prompt
tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(
    tokenizer=tokenizer,
    system_prompt="You are a helpful assistant.",
    system_role="system"
)

# Reset the environment
obs = env.reset()
print(f"Messages: {obs.messages}")
print(f"Tokens shape: {obs.tokens.shape}")

# Create an action from a message
user_message: Message = {"role": "user", "content": "Hello!"}
action = env.message_to_action(user_message)

# Step the environment
obs = env.step(action)
print(f"Updated messages: {obs.messages}")
print(f"Updated tokens shape: {obs.tokens.shape}")
```
### Option 2: HTTP Client (Distributed)

```python
from transformers import AutoTokenizer

from envs.chat_env import ChatEnv, ChatAction

# Create environment from Docker image
client = ChatEnv.from_docker_image("chat-env:latest")

# Or connect to existing server
# client = ChatEnv(base_url="http://localhost:8000")

# Reset
result = client.reset()
print(f"Initial messages: {result.observation.messages}")

# Send an action with tokens
tokenizer = AutoTokenizer.from_pretrained("gpt2")
message = {"role": "user", "content": "Hello!"}
action = client.message_to_action(message, tokenizer)
result = client.step(action)
print(f"Messages: {result.observation.messages}")
print(f"Reward: {result.reward}")

# Cleanup
client.close()
```
### Building the Docker Image

Before using the HTTP client, build the Docker image:

```bash
# From project root
docker build -t chat-env:latest -f envs/chat_env/server/Dockerfile .

# Optionally specify a different tokenizer
docker build -t chat-env:latest \
    --build-arg TOKENIZER_NAME=meta-llama/Llama-2-7b-chat-hf \
    -f envs/chat_env/server/Dockerfile .
```
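Once the image is built, you can also run the server yourself instead of letting `from_docker_image` manage the container. A minimal sketch, assuming the server listens on port 8000 inside the container (matching `app_port` in the Space config above):

```bash
# Run the server and expose it on localhost:8000
docker run --rm -p 8000:8000 chat-env:latest

# Then connect from Python with: ChatEnv(base_url="http://localhost:8000")
```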
## Architecture

### Data Models

#### ChatAction

Actions contain only tokens (PyTorch tensors) that interface directly with models:

```python
@dataclass
class ChatAction(Action):
    tokens: torch.Tensor  # Required, cannot be empty
```

#### ChatObservation

Observations contain both the message history and flattened tokens:

```python
@dataclass
class ChatObservation(Observation):
    messages: list[Message]  # List of {"role": str, "content": str}
    tokens: torch.Tensor     # Flattened tensor of all conversation tokens
    # Inherited: done, reward, metadata
```

#### ChatState

Internal state tracking message and token history:

```python
@dataclass
class ChatState(State):
    history_messages: list[Message]
    history_tokens: list[torch.Tensor]
    # Inherited: episode_id, step_count
```
### Key Methods

#### `reset() -> ChatObservation`

Resets the environment to its initial state, including the optional system prompt.

#### `step(action: ChatAction) -> ChatObservation`

Takes an action (tokens), decodes it to text, adds it to the history, and returns the updated observation.

#### `message_to_action(message: Message) -> ChatAction`

Convenience method to convert a message dict into a tokenized ChatAction.
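To make the `step` flow concrete, here is an illustrative sketch of what it does. This is a simplification, not the actual implementation in `server/chat_environment.py`; in particular, the role assigned to the decoded message is assumed here to be `"assistant"`:

```python
import torch

def step_sketch(state, tokenizer, action):
    # Decode the action tokens back into text
    text = tokenizer.decode(action.tokens.flatten(), skip_special_tokens=True)

    # Append the decoded message and its tokens to the history
    state.history_messages.append({"role": "assistant", "content": text})  # role assumed
    state.history_tokens.append(action.tokens.flatten())

    # The observation carries the full message history plus all tokens, flattened
    all_tokens = torch.cat(state.history_tokens)
    return state.history_messages, all_tokens
```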
## Usage Patterns

### Basic Conversation

```python
from transformers import AutoTokenizer

from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Message

tokenizer = AutoTokenizer.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)

# Reset
obs = env.reset()

# User turn
user_msg: Message = {"role": "user", "content": "What is 2+2?"}
action = env.message_to_action(user_msg)
obs = env.step(action)

# Assistant turn
assistant_msg: Message = {"role": "assistant", "content": "2+2 equals 4."}
action = env.message_to_action(assistant_msg)
obs = env.step(action)

# Access conversation history
print(f"Full conversation: {obs.messages}")
print(f"All tokens: {obs.tokens}")
```
### With Transforms

You can add transforms to compute rewards or modify observations:

```python
from envs.chat_env.server import ChatEnvironment
from openenv.core.env_server import Observation, Transform

class LengthRewardTransform(Transform):
    """Reward based on response length."""

    def __call__(self, observation: Observation) -> Observation:
        if hasattr(observation, "messages") and observation.messages:
            last_message = observation.messages[-1]
            observation.reward = len(last_message["content"]) * 0.1
        return observation

env = ChatEnvironment(
    tokenizer=tokenizer,
    transform=LengthRewardTransform()
)
```
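With this transform installed, the reward on each observation tracks the character count of the most recent message. For example:

```python
obs = env.reset()
action = env.message_to_action({"role": "assistant", "content": "Hello there!"})
obs = env.step(action)
print(obs.reward)  # 1.2, since "Hello there!" is 12 characters * 0.1
```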
### Direct Token Usage

If you're generating tokens from a model, you can create actions directly:

```python
import torch

from envs.chat_env import ChatAction

# Assume you have tokens from your model
generated_tokens = torch.tensor([[1, 2, 3, 4, 5]])

# Create action directly
action = ChatAction(tokens=generated_tokens)

# Step environment
obs = env.step(action)
```
## Design Philosophy

ChatEnvironment is intentionally minimal and flexible:

1. **No HTTP overhead**: Works directly with Python objects and tensors
2. **Tokenizer ownership**: The environment handles tokenization consistently
3. **Dual representation**: Maintains both human-readable messages and model-ready tokens
4. **Transform support**: Extensible reward computation and observation modification
5. **Type-safe**: Uses typed Messages compatible with the Hugging Face format
## Integration with Models

ChatEnvironment pairs naturally with language models:

```python
# Pseudo-code for RL training loop
model = YourLanguageModel()
env = ChatEnvironment(tokenizer=model.tokenizer)

for episode in range(num_episodes):
    obs = env.reset()
    while not obs.done:
        # Model generates response tokens
        action_tokens = model.generate(obs.tokens)
        action = ChatAction(tokens=action_tokens)

        # Step environment
        obs = env.step(action)

        # Use obs.reward for RL updates
        model.update(obs.reward)
```
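As a more concrete (if toy) version of that loop, here is a minimal sketch using a Hugging Face GPT-2 model. It assumes `obs.tokens` is a 1-D tensor of the full conversation, as described above, and skips the RL update step:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from envs.chat_env import ChatAction
from envs.chat_env.server import ChatEnvironment

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
env = ChatEnvironment(tokenizer=tokenizer)

obs = env.reset()
obs = env.step(env.message_to_action({"role": "user", "content": "Tell me a joke."}))

# Generate a continuation from the full conversation tokens
prompt = obs.tokens.unsqueeze(0)  # add a batch dimension
generated = model.generate(
    prompt, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
)
new_tokens = generated[:, prompt.shape[1]:]  # keep only the newly generated tokens

obs = env.step(ChatAction(tokens=new_tokens))
print(obs.messages[-1])
```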
## Project Structure

```
chat_env/
├── __init__.py              # Module exports (ChatEnv, ChatAction, etc.)
├── README.md                # This file
├── client.py                # ChatEnv HTTP client
├── models.py                # ChatAction, ChatObservation, ChatState
└── server/
    ├── __init__.py          # Server module exports
    ├── chat_environment.py  # Core ChatEnvironment implementation
    ├── app.py               # FastAPI server application
    ├── test_chat_env.py     # Unit tests
    └── Dockerfile           # Container image for HTTP server
```
## Requirements

- Python 3.10+
- PyTorch
- A tokenizer with an `apply_chat_template` method (e.g., from Hugging Face `transformers`); see the sketch below
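Any object exposing that interface should work. The hypothetical stub below illustrates the duck-typed surface ChatEnvironment relies on; the exact keyword arguments the environment passes are an assumption here, so treat this as a sketch, not a spec:

```python
import torch

class ToyChatTokenizer:
    """Hypothetical minimal tokenizer, for illustration only."""

    def apply_chat_template(self, messages, return_tensors=None, **kwargs):
        # Toy "template": one line per message, one token id per character
        text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        ids = [ord(c) for c in text]
        return torch.tensor([ids]) if return_tensors == "pt" else ids

    def decode(self, token_ids, **kwargs):
        # Inverse of the toy encoding above
        return "".join(chr(int(t)) for t in token_ids)
```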
## Notes

- ChatEnvironment does **not** generate responses; it only manages conversation state
- You need to provide tokens from your model or another source
- The environment is not thread-safe; use it from a single thread
- For multi-turn conversations, alternate between user and assistant messages