	---
	title: TemporalBenchEnv MCQ Server
	emoji: 🥁
	colorFrom: yellow
	colorTo: indigo
	sdk: docker
	pinned: false
	app_port: 8000
	base_path: /web
	tags:
	- openenv
	---

	# TemporalBenchEnv

	OpenEnv environment for multi-step multiple-choice time-series reasoning. Each episode samples nine questions from pre-built JSON banks (per-dataset files or merged JSONL in `TSQuestion` shape). Rewards combine per-step correctness and an episode bonus (see `env/reward.py`).

	## Question bank layout

	Point the server at a directory containing `PSML_questions.json`, `freshretailnet_questions.json`, `MIMIC_questions.json`, and `causal_chambers_questions.json` (each file is a JSON array of `TSQuestion` records), or set `TEMPORALBENCH_QUESTION_BANK_DIR` to that path. If unset, the server uses `tests/fixtures/banks` when present (for local smoke runs).
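
The lookup order described above can be sketched as follows. This is a minimal illustration, not the server's actual code: `resolve_bank_dir` and `missing_bank_files` are hypothetical helpers, and the assumption that the environment variable takes precedence over the fixtures fallback is inferred from the paragraph above.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# File names as listed in this README; the helpers below are hypothetical.
BANK_FILES = (
    "PSML_questions.json",
    "freshretailnet_questions.json",
    "MIMIC_questions.json",
    "causal_chambers_questions.json",
)


def resolve_bank_dir(
    env: Optional[Mapping[str, str]] = None,
    fallback: str = "tests/fixtures/banks",
) -> Optional[Path]:
    """Return the configured bank directory, else the fallback if it exists."""
    env = os.environ if env is None else env
    configured = env.get("TEMPORALBENCH_QUESTION_BANK_DIR")
    if configured:
        return Path(configured)
    fallback_dir = Path(fallback)
    return fallback_dir if fallback_dir.is_dir() else None


def missing_bank_files(bank_dir: Path) -> List[str]:
    """List the per-dataset bank files that are absent from bank_dir."""
    return [name for name in BANK_FILES if not (bank_dir / name).is_file()]
```

Running `missing_bank_files(resolve_bank_dir())` before starting the server is one way to catch a misconfigured path early.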

	Each record must include at least: `question_id`, `dataset`, `task_type` (`T1U` \| `T3` \| `T2_MCQ`), `prompt`, `options` (length ≥ 2), `answer`, plus optional `family`, `capability_tags`, `difficulty`, `metadata`.
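
As an illustration of the record requirements above, here is a small validation sketch. `validate_record` is a hypothetical helper, not part of the package; the real schema lives in `data/` and may enforce more than this.

```python
from typing import Any, Dict, List

# Required fields and task types as listed in this README.
REQUIRED_FIELDS = ("question_id", "dataset", "task_type", "prompt", "options", "answer")
TASK_TYPES = {"T1U", "T3", "T2_MCQ"}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    if record["task_type"] not in TASK_TYPES:
        problems.append(f"unknown task_type: {record['task_type']!r}")
    options = record["options"]
    if not isinstance(options, list) or len(options) < 2:
        problems.append("options must be a list with at least 2 entries")
    return problems


# Made-up record for illustration only; the values are not taken from a real bank.
example = {
    "question_id": "demo-001",
    "dataset": "PSML",
    "task_type": "T3",
    "prompt": "Which option best describes the trend?",
    "options": ["increasing", "decreasing"],
    "answer": "increasing",
}
print(validate_record(example))
```

The sketch checks only the constraints stated above; optional fields such as `family` or `difficulty` are left unvalidated.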

	## Quick Start

	Use the typed client (`TemporalBenchEnvClient`; alias `TemporalbenchenvEnv`):

	```python
	from client import TemporalBenchAction, TemporalBenchEnvClient

	# Start the container and connect (Docker image names must be lowercase)
	env = TemporalBenchEnvClient.from_docker_image("temporalbenchenv-env:latest")
	try:
	    out = env.reset()
	    while not out.done:
	        q = out.observation
	        # Agent picks q.options[i] or an equivalent label string
	        out = env.step(TemporalBenchAction(answer=q.options[0]))
	finally:
	    env.close()
	```

	`TemporalBenchEnvClient.from_docker_image()` handles:
	- Starting the Docker container
	- Waiting for the server to be ready
	- Connecting to the environment
	- Container cleanup when you call `close()`

	## Building the Docker Image

	Build the Docker image before using the environment:

	```bash
	# From project root (Docker image names must be lowercase)
	docker build -t temporalbenchenv-env:latest -f server/Dockerfile .
	```

	## Deploying to Hugging Face Spaces

	Deploy the environment to Hugging Face Spaces with the `openenv push` command:

	```bash
	# From the environment directory (where openenv.yaml is located)
	openenv push

	# Or specify options
	openenv push --namespace my-org --private
	```

	The `openenv push` command will:
	1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
	2. Prepare a custom build for Hugging Face Docker space (enables web interface)
	3. Upload to Hugging Face (ensuring you're logged in)

	### Prerequisites

	- Hugging Face authentication: the command prompts for login if you are not already authenticated

	### Options

	- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
	- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
	- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
	- `--private`: Deploy the space as private (default: public)

	### Examples

	```bash
	# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
	openenv push

	# Push to a specific repository
	openenv push --repo-id my-org/my-env

	# Push with a custom base image
	openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest

	# Push as a private space
	openenv push --private

	# Combine options
	openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
	```

	After deployment, your space will be available at:
	`https://huggingface.co/spaces/<repo-id>`

	The deployed space includes:
	- Web Interface at `/web` - Interactive UI for exploring the environment
	- API Documentation at `/docs` - Full OpenAPI/Swagger interface
	- Health Check at `/health` - Container health monitoring
	- WebSocket at `/ws` - Persistent session endpoint for low-latency interactions

	## Environment Details

	### Action (`TemporalBenchAction`)
	- `answer` (str) — MCQ label (must match ground truth after optional normalization)
	- `confidence`, `reasoning` — optional
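
To make "optional normalization" concrete, here is one plausible form it could take. This is an assumption for illustration: the actual grading rules live in the `env/` package and may differ, and `normalize_label` and `is_correct` are hypothetical names.

```python
def normalize_label(label: str) -> str:
    """Assumed normalization: trim, collapse internal whitespace, ignore case."""
    return " ".join(label.split()).casefold()


def is_correct(answer: str, ground_truth: str) -> bool:
    """Compare an agent's answer to the ground truth after normalization."""
    return normalize_label(answer) == normalize_label(ground_truth)


print(is_correct("  Increasing ", "increasing"))  # True
print(is_correct("increasing", "decreasing"))     # False
```

Under this assumption, passing an entry of `options` back verbatim is the safest choice, since it needs no normalization to match.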

	### Observation (`TemporalBenchObservation`)
	- `question`, `options`, `task_type`, `dataset`, `history`, `accuracy_so_far`
	- `step_idx`, `steps_remaining`, `max_steps`, `done`, `reward`, `metadata`

	### Reward
	- Per step: `alpha * correctness` (correctness 0 or 1).
	- On the final step, an episode bonus is added: `lambda_ep * (total_correct / N) * coverage_multiplier`, where `coverage_multiplier` is 1.0 if every dataset in the episode has at least one correct answer and 0.8 otherwise.
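
The reward rules above can be worked through with a short sketch. It is not the code in `env/reward.py`: `episode_rewards` is a hypothetical function, `N` is assumed to be the number of steps in the episode, and the default values of `alpha` and `lambda_ep` are placeholders rather than the environment's real settings.

```python
from typing import List, Sequence


def episode_rewards(
    correct: Sequence[bool],
    datasets: Sequence[str],
    alpha: float = 1.0,      # placeholder value, not the environment's default
    lambda_ep: float = 1.0,  # placeholder value, not the environment's default
) -> List[float]:
    """Per-step rewards, with the episode bonus added to the final step."""
    if not correct or len(correct) != len(datasets):
        raise ValueError("correct and datasets must be non-empty and equal in length")
    n = len(correct)
    rewards = [alpha * float(c) for c in correct]
    # A dataset is covered if at least one of its questions was answered correctly.
    covered = {d for d, c in zip(datasets, correct) if c}
    coverage_multiplier = 1.0 if covered == set(datasets) else 0.8
    rewards[-1] += lambda_ep * (sum(correct) / n) * coverage_multiplier
    return rewards


# Four steps, three correct, both datasets covered: bonus = 1.0 * (3 / 4) * 1.0
print(episode_rewards([True, False, True, True], ["PSML", "PSML", "MIMIC", "MIMIC"]))
# [1.0, 0.0, 1.0, 1.75]
```

If the only correct answers came from one dataset, the same call would apply the 0.8 multiplier to the bonus instead.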

	## Advanced Usage

	### Connecting to an Existing Server

	If you already have a TemporalBenchEnv server running, connect with:

	```python
	from client import TemporalBenchAction, TemporalBenchEnvClient

	with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
	    r = env.reset()
	    r = env.step(TemporalBenchAction(answer=r.observation.options[0]))
	```

	Note: `close()` does not stop a remote server you attached to with `base_url=...`.

	### Using the Context Manager

	The client supports context manager usage for automatic connection management:

	```python
	from client import TemporalBenchAction, TemporalBenchEnvClient

	with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
	    result = env.reset()
	    while not result.done:
	        ans = result.observation.options[0]
	        result = env.step(TemporalBenchAction(answer=ans))
	```

	The client uses WebSocket connections for:
	- Lower latency: No HTTP connection overhead per request
	- Persistent session: Server maintains your environment state
	- Efficient for episodes: Better for many sequential steps

	### Concurrent WebSocket Sessions

	The server uses factory mode (`create_app(_env_factory, ...)`) so each WebSocket session gets a fresh `TemporalBenchEnvironment`. Tune `max_concurrent_envs` in `server/app.py` as needed.

	## Development & Testing

	### Direct environment testing

	```bash
	uv sync --extra dev
	uv run pytest tests/
	```

	### Running Locally

	Run the server locally for development:

	```bash
	uvicorn server.app:app --reload
	```

	## Project Structure

	```
	TemporalBenchEnv/
	├── .dockerignore # Docker build exclusions
	├── __init__.py # Module exports
	├── README.md # This file
	├── openenv.yaml # OpenEnv manifest
	├── pyproject.toml # Project metadata and dependencies
	├── uv.lock # Locked dependencies (generated)
	├── client.py # TemporalBenchEnvClient (alias TemporalbenchenvEnv)
	├── models.py # Action / observation / state re-exports
	├── env/ # Environment, sampler, grading, rewards
	├── data/ # TSQuestion schema + JSON/JSONL loaders
	└── server/
	    ├── __init__.py # Server module exports
	    ├── app.py # FastAPI application (HTTP + WebSocket endpoints)
	    └── Dockerfile # Container image definition
	```