title: TemporalBenchEnv MCQ Server
emoji: π₯
colorFrom: yellow
colorTo: indigo
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
TemporalBenchEnv
OpenEnv environment for multi-step multiple-choice time-series reasoning. Each episode samples nine questions from pre-built JSON banks (per-dataset files or merged JSONL in TSQuestion shape). Rewards combine per-step correctness and an episode bonus (see env/reward.py).
Question bank layout
Point the server at a directory containing PSML_questions.json, freshretailnet_questions.json, MIMIC_questions.json, and causal_chambers_questions.json (each file is a JSON array of TSQuestion records), or set TEMPORALBENCH_QUESTION_BANK_DIR to that path. If unset, the server uses tests/fixtures/banks when present (for local smoke runs).
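The lookup order above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the server's loader. The helper names resolve_bank_dir and load_banks are hypothetical, and letting an explicit path win over the environment variable is an assumption. Only the per-dataset JSON-array layout is handled here, not merged JSONL.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

# Per-dataset bank files named above; each holds a JSON array of TSQuestion records.
BANK_FILES = (
    "PSML_questions.json",
    "freshretailnet_questions.json",
    "MIMIC_questions.json",
    "causal_chambers_questions.json",
)
FIXTURE_DIR = Path("tests/fixtures/banks")


def resolve_bank_dir(explicit: str | None = None) -> Path | None:
    """Hypothetical helper: explicit path, then env var, then local fixtures (assumed order)."""
    if explicit:
        return Path(explicit)
    env_dir = os.environ.get("TEMPORALBENCH_QUESTION_BANK_DIR")
    if env_dir:
        return Path(env_dir)
    if FIXTURE_DIR.is_dir():
        return FIXTURE_DIR  # fallback for local smoke runs
    return None


def load_banks(bank_dir: Path) -> list[dict]:
    """Concatenate the records from every per-dataset file that exists."""
    records: list[dict] = []
    for name in BANK_FILES:
        path = bank_dir / name
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                records.extend(json.load(fh))
    return records
```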
Each record must include at least question_id, dataset, task_type (T1U | T3 | T2_MCQ), prompt, options (length ≥ 2), and answer. Optional fields are family, capability_tags, difficulty, and metadata.
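A quick pre-flight check of a bank record against the rules above might look like this. It is a sketch only: check_record is a hypothetical name, and it enforces just the stated requirements (required fields, allowed task_type values, at least two options). It does not check, for example, that answer appears among options, because the schema here does not say so.

```python
from __future__ import annotations

REQUIRED_FIELDS = ("question_id", "dataset", "task_type", "prompt", "options", "answer")
TASK_TYPES = {"T1U", "T3", "T2_MCQ"}


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS if k not in record]
    if record.get("task_type") not in TASK_TYPES:
        problems.append(f"unknown task_type: {record.get('task_type')!r}")
    options = record.get("options")
    if not isinstance(options, list) or len(options) < 2:
        problems.append("options must be a list with at least 2 entries")
    return problems
```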
Quick Start
Use the typed client (TemporalBenchEnvClient; alias TemporalbenchenvEnv):
```python
from client import TemporalBenchAction, TemporalBenchEnvClient

# Start the container before entering try, so close() only runs on a live client.
env = TemporalBenchEnvClient.from_docker_image("temporalbenchenv-env:latest")
try:
    out = env.reset()
    while not out.done:
        q = out.observation
        # Agent picks q.options[i] or an equivalent label string
        out = env.step(TemporalBenchAction(answer=q.options[0]))
finally:
    env.close()
```

TemporalBenchEnvClient.from_docker_image() handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when you call close()
Building the Docker Image
Before using the environment, build the Docker image. Docker repository names must be lowercase, so the tag is all lowercase:

```bash
# From project root
docker build -t temporalbenchenv-env:latest -f server/Dockerfile .
```
Deploying to Hugging Face Spaces
Deploy your OpenEnv environment to Hugging Face Spaces with the openenv push command:
```bash
# From the environment directory (where openenv.yaml is located)
openenv push

# Or specify options
openenv push --namespace my-org --private
```
The openenv push command will:
- Validate that the directory is an OpenEnv environment (checks for openenv.yaml)
- Prepare a custom build for a Hugging Face Docker Space (enables the web interface)
- Upload to Hugging Face (ensuring you're logged in)
Prerequisites
- Authenticate with Hugging Face: the command will prompt for login if you are not already authenticated.
Options
- --directory, -d: Directory containing the OpenEnv environment (defaults to the current directory)
- --repo-id, -r: Repository ID in the format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
- --base-image, -b: Base Docker image to use (overrides the Dockerfile FROM)
- --private: Deploy the Space as private (default: public)
Examples
```bash
# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
openenv push

# Push to a specific repository
openenv push --repo-id my-org/my-env

# Push with a custom base image
openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest

# Push as a private space
openenv push --private

# Combine options
openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
```
After deployment, your space will be available at:
https://huggingface.co/spaces/<repo-id>
The deployed space includes:
- Web Interface at /web: interactive UI for exploring the environment
- API Documentation at /docs: full OpenAPI/Swagger interface
- Health Check at /health: container health monitoring
- WebSocket at /ws: persistent session endpoint for low-latency interactions
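When attaching to a deployed Space or a server you started by hand (rather than through from_docker_image, which already waits for readiness), a client-side readiness check against /health can be sketched with the standard library alone. The helper name wait_for_health is hypothetical, and the sketch assumes a healthy server answers GET /health with HTTP 200; the response body is not inspected because its format is not documented here.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    url = base_url.rstrip("/") + "/health"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s * 4) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet (or non-2xx); retry after a short sleep
        time.sleep(interval_s)
    return False
```

For a local run, wait_for_health("http://localhost:8000") returns True once the server is up, or False after 30 seconds.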
Environment Details
Action (TemporalBenchAction)
- answer (str): MCQ label; must match the ground truth after optional normalization
- confidence, reasoning: optional
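The normalization rules are not spelled out here; the real grading lives under env/. As an assumption only, the sketch below trims surrounding whitespace, collapses internal whitespace runs, and case-folds both strings before comparing. normalize_label and is_correct are hypothetical names, and the 0/1 return value mirrors the correctness term used in the reward.

```python
def normalize_label(label: str) -> str:
    """Assumed normalization: collapse whitespace and case-fold."""
    return " ".join(label.split()).casefold()


def is_correct(answer: str, ground_truth: str, normalize: bool = True) -> int:
    """Return 1 if the submitted label matches the ground truth, else 0."""
    if normalize:
        return int(normalize_label(answer) == normalize_label(ground_truth))
    return int(answer == ground_truth)
```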
Observation (TemporalBenchObservation)
- question, options, task_type, dataset, history, accuracy_so_far
- step_idx, steps_remaining, max_steps, done, reward, metadata
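As a sketch of how an agent might consume these fields, the helpers below render question, options, task_type, and dataset as a lettered prompt and map a one-letter model reply back to the option string to submit. They assume options is a list of strings and that submitting the option text is accepted, as in the Quick Start, which passes q.options[0] as the answer. format_prompt and letter_to_option are hypothetical names, and at most 26 options get letters.

```python
from __future__ import annotations

import string


def format_prompt(question: str, options: list[str], task_type: str, dataset: str) -> str:
    """Render observation fields as a lettered multiple-choice prompt."""
    lines = [f"[{dataset} / {task_type}]", question, ""]
    for letter, opt in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}. {opt}")
    lines += ["", "Reply with a single letter."]
    return "\n".join(lines)


def letter_to_option(reply: str, options: list[str]) -> str:
    """Map a reply such as 'b' or ' B. ' back to the option string to submit."""
    text = reply.strip().rstrip(".").upper()
    valid = string.ascii_uppercase[: len(options)]
    if len(text) != 1 or text not in valid:
        raise ValueError(f"cannot map reply {reply!r} to an option")
    return options[valid.index(text)]
```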
Reward
- Per step: alpha * correctness (correctness is 0 or 1).
- On the final step, an episode bonus is added: lambda_ep * (total_correct / N) * coverage_multiplier, where coverage_multiplier is 1.0 if every dataset in the episode has at least one correct answer, else 0.8.
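The reward terms above can be computed directly from per-question outcomes. In this sketch alpha and lambda_ep are parameters because their configured values are not given here, N is assumed to be the number of questions in the episode, and the function names are hypothetical; env/reward.py holds the actual implementation.

```python
from __future__ import annotations


def step_reward(correct: bool, alpha: float) -> float:
    """Per-step term: alpha * correctness, with correctness 0 or 1."""
    return alpha * (1.0 if correct else 0.0)


def episode_bonus(results: list[tuple[str, bool]], lambda_ep: float) -> float:
    """Final-step bonus from (dataset, correct) pairs covering the whole episode."""
    n = len(results)
    if n == 0:
        return 0.0
    total_correct = sum(1 for _, ok in results if ok)
    datasets = {ds for ds, _ in results}
    covered = {ds for ds, ok in results if ok}  # datasets with >= 1 correct answer
    coverage_multiplier = 1.0 if covered == datasets else 0.8
    return lambda_ep * (total_correct / n) * coverage_multiplier
```

For example, with lambda_ep = 1.0 and nine questions over three datasets, six correct answers give a bonus of 6/9 ≈ 0.667 when every dataset has a correct answer, but 1.0 * (6/9) * 0.8 ≈ 0.533 when one dataset has none.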
Advanced Usage
Connecting to an Existing Server
If you already have a TemporalBenchEnv server running, connect with:
```python
from client import TemporalBenchAction, TemporalBenchEnvClient

with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
    r = env.reset()
    r = env.step(TemporalBenchAction(answer=r.observation.options[0]))
```
Note: close() does not stop a remote server that you attached to via base_url.
Using the Context Manager
The client supports context manager usage for automatic connection management:
```python
from client import TemporalBenchAction, TemporalBenchEnvClient

with TemporalBenchEnvClient(base_url="http://localhost:8000") as env:
    result = env.reset()
    while not result.done:
        ans = result.observation.options[0]
        result = env.step(TemporalBenchAction(answer=ans))
```
The client uses WebSocket connections for:
- Lower latency: No HTTP connection overhead per request
- Persistent session: Server maintains your environment state
- Efficient for episodes: Better for many sequential steps
Concurrent WebSocket Sessions
The server uses factory mode (create_app(_env_factory, ...)) so each WebSocket session gets a fresh TemporalBenchEnvironment. Tune max_concurrent_envs in server/app.py as needed.
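The idea behind factory mode (a fresh environment per session, up to a cap) can be illustrated in plain Python. This is not OpenEnv's create_app and does not reflect its internals: SessionPool is a hypothetical class, and rejecting new sessions once max_concurrent_envs is reached is an assumed policy; the real server may behave differently.

```python
from __future__ import annotations

import threading
from typing import Callable, Generic, TypeVar

EnvT = TypeVar("EnvT")


class SessionPool(Generic[EnvT]):
    """Hypothetical sketch: one fresh env per session, capped at max_concurrent_envs."""

    def __init__(self, factory: Callable[[], EnvT], max_concurrent_envs: int) -> None:
        self._factory = factory
        self._max = max_concurrent_envs
        self._sessions: dict[str, EnvT] = {}
        self._lock = threading.Lock()

    def open(self, session_id: str) -> EnvT:
        with self._lock:
            if session_id in self._sessions:
                return self._sessions[session_id]
            if len(self._sessions) >= self._max:
                raise RuntimeError("max_concurrent_envs reached")
            env = self._factory()  # fresh instance: no episode state shared across sessions
            self._sessions[session_id] = env
            return env

    def close(self, session_id: str) -> None:
        with self._lock:
            self._sessions.pop(session_id, None)
```

Calling the factory per session, rather than sharing one instance, is what keeps each WebSocket client's episode state isolated.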
Development & Testing
Direct environment testing
```bash
uv sync --extra dev
uv run pytest tests/
```
Running Locally
Run the server locally for development:
```bash
uvicorn server.app:app --reload
```
Project Structure
```text
TemporalBenchEnv/
├── .dockerignore        # Docker build exclusions
├── __init__.py          # Module exports
├── README.md            # This file
├── openenv.yaml         # OpenEnv manifest
├── pyproject.toml       # Project metadata and dependencies
├── uv.lock              # Locked dependencies (generated)
├── client.py            # TemporalBenchEnvClient (alias TemporalbenchenvEnv)
├── models.py            # Action / observation / state re-exports
├── env/                 # Environment, sampler, grading, rewards
├── data/                # TSQuestion schema + JSON/JSONL loaders
└── server/
    ├── __init__.py      # Server module exports
    ├── app.py           # FastAPI application (HTTP + WebSocket endpoints)
    └── Dockerfile       # Container image definition
```