sdk: docker
app_port: 8000
WorkflowTwin
An OpenEnv-compatible environment for training and evaluating agents under memory and resource constraints.
This environment simulates multi-step ticket resolution pipelines with:
- queueing, prioritization, and dependencies
- stochastic arrivals and agent failures
- strict memory budgets on agent state
We introduce a quantized memory policy based on:
- random orthogonal projection
- scalar (coordinate-wise) quantization
- random projection residual sketching
to study how compression affects agent performance under resource constraints.
Motivation
Real-world agents must operate under limited memory and compute.
Without compression:
- state grows unbounded
- agents violate system constraints
With quantized memory:
- state is compressed
- agents remain feasible under tight budgets
This environment enables controlled evaluation of this tradeoff.
Key Results
We evaluate two modes:
- baseline: no compression (truncation under pressure)
- quant: rotated quantized memory compression
Sweeping the memory budget across both modes reveals a clear crossover point where compression transitions from unnecessary to essential.
Memory Budget vs Feasibility
Key Findings
- Feasibility threshold shift: Baseline requires ~6000 memory, while quantized memory achieves full compliance at ~3000.
- 2× efficiency gain: Compression halves the memory required for feasible operation.
- No-regret behavior: Under no memory pressure, both methods perform identically.
- Constraint robustness: Under tight budgets, baseline fails (0% compliance) while quantized memory remains fully feasible (100%).
Conclusion: Compression extends the feasible operating regime without degrading task performance.
Structure
- env/: core environment logic, models, scoring, reward
  - includes quantizer.py with rotated vector quantization primitives
- server/: FastAPI app exposing reset, step, state
- tasks/: JSON task definitions by difficulty
- baseline/: non-LLM heuristic policy
- baselines/: research evaluation baselines for workflow_twin
- inference.py: local rollout entrypoint
- openenv.yaml: environment spec
Quickstart
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn server.app:app --reload
Server endpoints:
- POST /reset
- POST /step with body { "action_type": "triage|respond|resolve|escalate", "note": "..." }
- GET /state
- GET /config (resolved runtime config loaded from env vars)
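A minimal client sketch for the /step endpoint, using only the Python standard library. The helper names (build_step_request, send) are hypothetical, and the assumption that the server replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request

# Action types listed in the /step body description above.
VALID_ACTIONS = {"triage", "respond", "resolve", "escalate"}


def build_step_request(base_url, action_type, note=""):
    """Build (but do not send) a POST /step request with a JSON body."""
    if action_type not in VALID_ACTIONS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    body = json.dumps({"action_type": action_type, "note": note}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send a prepared request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server from the Quickstart running locally, send(build_step_request("http://localhost:8000", "triage", "first pass")) would issue one step. Building the request separately from sending it keeps the payload easy to inspect without a live server.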
Run baseline inference:
python inference.py
Inference environment variables:
- API_BASE_URL: OpenAI-compatible endpoint base URL
- HF_TOKEN: API token (used as api_key)
- MODEL_NAME: chat model name (default: gpt-4o-mini)
If API_BASE_URL or HF_TOKEN is missing, inference automatically falls back to the heuristic policy.
inference.py result fields:
- score: final reported score (env_score when available, otherwise partial_score)
- env_score: environment-provided score from env.state()
- partial_score: fallback score from normalized accumulated reward
- openai_client_configured: true when both API_BASE_URL and HF_TOKEN are present
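The field rules above can be sketched as a small helper. This is an illustration, not the code in inference.py: the function names are hypothetical, and treating a missing env_score as None is an assumption.

```python
import os


def openai_client_configured(env=None):
    """True only when both API_BASE_URL and HF_TOKEN are set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("API_BASE_URL")) and bool(env.get("HF_TOKEN"))


def final_score(env_score, partial_score):
    """Prefer the environment score; fall back to the partial score.

    Assumption: an unavailable env_score is represented as None.
    """
    return env_score if env_score is not None else partial_score


def summarize(env_score, partial_score, env=None):
    """Assemble the four result fields described above."""
    return {
        "score": final_score(env_score, partial_score),
        "env_score": env_score,
        "partial_score": partial_score,
        "openai_client_configured": openai_client_configured(env),
    }
```

Passing env explicitly (a plain dict) makes the fallback behavior easy to check without touching real environment variables.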
Method: Quantized Memory Policy
We implement a rotated vector quantization pipeline:
Random Orthogonal Projection
- decorrelates embedding dimensions
Scalar Quantization
- coordinate-wise discretization
Residual Random Projection Sketch
- preserves inner-product structure
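The three stages above can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the contents of quantizer.py: all function names are hypothetical, the quantizer is a uniform grid on [-clip, clip], and the sketch uses a random sign matrix scaled by 1/sqrt(k), whose inner products match those of the residuals in expectation. A practical implementation would likely use NumPy.

```python
import math
import random


def random_orthogonal(dim, rng):
    """Build a dim x dim orthogonal matrix by Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a degenerate draw
            rows.append([a / norm for a in v])
    return rows


def random_sign_projection(k, dim, rng):
    """k x dim matrix with entries +1/sqrt(k) or -1/sqrt(k)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.choice((-scale, scale)) for _ in range(dim)] for _ in range(k)]


def matvec(matrix, x):
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def quantize(y, bits, clip):
    """Map each coordinate to one of 2**bits uniform cells on [-clip, clip]."""
    levels = 2 ** bits
    step = 2.0 * clip / levels
    codes = [min(max(int(math.floor((v + clip) / step)), 0), levels - 1) for v in y]
    return codes, step


def dequantize(codes, step, clip):
    """Reconstruct each coordinate at the center of its cell."""
    return [-clip + (c + 0.5) * step for c in codes]


def compress(x, rotation, proj, bits=3, clip=1.0):
    y = matvec(rotation, x)                       # 1. random orthogonal projection
    codes, step = quantize(y, bits, clip)         # 2. scalar quantization
    y_hat = dequantize(codes, step, clip)
    residual = [a - b for a, b in zip(y, y_hat)]
    sketch = matvec(proj, residual)               # 3. residual random projection sketch
    return {"codes": codes, "step": step, "clip": clip, "sketch": sketch}


def decompress(packed, rotation):
    """Dequantize, then rotate back (the inverse of an orthogonal matrix is its transpose)."""
    y_hat = dequantize(packed["codes"], packed["step"], packed["clip"])
    inverse = [list(col) for col in zip(*rotation)]
    return matvec(inverse, y_hat)
```

Because the rotation preserves norms, an input with norm at most clip never saturates the grid, and the reconstruction error per coordinate stays within half a quantization step.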
Reward shaping includes:
- distortion penalty (MSE)
- inner-product preservation penalty
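One way the two penalties could enter the reward is sketched below. The weights (w_mse, w_ip), the use of probe vectors to measure inner-product drift, and the additive form are all assumptions made for illustration; the environment's actual shaping may differ.

```python
def mse(x, x_hat):
    """Mean squared error between a vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def inner_product_error(x, x_hat, probes):
    """Mean absolute gap between <x, p> and <x_hat, p> over probe vectors p."""
    gaps = []
    for p in probes:
        exact = sum(a * b for a, b in zip(x, p))
        approx = sum(a * b for a, b in zip(x_hat, p))
        gaps.append(abs(exact - approx))
    return sum(gaps) / len(gaps)


def shaped_reward(task_reward, x, x_hat, probes, w_mse=1.0, w_ip=1.0):
    """Task reward minus weighted distortion and inner-product penalties."""
    return (
        task_reward
        - w_mse * mse(x, x_hat)
        - w_ip * inner_product_error(x, x_hat, probes)
    )
```

A perfect reconstruction (x_hat equal to x) leaves the task reward unchanged, so the penalties only apply when compression actually distorts the stored state.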
Research-Grade WorkflowTwin (L1-L5)
The new workflow_twin/ package evolves the simulator from a single-ticket MVP into a multi-ticket workflow research environment.
Included
- workflow_twin/core/entities.py: multi-ticket state, agents, time, SLA/resource fields
- workflow_twin/core/dynamics.py: queue logic, SLA penalties, dependencies, stochastic arrivals/failures
- workflow_twin/core/config.py: level configs (L1-L5)
- workflow_twin/environment.py: main level-aware environment (WorkflowTwinEnv)
- workflow_twin/memory.py: MemoryBoundedEnv wrapper using rotated quantized memory compression
- workflow_twin/levels/: level hooks for L1 simple → L5 memory pressure
- baselines/heuristics.py: simple queue baseline policy
- tasks/level1..level5/: task scaffolding per level
Quick Example
python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from baselines.heuristics import greedy_queue_policy

env = WorkflowTwinEnv(level=3, seed=42)
obs = env.reset()
# Roll out up to 10 steps with the greedy queue baseline.
for _ in range(10):
    action = greedy_queue_policy(obs)
    obs, reward, done, info = env.step(action)
    print(info["step_count"], reward, info["queue"])
    if done:
        break
PY
Memory-Bounded Wrapper Example (L5)
python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from workflow_twin.memory import MemoryBoundedEnv
base_env = WorkflowTwinEnv(level=5, seed=42)
env = MemoryBoundedEnv(base_env, memory_budget=3500, bits=3)
obs = env.reset()
obs, reward, done, info = env.step({"action_type": "triage", "note": "memory-check"})
print(info["memory"])
PY
Docker
docker build -t workflowtwin .
docker run -p 8000:8000 workflowtwin
Controlled A/B Quantized Memory Evaluation
Run the controlled experiment suite:
python -m experiments.ab_quantized_memory_eval
This executes three tests with shared metrics:
- control_no_memory_pressure (Level 1, large memory budget)
- critical_memory_constrained_long_horizon (Level 5, tight memory budget)
- memory_budget_sweep (budgets: 2000, 3000, 4000, 6000)
Modes compared:
- baseline: no compression, truncation under pressure
- quant: rotated quantized memory compression under pressure
Reported metrics:
- avg_reward
- success_rate (resolved/total)
- avg_sla_violations
- avg_memory_used vs avg_memory_budget
- memory_compliance_rate
- steps_per_sec
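The ratio-style metrics above could be computed as in the following sketch. The function signatures are hypothetical, and measuring compliance per step (rather than per episode) is an assumption; the experiment runner may aggregate differently.

```python
def success_rate(resolved, total):
    """Fraction of tickets resolved (resolved/total); 0.0 when there are none."""
    return resolved / total if total else 0.0


def memory_compliance_rate(memory_used, memory_budget):
    """Fraction of recorded steps whose memory use stays within the budget."""
    if not memory_used:
        return 0.0
    within = sum(1 for used in memory_used if used <= memory_budget)
    return within / len(memory_used)


def steps_per_sec(step_count, elapsed_seconds):
    """Throughput of the rollout loop; 0.0 if no time has elapsed."""
    return step_count / elapsed_seconds if elapsed_seconds > 0 else 0.0
```

For example, memory readings of 2500, 3100, 2900, and 3000 against a budget of 3000 give a compliance rate of 0.75, since only the second reading exceeds the budget.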
Figure (generated by the experiment runner):