---
sdk: docker
app_port: 8000
---

# WorkflowTwin

An OpenEnv-compatible environment for training and evaluating agents under memory and resource constraints.

This environment simulates multi-step ticket resolution pipelines with:

- queueing, prioritization, and dependencies
- stochastic arrivals and agent failures
- strict memory budgets on agent state

We introduce a **quantized memory policy** based on:

- random orthogonal projection
- coordinate-wise scalar quantization
- residual random projection sketching

to study how compression affects agent performance under resource constraints.

## Motivation

Real-world agents must operate under limited memory and compute.

Without compression:

- state grows unbounded
- agents violate system constraints

With quantized memory:

- state is compressed
- agents remain feasible under tight budgets

This environment enables controlled evaluation of this tradeoff.

## Key Results

We evaluate two modes:

- **baseline**: no compression (truncation under pressure)
- **quant**: rotated quantized memory compression

Sweeping the memory budget across both modes reveals a clear crossover point where compression transitions from unnecessary to essential.

### Memory Budget vs Feasibility

![Memory Budget vs Compliance Rate](experiments/figures/memory_budget_vs_compliance.svg)

### Key Findings

- **Feasibility threshold shift:** The baseline requires a memory budget of ~6000, while quantized memory achieves full compliance at ~3000.
- **2× efficiency gain:** Compression halves the memory required for feasible operation.
- **No-regret behavior:** Under no memory pressure, both methods perform identically.
- **Constraint robustness:** Under tight budgets, the baseline fails (0% compliance) while quantized memory remains fully feasible (100%).

**Conclusion:** Compression extends the feasible operating regime without degrading task performance.
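
To make the three compression stages named in the introduction concrete, here is a minimal standard-library sketch of rotate, quantize, and residual-sketch on a single vector. It is an illustration under stated assumptions, not the code in `env/quantizer.py`: the function names, the uniform quantizer on `[-1, 1]`, the unit-norm input, and the random-sign sketch matrix are all choices made for this example.

```python
import math
import random


def random_orthogonal(d, rng):
    """Build a d x d orthonormal matrix (list of rows) by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def quantize(y, bits, clip=1.0):
    """Uniform scalar quantization of each coordinate to 2**bits levels on [-clip, clip]."""
    step = 2.0 * clip / (2 ** bits - 1)
    codes = [round((max(-clip, min(clip, v)) + clip) / step) for v in y]
    return codes, step


def dequantize(codes, step, clip=1.0):
    """Map integer codes back to their grid values."""
    return [c * step - clip for c in codes]


def compress(x, rotation, signs, bits):
    """Rotate, quantize, then sketch the quantization residual with random signs."""
    y = matvec(rotation, x)
    codes, step = quantize(y, bits)
    residual = [a - b for a, b in zip(y, dequantize(codes, step))]
    scale = 1.0 / math.sqrt(len(signs))
    sketch = [scale * s for s in matvec(signs, residual)]
    return codes, step, sketch


def decompress(codes, step, rotation):
    """Invert the rotation with its transpose (valid because it is orthonormal)."""
    y_hat = dequantize(codes, step)
    return matvec([list(col) for col in zip(*rotation)], y_hat)


rng = random.Random(42)
d, k, bits = 8, 4, 3  # hypothetical sizes chosen for the example
rotation = random_orthogonal(d, rng)
signs = [[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(k)]

x = [rng.gauss(0.0, 1.0) for _ in range(d)]
x_norm = math.sqrt(sum(v * v for v in x))
x = [v / x_norm for v in x]  # unit norm keeps rotated coordinates inside [-1, 1]

codes, step, sketch = compress(x, rotation, signs, bits)
x_hat = decompress(codes, step, rotation)
mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / d
print(f"codes={codes} mse={mse:.5f} stored_bits={d * bits}")
```

Because the rotation is orthonormal, the reconstruction error in the original space equals the quantization error in the rotated space, so the per-coordinate MSE is bounded by `(step / 2) ** 2` whenever no coordinate is clipped. The sign sketch, applied with the same matrix to every vector, gives inner-product estimates between residuals that are unbiased in expectation.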

## Structure

- `env/`: core environment logic, models, scoring, reward
  - includes `quantizer.py` with rotated vector quantization primitives
- `server/`: FastAPI app exposing `reset`, `step`, `state`
- `tasks/`: JSON task definitions by difficulty
- `baseline/`: non-LLM heuristic policy
- `baselines/`: research evaluation baselines for `workflow_twin`
- `inference.py`: local rollout entrypoint
- `openenv.yaml`: environment spec

## Quickstart

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn server.app:app --reload
```

Server endpoints:

- `POST /reset`
- `POST /step` with body `{ "action_type": "triage|respond|resolve|escalate", "note": "..." }`
- `GET /state`
- `GET /config` (resolved runtime config loaded from env vars)

Run baseline inference:

```bash
python inference.py
```

Inference environment variables:

- `API_BASE_URL`: OpenAI-compatible endpoint base URL
- `HF_TOKEN`: API token (used as `api_key`)
- `MODEL_NAME`: chat model name (default: `gpt-4o-mini`)

If `API_BASE_URL` or `HF_TOKEN` is missing, inference automatically falls back to the heuristic policy.

`inference.py` result fields:

- `score`: final reported score (`env_score` when available, otherwise `partial_score`)
- `env_score`: environment-provided score from `env.state()`
- `partial_score`: fallback score from normalized accumulated reward
- `openai_client_configured`: `true` when both `API_BASE_URL` and `HF_TOKEN` are present

## Method: Quantized Memory Policy

We implement a rotated vector quantization pipeline:

1. **Random Orthogonal Projection**
   - decorrelates embedding dimensions
2. **Scalar Quantization**
   - coordinate-wise discretization
3. **Residual Random Projection Sketch**
   - preserves inner-product structure

Reward shaping includes:

- distortion penalty (MSE)
- inner-product preservation penalty

## Research-Grade WorkflowTwin (L1-L5)

The `workflow_twin/` package extends the simulator from a single-ticket MVP to a multi-ticket workflow research environment.

### Included

- `workflow_twin/core/entities.py`: multi-ticket state, agents, time, SLA/resource fields
- `workflow_twin/core/dynamics.py`: queue logic, SLA penalties, dependencies, stochastic arrivals/failures
- `workflow_twin/core/config.py`: level configs (L1-L5)
- `workflow_twin/environment.py`: main level-aware environment (`WorkflowTwinEnv`)
- `workflow_twin/memory.py`: `MemoryBoundedEnv` wrapper using rotated quantized memory compression
- `workflow_twin/levels/`: level hooks for L1 simple → L5 memory pressure
- `baselines/heuristics.py`: simple queue baseline policy
- `tasks/level1..level5/`: task scaffolding per level

### Quick Example

```bash
python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from baselines.heuristics import greedy_queue_policy

env = WorkflowTwinEnv(level=3, seed=42)
obs = env.reset()
for _ in range(10):
    action = greedy_queue_policy(obs)
    obs, reward, done, info = env.step(action)
    print(info["step_count"], reward, info["queue"])
    if done:
        break
PY
```

### Memory-Bounded Wrapper Example (L5)

```bash
python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from workflow_twin.memory import MemoryBoundedEnv

base_env = WorkflowTwinEnv(level=5, seed=42)
env = MemoryBoundedEnv(base_env, memory_budget=3500, bits=3)
obs = env.reset()
obs, reward, done, info = env.step({"action_type": "triage", "note": "memory-check"})
print(info["memory"])
PY
```

## Docker

```bash
docker build -t workflowtwin .
docker run -p 8000:8000 workflowtwin
```

## Controlled A/B Quantized Memory Evaluation

Run the controlled experiment suite:

```bash
python -m experiments.ab_quantized_memory_eval
```

This executes three tests with shared metrics:

- `control_no_memory_pressure` (Level 1, large memory budget)
- `critical_memory_constrained_long_horizon` (Level 5, tight memory budget)
- `memory_budget_sweep` (budgets: 2000, 3000, 4000, 6000)

Modes compared:

- baseline: no compression, truncation under pressure
- quant: rotated quantized memory compression under pressure

Reported metrics:

- `avg_reward`
- `success_rate` (resolved/total)
- `avg_sla_violations`
- `avg_memory_used` vs `avg_memory_budget`
- `memory_compliance_rate`
- `steps_per_sec`

Figure (generated by the experiment runner):

![Memory Budget vs Compliance Rate](experiments/figures/memory_budget_vs_compliance.svg)
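
As a rough guide to how the reported metrics relate to per-episode data, the sketch below aggregates a list of episode records. The record field names are hypothetical, the two records are toy values for illustration rather than experimental results, and two definitions are assumptions: `success_rate` is pooled across episodes, and `memory_compliance_rate` is taken as the fraction of episodes whose memory use stays within budget. The experiment runner may define either differently. `steps_per_sec` is omitted because it needs wall-clock timing.

```python
from statistics import mean


def summarize(episodes):
    """Aggregate per-episode records (hypothetical schema) into summary metrics."""
    return {
        "avg_reward": mean(e["reward"] for e in episodes),
        # Assumption: pooled resolved/total over all episodes.
        "success_rate": sum(e["resolved"] for e in episodes)
        / sum(e["total"] for e in episodes),
        "avg_sla_violations": mean(e["sla_violations"] for e in episodes),
        "avg_memory_used": mean(e["memory_used"] for e in episodes),
        "avg_memory_budget": mean(e["memory_budget"] for e in episodes),
        # Assumption: an episode complies when memory_used <= memory_budget.
        "memory_compliance_rate": mean(
            1.0 if e["memory_used"] <= e["memory_budget"] else 0.0
            for e in episodes
        ),
    }


# Toy records for illustration only; these are not measured results.
episodes = [
    {"reward": 1.0, "resolved": 3, "total": 4, "sla_violations": 1,
     "memory_used": 2800, "memory_budget": 3000},
    {"reward": 0.5, "resolved": 2, "total": 4, "sla_violations": 2,
     "memory_used": 3200, "memory_budget": 3000},
]

summary = summarize(episodes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Running the same aggregation once per mode and per budget would give the points needed for a budget-versus-compliance curve like the one in the figure.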