---
sdk: docker
app_port: 8000
---

# WorkflowTwin

An OpenEnv-compatible environment for training and evaluating agents under memory and resource constraints.

This environment simulates multi-step ticket resolution pipelines with:

- queueing, prioritization, and dependencies
- stochastic arrivals and agent failures
- strict memory budgets on agent state

We introduce a **quantized memory policy** based on:

- random orthogonal projection
- scalar quantization
- residual random-projection sketching

to study how compression affects agent performance under resource constraints.

## Motivation

Real-world agents must operate under limited memory and compute.

Without compression:

- state grows unbounded
- agents violate system constraints

With quantized memory:

- state is compressed
- agents remain feasible under tight budgets

This environment enables controlled evaluation of this tradeoff.

## Key Results

We evaluate two modes:

- **baseline**: no compression (truncation under pressure)
- **quant**: rotated quantized memory compression

The results show a clear crossover point at which compression goes from unnecessary to essential.

### Memory Budget vs Feasibility

[Figure: memory budget vs. feasibility]

### Key Findings

- **Feasibility threshold shift:** The baseline requires a memory budget of ~6000, while quantized memory achieves full compliance at ~3000.
- **2× efficiency gain:** Compression halves the memory required for feasible operation (~6000 down to ~3000).
- **No-regret behavior:** With no memory pressure, both methods perform identically.
- **Constraint robustness:** Under tight budgets, the baseline fails (0% compliance) while quantized memory remains fully feasible (100%).

**Conclusion:** Compression extends the feasible operating regime without degrading task performance.

## Structure

- `env/`: core environment logic, models, scoring, reward
  - includes `quantizer.py` with rotated vector quantization primitives
- `server/`: FastAPI app exposing `reset`, `step`, `state`
- `tasks/`: JSON task definitions by difficulty
- `baseline/`: non-LLM heuristic policy
- `baselines/`: research evaluation baselines for `workflow_twin`
- `inference.py`: local rollout entrypoint
- `openenv.yaml`: environment spec

## Quickstart

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn server.app:app --reload
```

Server endpoints:

- `POST /reset`
- `POST /step` with body `{ "action_type": "triage|respond|resolve|escalate", "note": "..." }`
- `GET /state`
- `GET /config` (resolved runtime config loaded from env vars)
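
As a rough illustration of calling these endpoints from Python, the sketch below builds the requests with the standard library only. It assumes the server listens on uvicorn's default `127.0.0.1:8000` and that responses are JSON; the helper names (`build_request`, `step_request`, `send`) are hypothetical and not part of this repository.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default host and port
VALID_ACTIONS = {"triage", "respond", "resolve", "escalate"}


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Build an HTTP request; a payload, if given, is sent as a JSON body."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method=method)
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, method=method, headers=headers)


def step_request(action_type, note="", base_url=BASE_URL):
    """Build a POST /step request with the documented body shape."""
    if action_type not in VALID_ACTIONS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    payload = {"action_type": action_type, "note": note}
    return build_request("POST", "/step", payload, base_url)


def send(request):
    """Send a request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_request("POST", "/reset"))` followed by `send(step_request("triage", "first pass"))` and `send(build_request("GET", "/state"))` would walk through one reset, one step, and one state read.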

Run baseline inference:

```bash
python inference.py
```

Inference environment variables:

- `API_BASE_URL`: OpenAI-compatible endpoint base URL
- `HF_TOKEN`: API token (used as `api_key`)
- `MODEL_NAME`: chat model name (default: `gpt-4o-mini`)

If `API_BASE_URL` or `HF_TOKEN` is missing, inference automatically falls back to the heuristic policy.
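
The fallback rule can be sketched as below. This is not the code in `inference.py`; the function name `resolve_inference_config` and the returned keys other than `openai_client_configured` are hypothetical, and treating an empty string as "missing" is an assumption.

```python
import os

DEFAULT_MODEL = "gpt-4o-mini"  # documented default for MODEL_NAME


def resolve_inference_config(env=None):
    """Read the three inference variables and decide which policy to use."""
    env = os.environ if env is None else env
    base_url = env.get("API_BASE_URL") or None  # assumption: "" counts as missing
    token = env.get("HF_TOKEN") or None
    configured = bool(base_url and token)
    return {
        "api_base_url": base_url,
        "api_key": token,  # HF_TOKEN is used as the api_key
        "model_name": env.get("MODEL_NAME") or DEFAULT_MODEL,
        "openai_client_configured": configured,
        "policy": "llm" if configured else "heuristic",
    }
```

Passing a plain dict instead of `os.environ` makes the rule easy to check: `resolve_inference_config({"HF_TOKEN": "t"})` selects the heuristic policy because `API_BASE_URL` is absent.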

`inference.py` result fields:

- `score`: final reported score (`env_score` when available, otherwise `partial_score`)
- `env_score`: environment-provided score from `env.state()`
- `partial_score`: fallback score from normalized accumulated reward
- `openai_client_configured`: `true` when both `API_BASE_URL` and `HF_TOKEN` are present

## Method: Quantized Memory Policy

We implement a rotated vector quantization pipeline:

1. **Random Orthogonal Projection**
   - decorrelates embedding dimensions
2. **Scalar Quantization**
   - coordinate-wise discretization
3. **Residual Random Projection Sketch**
   - preserves inner-product structure
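
The three steps above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the code in `quantizer.py`: the rotation is built by Gram-Schmidt on Gaussian vectors, the quantizer is uniform on `[-scale, scale]` with `scale` set to the input norm, and the sketch matrix is Gaussian. All function names and the dimensions used are hypothetical.

```python
import math
import random


def random_orthogonal(dim, rng):
    """Return dim orthonormal rows via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove the components along rows found so far
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip near-degenerate draws
            rows.append([a / norm for a in v])
    return rows


def matvec(matrix, x):
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def quantize(y, bits, scale):
    """Uniform scalar quantization of each coordinate on [-scale, scale]."""
    levels = 2 ** bits - 1
    step = 2.0 * scale / levels
    codes = [min(levels, max(0, round((v + scale) / step))) for v in y]
    return codes, step


def dequantize(codes, step, scale):
    return [c * step - scale for c in codes]


def compress(x, rotation, sketch_matrix, bits):
    y = matvec(rotation, x)                           # 1. rotate
    scale = math.sqrt(sum(v * v for v in x)) or 1.0   # |y_i| <= ||x|| after rotation
    codes, step = quantize(y, bits, scale)            # 2. quantize coordinate-wise
    y_hat = dequantize(codes, step, scale)
    residual = [a - b for a, b in zip(y, y_hat)]
    sketch = matvec(sketch_matrix, residual)          # 3. sketch the residual
    packed = {"codes": codes, "step": step, "scale": scale, "sketch": sketch}
    return packed, y, y_hat


rng = random.Random(0)
dim, sketch_dim, bits = 8, 4, 3
rotation = random_orthogonal(dim, rng)
# Gaussian entries with variance 1/sketch_dim.
sketch_matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(sketch_dim) for _ in range(dim)]
                 for _ in range(sketch_dim)]
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
packed, y, y_hat = compress(x, rotation, sketch_matrix, bits)
```

In this sketch the rotation preserves the vector norm, and each rotated coordinate is reconstructed to within half a quantization step. Because the sketch matrix entries have variance `1/sketch_dim`, the dot product of two sketches is an unbiased estimate of the dot product of the two residuals, which is one way a residual sketch can retain inner-product information.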

Reward shaping includes:

- distortion penalty (MSE)
- inner-product preservation penalty
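
One way the two penalties could enter a shaped reward is sketched below. The weights `w_mse` and `w_ip`, the use of probe vectors, and the function names are all assumptions made for illustration; the environment's actual reward code may combine these terms differently.

```python
def mse(x, x_hat):
    """Mean squared error between a vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def inner_product_gap(x, x_hat, probe):
    """Absolute difference between <x, probe> and <x_hat, probe>."""
    exact = sum(a * p for a, p in zip(x, probe))
    approx = sum(b * p for b, p in zip(x_hat, probe))
    return abs(exact - approx)


def shaped_reward(task_reward, x, x_hat, probes, w_mse=0.1, w_ip=0.1):
    """Subtract weighted distortion and inner-product penalties (hypothetical weights)."""
    ip_penalty = sum(inner_product_gap(x, x_hat, p) for p in probes) / len(probes)
    return task_reward - w_mse * mse(x, x_hat) - w_ip * ip_penalty
```

A lossless reconstruction leaves the task reward unchanged, while any distortion lowers it in proportion to the chosen weights.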

## Research-Grade WorkflowTwin (L1-L5)

The `workflow_twin/` package extends the simulator from a single-ticket MVP to a multi-ticket workflow research environment.

### Included

- `workflow_twin/core/entities.py`: multi-ticket state, agents, time, SLA/resource fields
- `workflow_twin/core/dynamics.py`: queue logic, SLA penalties, dependencies, stochastic arrivals/failures
- `workflow_twin/core/config.py`: level configs (L1-L5)
- `workflow_twin/environment.py`: main level-aware environment (`WorkflowTwinEnv`)
- `workflow_twin/memory.py`: `MemoryBoundedEnv` wrapper using rotated quantized memory compression
- `workflow_twin/levels/`: level hooks for L1 simple → L5 memory pressure
- `baselines/heuristics.py`: simple queue baseline policy
- `tasks/level1..level5/`: task scaffolding per level

### Quick Example

```bash
python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from baselines.heuristics import greedy_queue_policy

env = WorkflowTwinEnv(level=3, seed=42)
obs = env.reset()
# Roll out up to 10 steps with the greedy queue baseline.
for _ in range(10):
    action = greedy_queue_policy(obs)
    obs, reward, done, info = env.step(action)
    print(info["step_count"], reward, info["queue"])
    if done:
        break
PY
```

### Memory-Bounded Wrapper Example (L5)

```bash
python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from workflow_twin.memory import MemoryBoundedEnv

base_env = WorkflowTwinEnv(level=5, seed=42)
env = MemoryBoundedEnv(base_env, memory_budget=3500, bits=3)
obs = env.reset()
obs, reward, done, info = env.step({"action_type": "triage", "note": "memory-check"})
print(info["memory"])
PY
```

## Docker

```bash
docker build -t workflowtwin .
docker run -p 8000:8000 workflowtwin
```

## Controlled A/B Quantized Memory Evaluation

Run the controlled experiment suite:

```bash
python -m experiments.ab_quantized_memory_eval
```

This executes three tests with shared metrics:

- `control_no_memory_pressure` (Level 1, large memory budget)
- `critical_memory_constrained_long_horizon` (Level 5, tight memory budget)
- `memory_budget_sweep` (budgets: 2000, 3000, 4000, 6000)

Modes compared:

- `baseline`: no compression, truncation under pressure
- `quant`: rotated quantized memory compression under pressure

Reported metrics:

- `avg_reward`
- `success_rate` (resolved/total)
- `avg_sla_violations`
- `avg_memory_used` vs `avg_memory_budget`
- `memory_compliance_rate`
- `steps_per_sec`
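
To make the metric definitions concrete, the sketch below aggregates per-episode records into the reported fields. The record keys, the `summarize` helper, and the choice to count compliance per episode (memory used at or below budget) are assumptions; the experiment runner may compute these differently, for example per step.

```python
def summarize(episodes):
    """Aggregate per-episode records into the reported metrics."""
    n = len(episodes)
    resolved = sum(e["resolved"] for e in episodes)
    total = sum(e["total"] for e in episodes)
    steps = sum(e["steps"] for e in episodes)
    seconds = sum(e["seconds"] for e in episodes)
    # Assumption: an episode is compliant when it stays within its budget.
    compliant = sum(1 for e in episodes if e["memory_used"] <= e["memory_budget"])
    return {
        "avg_reward": sum(e["reward"] for e in episodes) / n,
        "success_rate": resolved / total,  # resolved/total tickets
        "avg_sla_violations": sum(e["sla_violations"] for e in episodes) / n,
        "avg_memory_used": sum(e["memory_used"] for e in episodes) / n,
        "avg_memory_budget": sum(e["memory_budget"] for e in episodes) / n,
        "memory_compliance_rate": compliant / n,
        "steps_per_sec": steps / seconds,
    }


# Made-up records for illustration only, not measured results.
sample = [
    {"reward": 4.0, "resolved": 3, "total": 4, "sla_violations": 1,
     "memory_used": 2800, "memory_budget": 3000, "steps": 50, "seconds": 0.5},
    {"reward": 2.0, "resolved": 1, "total": 4, "sla_violations": 3,
     "memory_used": 3200, "memory_budget": 3000, "steps": 50, "seconds": 0.5},
]
summary = summarize(sample)
```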

[Figure: plot generated by the experiment runner]