metadata
sdk: docker
app_port: 8000

WorkflowTwin

An OpenEnv-compatible environment for training and evaluating agents under memory and resource constraints.

This environment simulates multi-step ticket resolution pipelines with:

  • queueing, prioritization, and dependencies
  • stochastic arrivals and agent failures
  • strict memory budgets on agent state

We introduce a quantized memory policy based on:

  • random orthogonal projection
  • scalar quantization
  • random projection residual sketching

to study how compression affects agent performance under resource constraints.

Motivation

Real-world agents must operate under limited memory and compute.

Without compression:

  • state grows without bound
  • agents violate system constraints

With quantized memory:

  • state is compressed
  • agents remain feasible under tight budgets

This environment enables controlled evaluation of this tradeoff.

Key Results

We evaluate two modes:

  • baseline: no compression (truncation under pressure)
  • quant: rotated quantized memory compression

The results show a clear crossover point: as the memory budget shrinks, compression goes from unnecessary to essential.

Figure: Memory Budget vs Feasibility

Figure: Memory Budget vs Compliance Rate

Key Findings

  • Feasibility threshold shift:
    The baseline requires a memory budget of ~6000 to remain feasible, while quantized memory achieves full compliance at ~3000.

  • 2× efficiency gain:
    Compression halves the memory required for feasible operation.

  • No-regret behavior:
    Under no memory pressure, both methods perform identically.

  • Constraint robustness:
    Under tight budgets, the baseline fails (0% compliance) while quantized memory remains fully feasible (100% compliance).

Conclusion: Compression extends the feasible operating regime without degrading task performance.

Structure

  • env/: core environment logic, models, scoring, reward
    • includes quantizer.py with rotated vector quantization primitives
  • server/: FastAPI app exposing reset, step, state
  • tasks/: JSON task definitions by difficulty
  • baseline/: non-LLM heuristic policy
  • baselines/: research evaluation baselines for workflow_twin
  • inference.py: local rollout entrypoint
  • openenv.yaml: environment spec

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn server.app:app --reload

Server endpoints:

  • POST /reset
  • POST /step with body { "action_type": "triage|respond|resolve|escalate", "note": "..." }
  • GET /state
  • GET /config (resolved runtime config loaded from env vars)
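As a rough illustration of the /step request shape listed above, the following sketch builds the request with the standard library only. It assumes the server is reachable at http://localhost:8000 (the uvicorn default) and that it accepts and returns JSON; the helper names `build_step_request` and `send` are hypothetical and not part of this repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local uvicorn default port
VALID_ACTIONS = {"triage", "respond", "resolve", "escalate"}


def build_step_request(action_type, note="", base_url=BASE_URL):
    """Build (but do not send) a POST /step request with a JSON body."""
    if action_type not in VALID_ACTIONS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    body = json.dumps({"action_type": action_type, "note": note}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response (assumed format)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_step_request("triage", "first look"))` would post a single action; validating `action_type` on the client side simply avoids a round trip for obviously invalid input.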

Run baseline inference:

python inference.py

Inference environment variables:

  • API_BASE_URL: OpenAI-compatible endpoint base URL
  • HF_TOKEN: API token (used as api_key)
  • MODEL_NAME: chat model name (default: gpt-4o-mini)

If API_BASE_URL or HF_TOKEN is missing, inference automatically falls back to the heuristic policy.
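The fallback rule can be pictured with a small sketch. This is not the code in inference.py; the function name `resolve_inference_config` and the `policy` key are hypothetical, and treating an empty string as "missing" is an assumption.

```python
import os


def resolve_inference_config(environ=None):
    """Read the inference variables and decide which policy to use."""
    env = os.environ if environ is None else environ
    api_base_url = env.get("API_BASE_URL", "").strip()
    hf_token = env.get("HF_TOKEN", "").strip()
    model_name = env.get("MODEL_NAME", "gpt-4o-mini")  # documented default
    # Assumption: empty values count as missing.
    configured = bool(api_base_url and hf_token)
    return {
        "api_base_url": api_base_url or None,
        "api_key": hf_token or None,  # HF_TOKEN is used as api_key
        "model_name": model_name,
        "openai_client_configured": configured,
        "policy": "llm" if configured else "heuristic",  # hypothetical labels
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the decision easy to check in isolation.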

inference.py result fields:

  • score: final reported score (env_score when available, otherwise partial_score)
  • env_score: environment-provided score from env.state()
  • partial_score: fallback score from normalized accumulated reward
  • openai_client_configured: true when both API_BASE_URL and HF_TOKEN are present
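A minimal sketch of how these fields relate to each other is shown below. The normalization used for `partial_score` (accumulated reward divided by a maximum, clipped to [0, 1]) is an assumption, and the helper names are hypothetical.

```python
def partial_score(total_reward, max_reward):
    """Normalize accumulated reward into [0, 1] (assumed normalization)."""
    if max_reward <= 0:
        return 0.0
    return min(1.0, max(0.0, total_reward / max_reward))


def build_result(env_score, total_reward, max_reward, client_configured):
    """Assemble the result fields; env_score is None when unavailable."""
    partial = partial_score(total_reward, max_reward)
    # Prefer the environment-provided score, fall back to the partial score.
    score = env_score if env_score is not None else partial
    return {
        "score": score,
        "env_score": env_score,
        "partial_score": partial,
        "openai_client_configured": client_configured,
    }
```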

Method: Quantized Memory Policy

We implement a rotated vector quantization pipeline:

  1. Random Orthogonal Projection
    • decorrelates embedding dimensions
  2. Scalar Quantization
    • coordinate-wise discretization
  3. Residual Random Projection Sketch
    • preserves inner-product structure
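The three stages can be sketched in plain Python as follows. This is an illustrative sketch, not the contents of env/quantizer.py: the function names, the uniform quantization grid on [-1, 1], the unit-norm input, and the Gaussian sketch matrix are all assumptions made for the example.

```python
import math
import random


def random_orthogonal(dim, rng):
    """Random orthogonal matrix (orthonormal rows) via Gram-Schmidt."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove components along earlier rows
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip an (unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def matvec(matrix, x):
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def quantize(x, bits, lo=-1.0, hi=1.0):
    """Coordinate-wise uniform quantization to 2**bits levels."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [min(levels, max(0, round((v - lo) / step))) for v in x]


def dequantize(codes, bits, lo=-1.0, hi=1.0):
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + c * step for c in codes]


def compress(x, rotation, sketch, bits):
    """Rotate, quantize, then sketch the quantization residual."""
    rotated = matvec(rotation, x)
    codes = quantize(rotated, bits)
    residual = [r - q for r, q in zip(rotated, dequantize(codes, bits))]
    return codes, matvec(sketch, residual)


def decompress(codes, rotation, bits):
    """Dequantize and undo the rotation (inverse = transpose)."""
    transposed = [list(col) for col in zip(*rotation)]
    return matvec(transposed, dequantize(codes, bits))


rng = random.Random(0)
dim, sketch_dim, bits = 8, 4, 3
rotation = random_orthogonal(dim, rng)
# Gaussian random projection for the residual, scaled by 1/sqrt(sketch_dim).
sketch = [[rng.gauss(0.0, 1.0) / math.sqrt(sketch_dim) for _ in range(dim)]
          for _ in range(sketch_dim)]
raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
raw_norm = math.sqrt(sum(v * v for v in raw))
x = [v / raw_norm for v in raw]  # unit norm keeps rotated coords in [-1, 1]
codes, residual_sketch = compress(x, rotation, sketch, bits)
x_hat = decompress(codes, rotation, bits)
```

Because the rotation is orthogonal, the reconstruction error of `x_hat` equals the quantization error in the rotated space. In this sketch the residual sketch is stored but not used for reconstruction; one plausible use is to correct inner-product estimates between two compressed vectors.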

Reward shaping includes:

  • distortion penalty (MSE)
  • inner-product preservation penalty
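One way to picture the two penalties is the sketch below. The weights `lam_mse` and `lam_ip`, the use of an absolute inner-product error, and the function names are assumptions for illustration; the actual reward code in env/ may combine these terms differently.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mse(x, x_hat):
    """Mean squared reconstruction error (distortion)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def inner_product_error(x, y, x_hat, y_hat):
    """Absolute change in the inner product after compression."""
    return abs(dot(x, y) - dot(x_hat, y_hat))


def shaped_reward(task_reward, x, y, x_hat, y_hat, lam_mse=0.1, lam_ip=0.1):
    """Task reward minus weighted penalties (weights are hypothetical)."""
    distortion = mse(x, x_hat) + mse(y, y_hat)
    ip_penalty = inner_product_error(x, y, x_hat, y_hat)
    return task_reward - lam_mse * distortion - lam_ip * ip_penalty
```

With a lossless reconstruction both penalties vanish and the shaped reward equals the task reward, which matches the no-regret behavior described for the unconstrained setting.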

Research-Grade WorkflowTwin (L1-L5)

The workflow_twin/ package extends the simulator from a single-ticket MVP to a multi-ticket workflow research environment.

Included

  • workflow_twin/core/entities.py: multi-ticket state, agents, time, SLA/resource fields
  • workflow_twin/core/dynamics.py: queue logic, SLA penalties, dependencies, stochastic arrivals/failures
  • workflow_twin/core/config.py: level configs (L1-L5)
  • workflow_twin/environment.py: main level-aware environment (WorkflowTwinEnv)
  • workflow_twin/memory.py: MemoryBoundedEnv wrapper using rotated quantized memory compression
  • workflow_twin/levels/: level hooks for L1 simple → L5 memory pressure
  • baselines/heuristics.py: simple queue baseline policy
  • tasks/level1..level5/: task scaffolding per level

Quick Example

python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from baselines.heuristics import greedy_queue_policy

env = WorkflowTwinEnv(level=3, seed=42)
obs = env.reset()

for _ in range(10):
    action = greedy_queue_policy(obs)
    obs, reward, done, info = env.step(action)
    print(info["step_count"], reward, info["queue"])
    if done:
        break
PY

Memory-Bounded Wrapper Example (L5)

python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from workflow_twin.memory import MemoryBoundedEnv

base_env = WorkflowTwinEnv(level=5, seed=42)
env = MemoryBoundedEnv(base_env, memory_budget=3500, bits=3)
obs = env.reset()
obs, reward, done, info = env.step({"action_type": "triage", "note": "memory-check"})
print(info["memory"])
PY

Docker

docker build -t workflowtwin .
docker run -p 8000:8000 workflowtwin

Controlled A/B Quantized Memory Evaluation

Run the controlled experiment suite:

python -m experiments.ab_quantized_memory_eval

This executes three tests with shared metrics:

  • control_no_memory_pressure (Level 1, large memory budget)
  • critical_memory_constrained_long_horizon (Level 5, tight memory budget)
  • memory_budget_sweep (budgets: 2000, 3000, 4000, 6000)

Modes compared:

  • baseline: no compression, truncation under pressure
  • quant: rotated quantized memory compression under pressure

Reported metrics:

  • avg_reward
  • success_rate (resolved/total)
  • avg_sla_violations
  • avg_memory_used vs avg_memory_budget
  • memory_compliance_rate
  • steps_per_sec
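To make the metric names concrete, here is a sketch of how they might be aggregated from per-episode logs. The `EpisodeLog` structure is hypothetical, and defining compliance as the fraction of steps with memory use at or below the budget is an assumption; the experiment runner may define it differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeLog:
    """Hypothetical per-episode record used only for this sketch."""
    total_reward: float
    resolved: int
    total_tickets: int
    sla_violations: int
    memory_used: list  # memory use recorded at each step
    memory_budget: float
    wall_seconds: float


def summarize(episodes):
    """Aggregate episode logs into the reported metric names."""
    steps = sum(len(e.memory_used) for e in episodes)
    # Assumption: a step is compliant when memory use is within the budget.
    compliant = sum(
        1 for e in episodes for m in e.memory_used if m <= e.memory_budget
    )
    return {
        "avg_reward": mean(e.total_reward for e in episodes),
        "success_rate": sum(e.resolved for e in episodes)
        / sum(e.total_tickets for e in episodes),
        "avg_sla_violations": mean(e.sla_violations for e in episodes),
        "avg_memory_used": mean(m for e in episodes for m in e.memory_used),
        "avg_memory_budget": mean(e.memory_budget for e in episodes),
        "memory_compliance_rate": compliant / steps,
        "steps_per_sec": steps / sum(e.wall_seconds for e in episodes),
    }
```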

Figure (generated by the experiment runner):

Memory Budget vs Compliance Rate