metadata

sdk: docker
app_port: 8000

WorkflowTwin

An OpenEnv-compatible environment for training and evaluating agents under memory and resource constraints.

This environment simulates multi-step ticket resolution pipelines with:

queueing, prioritization, and dependencies
stochastic arrivals and agent failures
strict memory budgets on agent state

We introduce a quantized memory policy based on:

random orthogonal projection
scalar (coordinate-wise) quantization
residual random projection sketching

to study how compression affects agent performance under resource constraints.

Motivation

Real-world agents must operate under limited memory and compute.

Without compression:

state grows unbounded
agents violate system constraints

With quantized memory:

state is compressed
agents remain feasible under tight budgets

This environment enables controlled evaluation of this tradeoff.

Key Results

We evaluate two modes:

baseline: no compression (truncation under pressure)
quant: rotated quantized memory compression

Sweeping the memory budget across both modes reveals a clear crossover point where compression transitions from unnecessary to essential.

Memory Budget vs Feasibility

Key Findings

Feasibility threshold shift:
The baseline needs a memory budget of ~6000 to reach full compliance, while quantized memory reaches full compliance at ~3000.
2× efficiency gain:
Compression halves the memory required for feasible operation.
No-regret behavior:
Under no memory pressure, both methods perform identically.
Constraint robustness:
Under tight budgets, baseline fails (0% compliance) while quantized memory remains fully feasible (100%).

Conclusion: Compression extends the feasible operating regime without degrading task performance.

Structure

env/: core environment logic, models, scoring, reward
- includes quantizer.py with rotated vector quantization primitives
server/: FastAPI app exposing reset, step, state
tasks/: JSON task definitions by difficulty
baseline/: non-LLM heuristic policy
baselines/: research evaluation baselines for workflow_twin
inference.py: local rollout entrypoint
openenv.yaml: environment spec

Quickstart

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn server.app:app --reload

Server endpoints:

POST /reset
POST /step with body { "action_type": "triage|respond|resolve|escalate", "note": "..." }
GET /state
GET /config (resolved runtime config loaded from env vars)
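The endpoints above can be exercised from Python with the standard library alone. This is a minimal client sketch, not part of the repository: the helper names (`build_step_body`, `call`, `demo`) are hypothetical, the base URL assumes the default port 8000 used by uvicorn and the Docker image, and it assumes every endpoint returns a JSON body and that POST /reset accepts an empty request.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: server running locally on port 8000
ACTION_TYPES = {"triage", "respond", "resolve", "escalate"}  # from the /step body above


def build_step_body(action_type, note=""):
    """Build the JSON body for POST /step, rejecting unknown action types."""
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    return {"action_type": action_type, "note": note}


def call(method, path, body=None, base_url=BASE_URL):
    """Send one request and decode the response as JSON (assumed format)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Reset, take one step, then read state and config. Needs a live server."""
    print(call("POST", "/reset"))
    print(call("POST", "/step", build_step_body("triage", "first look")))
    print(call("GET", "/state"))
    print(call("GET", "/config"))
```

Call `demo()` once the server from the Quickstart is running. Nothing is sent at import time, so the helpers can be reused in tests without a live server.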

Run baseline inference:

python inference.py

Inference environment variables:

API_BASE_URL: OpenAI-compatible endpoint base URL
HF_TOKEN: API token (used as api_key)
MODEL_NAME: chat model name (default: gpt-4o-mini)

If API_BASE_URL or HF_TOKEN is missing, inference automatically falls back to the heuristic policy.
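The fallback rule can be sketched as a small function. This illustrates the documented behavior and is not the code in inference.py: the function name and the `policy` key are hypothetical, and treating an empty string as "missing" is an assumption.

```python
import os


def resolve_inference_config(environ=None):
    """Decide between the LLM policy and the heuristic fallback."""
    environ = os.environ if environ is None else environ
    api_base_url = environ.get("API_BASE_URL")
    hf_token = environ.get("HF_TOKEN")
    # Both values must be present; empty strings count as missing (assumption).
    configured = bool(api_base_url and hf_token)
    return {
        "openai_client_configured": configured,
        "policy": "llm" if configured else "heuristic",  # hypothetical key
        "model_name": environ.get("MODEL_NAME", "gpt-4o-mini"),  # documented default
    }
```

Passing a plain dict instead of `os.environ` makes the rule easy to check without touching the real environment.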

inference.py result fields:

score: final reported score (env_score when available, otherwise partial_score)
env_score: environment-provided score from env.state()
partial_score: fallback score from normalized accumulated reward
openai_client_configured: true when both API_BASE_URL and HF_TOKEN are present
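How these fields relate can be sketched as follows. The selection rule for `score` comes from the description above. The normalization of `partial_score` is not specified here, so dividing by a known maximum reward and clamping to [0, 1] is an assumption, and both function names are hypothetical.

```python
def normalized_partial_score(total_reward, max_reward):
    """Assumed normalization: accumulated reward over max reward, clamped to [0, 1]."""
    if max_reward <= 0:
        return 0.0
    return min(1.0, max(0.0, total_reward / max_reward))


def build_result(env_score, total_reward, max_reward, client_configured):
    """Assemble the result fields; env_score wins whenever it is available."""
    partial = normalized_partial_score(total_reward, max_reward)
    # Compare against None so a legitimate env_score of 0.0 is not discarded.
    score = env_score if env_score is not None else partial
    return {
        "score": score,
        "env_score": env_score,
        "partial_score": partial,
        "openai_client_configured": client_configured,
    }
```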

Method: Quantized Memory Policy

We implement a rotated vector quantization pipeline:

Random Orthogonal Projection
- decorrelates embedding dimensions
Scalar Quantization
- coordinate-wise discretization
Residual Random Projection Sketch
- preserves inner-product structure
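The three stages can be sketched in pure Python as below. This is an illustrative sketch under stated assumptions, not the contents of quantizer.py: every function name, the uniform quantizer with a fixed clipping range, the Gaussian sketch matrix, and the toy dimensions are assumptions chosen for readability.

```python
import math
import random


def random_orthogonal(d, rng):
    """Build a d x d orthogonal matrix by Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip degenerate draws
            rows.append([a / norm for a in v])
    return rows


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def quantize(y, bits, clip):
    """Uniform scalar quantizer on [-clip, clip]: one integer code per coordinate."""
    levels = 2 ** bits
    step = 2.0 * clip / levels
    return [min(levels - 1, max(0, int((v + clip) // step))) for v in y]


def dequantize(codes, bits, clip):
    """Map each code back to the midpoint of its quantization cell."""
    step = 2.0 * clip / (2 ** bits)
    return [-clip + (c + 0.5) * step for c in codes]


def compress(x, q, s, bits, clip):
    """Rotate, quantize coordinate-wise, then sketch the quantization residual."""
    y = matvec(q, x)
    codes = quantize(y, bits, clip)
    residual = [a - b for a, b in zip(y, dequantize(codes, bits, clip))]
    return codes, matvec(s, residual)


rng = random.Random(0)
d, k, bits, clip = 8, 4, 3, 1.0  # toy sizes, chosen only for illustration
Q = random_orthogonal(d, rng)
S = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
x = [rng.uniform(-0.3, 0.3) for _ in range(d)]
codes, sketch = compress(x, Q, S, bits, clip)
```

Each stored vector then costs `bits` per coordinate plus a `k`-dimensional sketch. With sketch entries drawn at variance 1/k, the inner product of two sketched vectors equals the inner product of the originals in expectation, which is why a sketch of the residual can correct inner-product estimates made from the quantized codes.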

Reward shaping includes:

distortion penalty (MSE)
inner-product preservation penalty
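One way the two penalties could enter the reward is sketched below. The MSE term follows its standard definition. The probe vector, the absolute-error form of the inner-product term, the weights, and all function names are assumptions, since the exact shaping formula is not given here.

```python
def mse(x, x_hat):
    """Mean squared error between a vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def inner_product_error(x, x_hat, probe):
    """Absolute change in the inner product with a probe vector (assumed form)."""
    exact = sum(a * b for a, b in zip(x, probe))
    approx = sum(a * b for a, b in zip(x_hat, probe))
    return abs(exact - approx)


def shaped_reward(task_reward, x, x_hat, probe, w_mse=1.0, w_ip=1.0):
    """Subtract weighted distortion penalties from the task reward (hypothetical weights)."""
    penalty = w_mse * mse(x, x_hat) + w_ip * inner_product_error(x, x_hat, probe)
    return task_reward - penalty
```

A perfect reconstruction (`x_hat == x`) leaves the task reward unchanged, so the penalties only apply when compression actually distorts the state.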

Research-Grade WorkflowTwin (L1-L5)

The new workflow_twin/ package evolves the simulator from a single-ticket MVP into a multi-ticket workflow research environment.

Included

workflow_twin/core/entities.py: multi-ticket state, agents, time, SLA/resource fields
workflow_twin/core/dynamics.py: queue logic, SLA penalties, dependencies, stochastic arrivals/failures
workflow_twin/core/config.py: level configs (L1-L5)
workflow_twin/environment.py: main level-aware environment (WorkflowTwinEnv)
workflow_twin/memory.py: MemoryBoundedEnv wrapper using rotated quantized memory compression
workflow_twin/levels/: level hooks for L1 simple → L5 memory pressure
baselines/heuristics.py: simple queue baseline policy
tasks/level1..level5/: task scaffolding per level

Quick Example

python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from baselines.heuristics import greedy_queue_policy

env = WorkflowTwinEnv(level=3, seed=42)
obs = env.reset()

for _ in range(10):
    action = greedy_queue_policy(obs)
    obs, reward, done, info = env.step(action)
    print(info["step_count"], reward, info["queue"])
    if done:
        break
PY

Memory-Bounded Wrapper Example (L5)

python - <<'PY'
from workflow_twin.environment import WorkflowTwinEnv
from workflow_twin.memory import MemoryBoundedEnv

base_env = WorkflowTwinEnv(level=5, seed=42)
env = MemoryBoundedEnv(base_env, memory_budget=3500, bits=3)
obs = env.reset()
obs, reward, done, info = env.step({"action_type": "triage", "note": "memory-check"})
print(info["memory"])
PY

Docker

docker build -t workflowtwin .
docker run -p 8000:8000 workflowtwin

Controlled A/B Quantized Memory Evaluation

Run the controlled experiment suite:

python -m experiments.ab_quantized_memory_eval

This executes three tests with shared metrics:

control_no_memory_pressure (Level 1, large memory budget)
critical_memory_constrained_long_horizon (Level 5, tight memory budget)
memory_budget_sweep (budgets: 2000, 3000, 4000, 6000)

Modes compared:

baseline: no compression, truncation under pressure
quant: rotated quantized memory compression under pressure

Reported metrics:

avg_reward
success_rate (resolved/total)
avg_sla_violations
avg_memory_used vs avg_memory_budget
memory_compliance_rate
steps_per_sec
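The metrics above could be aggregated from per-episode records roughly as follows. This is a sketch and not the experiment runner's code: the record keys and the function name are hypothetical, success_rate is pooled across episodes, and memory_compliance_rate is assumed to be the fraction of episodes whose memory use stays within budget.

```python
from statistics import mean


def summarize(episodes):
    """Aggregate per-episode records (hypothetical schema) into the reported metrics."""
    return {
        "avg_reward": mean(e["reward"] for e in episodes),
        # resolved/total, pooled over all episodes (assumption)
        "success_rate": sum(e["resolved"] for e in episodes)
        / sum(e["total"] for e in episodes),
        "avg_sla_violations": mean(e["sla_violations"] for e in episodes),
        "avg_memory_used": mean(e["memory_used"] for e in episodes),
        "avg_memory_budget": mean(e["memory_budget"] for e in episodes),
        # share of episodes that stayed within budget (assumed definition)
        "memory_compliance_rate": mean(
            1.0 if e["memory_used"] <= e["memory_budget"] else 0.0 for e in episodes
        ),
        "steps_per_sec": sum(e["steps"] for e in episodes)
        / sum(e["seconds"] for e in episodes),
    }
```

Running `summarize` once per mode (baseline and quant) on the same seeds gives directly comparable rows for each test.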

Figure (generated by the experiment runner):