	---
	sdk: docker
	app_port: 8000
	---

	# WorkflowTwin

	An OpenEnv-compatible environment for training and evaluating agents under memory and resource constraints.

	This environment simulates multi-step ticket resolution pipelines with:
	- queueing, prioritization, and dependencies
	- stochastic arrivals and agent failures
	- strict memory budgets on agent state

	We introduce a quantized memory policy that combines:
	- random orthogonal projection
	- scalar quantization
	- random-projection sketching of the quantization residual

	This lets us study how compression affects agent performance under resource constraints.

	## Motivation

	Real-world agents must operate under limited memory and compute.

	Without compression:
	- state grows unbounded
	- agents violate system constraints

	With quantized memory:
	- state is compressed
	- agents remain feasible under tight budgets

	This environment enables controlled evaluation of this tradeoff.

	## Key Results

	We evaluate two modes:
	- baseline: no compression (truncation under pressure)
	- quant: rotated quantized memory compression

	Sweeping the memory budget reveals a clear crossover point where compression goes from unnecessary to essential.

	### Memory Budget vs Feasibility

	![Memory Budget vs Compliance Rate](experiments/figures/memory_budget_vs_compliance.svg)

	### Key Findings

	- Feasibility threshold shift: the baseline needs a memory budget of ~6000 for full compliance, while quantized memory reaches full compliance at ~3000.
	- 2× efficiency gain: compression halves the memory budget required for feasible operation.
	- No-regret behavior: with no memory pressure, both modes perform identically.
	- Constraint robustness: under tight budgets, the baseline fails (0% compliance) while quantized memory remains fully feasible (100%).

	Conclusion: Compression extends the feasible operating regime without degrading task performance.
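The threshold and ratio quoted above can be read off a budget sweep mechanically. The sketch below is one way to do that; `feasibility_threshold` is a hypothetical helper (not part of this repo), and the sweep values in `toy_sweep` are invented purely to exercise the function, not measured results.

```python
from typing import Mapping, Optional


def feasibility_threshold(compliance_by_budget: Mapping[int, float]) -> Optional[int]:
    """Smallest budget at which compliance is 1.0 and stays 1.0 for all larger budgets."""
    threshold = None
    # Walk from the largest budget down and stop at the first non-compliant one.
    for budget in sorted(compliance_by_budget, reverse=True):
        if compliance_by_budget[budget] < 1.0:
            break
        threshold = budget
    return threshold


# Toy sweep with invented numbers, for illustration only.
toy_sweep = {100: 0.0, 200: 0.4, 300: 1.0, 400: 1.0}
toy_threshold = feasibility_threshold(toy_sweep)

# Efficiency gain from the approximate thresholds reported in Key Findings.
efficiency_gain = 6000 / 3000
```

A threshold of `None` means no budget in the sweep reached full compliance.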

	## Structure

	- `env/`: core environment logic, models, scoring, reward
	  - includes `quantizer.py` with rotated vector quantization primitives
	- `server/`: FastAPI app exposing `reset`, `step`, `state`
	- `tasks/`: JSON task definitions by difficulty
	- `baseline/`: non-LLM heuristic policy
	- `baselines/`: research evaluation baselines for `workflow_twin`
	- `inference.py`: local rollout entrypoint
	- `openenv.yaml`: environment spec

	## Quickstart

	```bash
	python -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	uvicorn server.app:app --reload
	```

	Server endpoints:

	- `POST /reset`
	- `POST /step` with body `{ "action_type": "triage|respond|resolve|escalate", "note": "..." }`
	- `GET /state`
	- `GET /config` (resolved runtime config loaded from env vars)
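A minimal client sketch for these endpoints, using only the standard library. It assumes the server from the Quickstart is reachable at `http://localhost:8000` (uvicorn's default port), that `/reset` accepts an empty JSON object, and that responses are JSON. `build_step_payload` and `post_json` are hypothetical helper names, not part of this repo.

```python
import json
import urllib.request
from typing import Optional

# Action types taken from the documented `/step` body.
ACTION_TYPES = {"triage", "respond", "resolve", "escalate"}

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default port


def build_step_payload(action_type: str, note: str = "") -> dict:
    """Validate the action type and build the JSON body for POST /step."""
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    return {"action_type": action_type, "note": note}


def post_json(path: str, payload: Optional[dict] = None) -> dict:
    """POST a JSON body to the server and decode the JSON response."""
    data = json.dumps(payload or {}).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_json("/reset")` followed by `post_json("/step", build_step_payload("triage", "first look"))` would drive one step of an episode.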

	Run baseline inference:

	```bash
	python inference.py
	```

	Inference environment variables:

	- `API_BASE_URL`: OpenAI-compatible endpoint base URL
	- `HF_TOKEN`: API token (used as `api_key`)
	- `MODEL_NAME`: chat model name (default: `gpt-4o-mini`)

	If `API_BASE_URL` or `HF_TOKEN` is missing, inference automatically falls back to the heuristic policy.

	`inference.py` result fields:

	- `score`: final reported score (`env_score` when available, otherwise `partial_score`)
	- `env_score`: environment-provided score from `env.state()`
	- `partial_score`: fallback score from normalized accumulated reward
	- `openai_client_configured`: `true` when both `API_BASE_URL` and `HF_TOKEN` are present
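The fallback rule and the field definitions above can be summarized in a few lines. This is a sketch of the documented behavior, not the code in `inference.py`; `build_result` is a hypothetical name, and treating an empty environment variable as missing is an assumption.

```python
import os
from typing import Mapping, Optional


def openai_client_configured(env: Mapping[str, str]) -> bool:
    """True only when both API_BASE_URL and HF_TOKEN are set (assumed: non-empty)."""
    return bool(env.get("API_BASE_URL")) and bool(env.get("HF_TOKEN"))


def build_result(
    env_score: Optional[float],
    partial_score: float,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Assemble the four documented result fields."""
    # Prefer the environment-provided score; fall back to the partial score.
    score = env_score if env_score is not None else partial_score
    return {
        "score": score,
        "env_score": env_score,
        "partial_score": partial_score,
        "openai_client_configured": openai_client_configured(env),
    }
```

Checking `env_score is not None` rather than its truthiness keeps a legitimate score of `0.0` from being replaced by the fallback.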

	## Method: Quantized Memory Policy

	We implement a rotated vector quantization pipeline:

	1. Random Orthogonal Projection
	   - decorrelates embedding dimensions

	2. Scalar Quantization
	   - coordinate-wise discretization

	3. Residual Random Projection Sketch
	   - preserves inner-product structure
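The three steps above can be sketched with NumPy as follows. This is an illustrative reading of the pipeline, not the contents of `env/quantizer.py`: the function names, the uniform min/max quantizer, the Gaussian sketch matrix, and the dimensions are all assumptions.

```python
import numpy as np


def random_orthogonal(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Sample a random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the result is uniformly distributed over orthogonal matrices.
    return q * np.sign(np.diag(r))


def quantize(x: np.ndarray, bits: int):
    """Uniform coordinate-wise quantization to 2**bits levels (bits <= 8)."""
    levels = 2 ** bits
    lo, hi = float(x.min()), float(x.max())
    step = (hi - lo) / (levels - 1) or 1.0  # avoid a zero step for constant input
    codes = np.round((x - lo) / step).astype(np.uint8)
    return codes, lo, step


def dequantize(codes: np.ndarray, lo: float, step: float) -> np.ndarray:
    """Map integer codes back to approximate coordinate values."""
    return lo + codes.astype(np.float64) * step


def compress(x, rotation, sketch_matrix, bits=3):
    """Step 1: rotate. Step 2: quantize. Step 3: sketch the quantization residual."""
    rotated = rotation @ x
    codes, lo, step = quantize(rotated, bits)
    residual = rotated - dequantize(codes, lo, step)
    sketch = sketch_matrix @ residual
    return codes, lo, step, sketch


def reconstruct(codes, lo, step, rotation):
    """Undo the rotation on the dequantized vector (the sketch is not used here)."""
    return rotation.T @ dequantize(codes, lo, step)


rng = np.random.default_rng(0)
dim, sketch_dim = 64, 16  # hypothetical sizes
rotation = random_orthogonal(dim, rng)
# Scaled Gaussian projection: inner products are preserved in expectation.
sketch_matrix = rng.standard_normal((sketch_dim, dim)) / np.sqrt(sketch_dim)
x = rng.standard_normal(dim)

codes, lo, step, sketch = compress(x, rotation, sketch_matrix, bits=3)
x_hat = reconstruct(codes, lo, step, rotation)
```

Because the rotation is orthogonal, the reconstruction error in the original space has the same norm as the quantization residual in the rotated space, and each rotated coordinate is off by at most half a quantization step.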

	Reward shaping includes:
	- distortion penalty (MSE)
	- inner-product preservation penalty
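One way these two penalties could enter the reward is sketched below. The weights `alpha` and `beta`, the probe vectors, and the function names are hypothetical; the README does not specify how the penalties are weighted or which vectors the inner products are taken against.

```python
import numpy as np


def distortion_penalty(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean squared error between the original and reconstructed vector."""
    return float(np.mean((x - x_hat) ** 2))


def inner_product_penalty(x: np.ndarray, x_hat: np.ndarray, probes: np.ndarray) -> float:
    """Mean squared change in inner products against a set of probe vectors (rows)."""
    return float(np.mean((probes @ x - probes @ x_hat) ** 2))


def shaped_reward(task_reward, x, x_hat, probes, alpha=0.1, beta=0.1):
    """Task reward minus weighted compression penalties (weights are assumptions)."""
    return (
        task_reward
        - alpha * distortion_penalty(x, x_hat)
        - beta * inner_product_penalty(x, x_hat, probes)
    )


# Tiny worked case: one coordinate is lost entirely.
x = np.array([1.0, 0.0])
x_hat = np.array([0.0, 0.0])
probes = np.eye(2)
reward = shaped_reward(1.0, x, x_hat, probes)
```

In this case both penalties equal 0.5, so the shaped reward is 1.0 - 0.1 × 0.5 - 0.1 × 0.5 = 0.9. A lossless reconstruction leaves the task reward unchanged.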

	## Research-Grade WorkflowTwin (L1-L5)

	The `workflow_twin/` package extends the simulator from a single-ticket MVP to a multi-ticket workflow research environment.

	### Included

	- `workflow_twin/core/entities.py`: multi-ticket state, agents, time, SLA/resource fields
	- `workflow_twin/core/dynamics.py`: queue logic, SLA penalties, dependencies, stochastic arrivals/failures
	- `workflow_twin/core/config.py`: level configs (L1-L5)
	- `workflow_twin/environment.py`: main level-aware environment (`WorkflowTwinEnv`)
	- `workflow_twin/memory.py`: `MemoryBoundedEnv` wrapper using rotated quantized memory compression
	- `workflow_twin/levels/`: level hooks for L1 simple → L5 memory pressure
	- `baselines/heuristics.py`: simple queue baseline policy
	- `tasks/level1..level5/`: task scaffolding per level

	### Quick Example

	```bash
	python - <<'PY'
	from workflow_twin.environment import WorkflowTwinEnv
	from baselines.heuristics import greedy_queue_policy

	env = WorkflowTwinEnv(level=3, seed=42)
	obs = env.reset()

	for _ in range(10):
	    # Pick the next action with the heuristic baseline, then advance the env.
	    action = greedy_queue_policy(obs)
	    obs, reward, done, info = env.step(action)
	    print(info["step_count"], reward, info["queue"])
	    if done:
	        break
	PY
	```

	### Memory-Bounded Wrapper Example (L5)

	```bash
	python - <<'PY'
	from workflow_twin.environment import WorkflowTwinEnv
	from workflow_twin.memory import MemoryBoundedEnv

	base_env = WorkflowTwinEnv(level=5, seed=42)
	env = MemoryBoundedEnv(base_env, memory_budget=3500, bits=3)
	obs = env.reset()
	obs, reward, done, info = env.step({"action_type": "triage", "note": "memory-check"})
	print(info["memory"])
	PY
	```

	## Docker

	```bash
	docker build -t workflowtwin .
	docker run -p 8000:8000 workflowtwin
	```

	## Controlled A/B Quantized Memory Evaluation

	Run the controlled experiment suite:

	```bash
	python -m experiments.ab_quantized_memory_eval
	```

	This executes three tests with shared metrics:

	- control_no_memory_pressure (Level 1, large memory budget)
	- critical_memory_constrained_long_horizon (Level 5, tight memory budget)
	- memory_budget_sweep (budgets: 2000, 3000, 4000, 6000)

	Modes compared:

	- baseline: no compression, truncation under pressure
	- quant: rotated quantized memory compression under pressure

	Reported metrics:

	- avg_reward
	- success_rate (resolved/total)
	- avg_sla_violations
	- avg_memory_used vs avg_memory_budget
	- memory_compliance_rate
	- steps_per_sec
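As a rough guide to how most of these metrics could be aggregated from per-episode records, here is a sketch. `EpisodeRecord` and `summarize` are hypothetical names, and two definitions are assumptions: `success_rate` is pooled over all tickets, and `memory_compliance_rate` is the fraction of episodes whose memory use stays within budget. `steps_per_sec` is omitted because it depends on wall-clock timing. The sample records are invented for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EpisodeRecord:
    total_reward: float
    resolved: int        # tickets resolved in the episode
    total: int           # tickets seen in the episode
    sla_violations: int
    memory_used: float
    memory_budget: float


def summarize(records: Sequence[EpisodeRecord]) -> dict:
    """Aggregate per-episode records into the reported metrics."""
    n = len(records)
    if n == 0:
        raise ValueError("no episodes to summarize")
    total_tickets = sum(r.total for r in records)
    return {
        "avg_reward": sum(r.total_reward for r in records) / n,
        # Pooled over tickets; 0.0 if no tickets were seen at all.
        "success_rate": sum(r.resolved for r in records) / total_tickets if total_tickets else 0.0,
        "avg_sla_violations": sum(r.sla_violations for r in records) / n,
        "avg_memory_used": sum(r.memory_used for r in records) / n,
        "avg_memory_budget": sum(r.memory_budget for r in records) / n,
        # Assumed definition: share of episodes that stayed within budget.
        "memory_compliance_rate": sum(r.memory_used <= r.memory_budget for r in records) / n,
    }


# Invented sample records, not experiment output.
sample = [
    EpisodeRecord(10.0, 3, 4, 1, 2800.0, 3000.0),
    EpisodeRecord(6.0, 1, 4, 3, 3400.0, 3000.0),
]
summary = summarize(sample)
```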

	Figure (generated by the experiment runner):

	![Memory Budget vs Compliance Rate](experiments/figures/memory_budget_vs_compliance.svg)