Spaces: NDGCodes/workflow-twin


NDGCodes committed on Apr 5

Commit

1a692ce

1 Parent(s): caf852f

fix repo structure for HF

This view is limited to 50 files because it contains too many changes.

Files changed (50)

.DS_Store +0 -0
workflowTwin/.env.example → .env.example +0 -0
workflowTwin/.gitignore → .gitignore +0 -0
workflowTwin/Dockerfile → Dockerfile +0 -0
README.md +211 -8
{workflowTwin/baseline → baseline}/policy.py +0 -0
{workflowTwin/baselines → baselines}/heuristics.py +0 -0
{workflowTwin/baselines → baselines}/rl_agents.py +0 -0
{workflowTwin/env → env}/__init__.py +0 -0
{workflowTwin/env → env}/dynamics.py +0 -0
{workflowTwin/env → env}/entities.py +0 -0
{workflowTwin/env → env}/environment.py +0 -0
{workflowTwin/env → env}/graders.py +0 -0
{workflowTwin/env → env}/models.py +0 -0
{workflowTwin/env → env}/quantizer.py +0 -0
{workflowTwin/env → env}/reward.py +0 -0
{workflowTwin/env → env}/runtime_config.py +0 -0
{workflowTwin/env → env}/tasks.py +0 -0
{workflowTwin/experiments → experiments}/ab_quantized_memory_eval.py +0 -0
{workflowTwin/experiments → experiments}/ab_turboquant_eval.py +0 -0
{workflowTwin/experiments → experiments}/figures/memory_budget_vs_compliance.svg +0 -0
workflowTwin/inference.py → inference.py +0 -0
workflowTwin/openenv.yaml → openenv.yaml +0 -0
workflowTwin/requirements.txt → requirements.txt +0 -0
{workflowTwin/server → server}/app.py +0 -0
{workflowTwin/server → server}/routes.py +0 -0
{workflowTwin/tasks → tasks}/easy.json +0 -0
{workflowTwin/tasks → tasks}/hard.json +0 -0
{workflowTwin/tasks → tasks}/level1/tasks.json +0 -0
{workflowTwin/tasks → tasks}/level2/tasks.json +0 -0
{workflowTwin/tasks → tasks}/level3/tasks.json +0 -0
{workflowTwin/tasks → tasks}/level4/tasks.json +0 -0
{workflowTwin/tasks → tasks}/level5/tasks.json +0 -0
{workflowTwin/tasks → tasks}/medium.json +0 -0
workflowTwin/.DS_Store +0 -0
workflowTwin/README.md +0 -215
{workflowTwin/workflow_twin → workflow_twin}/.DS_Store +0 -0
{workflowTwin/workflow_twin → workflow_twin}/__init__.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/core/__init__.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/core/config.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/core/dynamics.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/core/entities.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/environment.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/levels/__init__.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/levels/level1_simple.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/levels/level2_sla.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/levels/level3_approval.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/levels/level4_stochastic.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/levels/level5_memory.py +0 -0
{workflowTwin/workflow_twin → workflow_twin}/memory.py +0 -0

.DS_Store CHANGED

Binary files a/.DS_Store and b/.DS_Store differ

workflowTwin/.env.example → .env.example RENAMED

File without changes

workflowTwin/.gitignore → .gitignore RENAMED

File without changes

workflowTwin/Dockerfile → Dockerfile RENAMED

File without changes

README.md CHANGED

@@ -1,12 +1,215 @@
 ---
-title: Workflow Twin
-emoji: 🌍
-colorFrom: purple
-colorTo: gray
 sdk: docker
-pinned: false
-license: mit
-short_description: OpenEnv environment for workflow simulation under memory con
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 sdk: docker
+app_port: 8000
 ---
+# WorkflowTwin
+An OpenEnv-compatible environment for training and evaluating agents under memory and resource constraints.
+This environment simulates multi-step ticket resolution pipelines with:
+- queueing, prioritization, and dependencies
+- stochastic arrivals and agent failures
+- strict memory budgets on agent state
+We introduce a **quantized memory policy** based on:
+- random orthogonal projection
+- scalar vector quantization
+- random projection residual sketching
+to study how compression affects agent performance under resource constraints.
+## Motivation
+Real-world agents must operate under limited memory and compute.
+Without compression:
+- state grows unbounded
+- agents violate system constraints
+With quantized memory:
+- state is compressed
+- agents remain feasible under tight budgets
+This environment enables controlled evaluation of this tradeoff.
+## Key Results
+We evaluate two modes:
+- **baseline**: no compression (truncation under pressure)
+- **quant**: rotated quantized memory compression
+This establishes a clear crossover point where compression transitions from unnecessary to essential.
+### Memory Budget vs Feasibility
+![Memory Budget vs Compliance Rate](experiments/figures/memory_budget_vs_compliance.svg)
+### Key Findings
+- **Feasibility threshold shift:**
+  Baseline requires ~6000 memory, while quantized memory achieves full compliance at ~3000.
+- **2× efficiency gain:**
+  Compression halves the memory required for feasible operation.
+- **No-regret behavior:**
+  Under no memory pressure, both methods perform identically.
+- **Constraint robustness:**
+  Under tight budgets, baseline fails (0% compliance) while quantized memory remains fully feasible (100%).
+**Conclusion:** Compression extends the feasible operating regime without degrading task performance.
+## Structure
+- `env/`: core environment logic, models, scoring, reward
+	- includes `quantizer.py` with rotated vector quantization primitives
+- `server/`: FastAPI app exposing `reset`, `step`, `state`
+- `tasks/`: JSON task definitions by difficulty
+- `baseline/`: non-LLM heuristic policy
+- `baselines/`: research evaluation baselines for `workflow_twin`
+- `inference.py`: local rollout entrypoint
+- `openenv.yaml`: environment spec
+## Quickstart
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+uvicorn server.app:app --reload
+```
+Server endpoints:
+- `POST /reset`
+- `POST /step` with body `{ "action_type": "triage|respond|resolve|escalate", "note": "..." }`
+- `GET /state`
+- `GET /config` (resolved runtime config loaded from env vars)
+Run baseline inference:
+```bash
+python inference.py
+```
+Inference environment variables:
+- `API_BASE_URL`: OpenAI-compatible endpoint base URL
+- `HF_TOKEN`: API token (used as `api_key`)
+- `MODEL_NAME`: chat model name (default: `gpt-4o-mini`)
+If `API_BASE_URL` or `HF_TOKEN` is missing, inference automatically falls back to heuristic policy.
+`inference.py` result fields:
+- `score`: final reported score (`env_score` when available, otherwise `partial_score`)
+- `env_score`: environment-provided score from `env.state()`
+- `partial_score`: fallback score from normalized accumulated reward
+- `openai_client_configured`: `true` when both `API_BASE_URL` and `HF_TOKEN` are present
+## Method: Quantized Memory Policy
+We implement a rotated vector quantization pipeline:
+1. **Random Orthogonal Projection**
+   - decorrelates embedding dimensions
+2. **Scalar Quantization**
+   - coordinate-wise discretization
+3. **Residual Random Projection Sketch**
+   - preserves inner-product structure
+Reward shaping includes:
+- distortion penalty (MSE)
+- inner-product preservation penalty
+## Research-Grade WorkflowTwin (L1-L5)
+A new package `workflow_twin/` is now implemented to evolve the simulator from single-ticket MVP to multi-ticket workflow research environment.
+### Included
+- `workflow_twin/core/entities.py`: multi-ticket state, agents, time, SLA/resource fields
+- `workflow_twin/core/dynamics.py`: queue logic, SLA penalties, dependencies, stochastic arrivals/failures
+- `workflow_twin/core/config.py`: level configs (L1-L5)
+- `workflow_twin/environment.py`: main level-aware environment (`WorkflowTwinEnv`)
+- `workflow_twin/memory.py`: `MemoryBoundedEnv` wrapper using rotated quantized memory compression
+- `workflow_twin/levels/`: level hooks for L1 simple → L5 memory pressure
+- `baselines/heuristics.py`: simple queue baseline policy
+- `tasks/level1..level5/`: task scaffolding per level
+### Quick Example
+```bash
+python - <<'PY'
+from workflow_twin.environment import WorkflowTwinEnv
+from baselines.heuristics import greedy_queue_policy
+env = WorkflowTwinEnv(level=3, seed=42)
+obs = env.reset()
+for _ in range(10):
+	action = greedy_queue_policy(obs)
+	obs, reward, done, info = env.step(action)
+	print(info["step_count"], reward, info["queue"])
+	if done:
+		break
+PY
+```
+### Memory-Bounded Wrapper Example (L5)
+```bash
+python - <<'PY'
+from workflow_twin.environment import WorkflowTwinEnv
+from workflow_twin.memory import MemoryBoundedEnv
+base_env = WorkflowTwinEnv(level=5, seed=42)
+env = MemoryBoundedEnv(base_env, memory_budget=3500, bits=3)
+obs = env.reset()
+obs, reward, done, info = env.step({"action_type": "triage", "note": "memory-check"})
+print(info["memory"])
+PY
+```
+## Docker
+```bash
+docker build -t workflowtwin .
+docker run -p 8000:8000 workflowtwin
+```
+## Controlled A/B Quantized Memory Evaluation
+Run the controlled experiment suite:
+```bash
+python -m experiments.ab_quantized_memory_eval
+```
+This executes two tests with shared metrics:
+- control_no_memory_pressure (Level 1, large memory budget)
+- critical_memory_constrained_long_horizon (Level 5, tight memory budget)
+- memory_budget_sweep (budgets: 2000, 3000, 4000, 6000)
+Modes compared:
+- baseline: no compression, truncation under pressure
+- quant: rotated quantized memory compression under pressure
+Reported metrics:
+- avg_reward
+- success_rate (resolved/total)
+- avg_sla_violations
+- avg_memory_used vs avg_memory_budget
+- memory_compliance_rate
+- steps_per_sec
+Figure (generated by the experiment runner):
+![Memory Budget vs Compliance Rate](experiments/figures/memory_budget_vs_compliance.svg)
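
Note on the README's "Method: Quantized Memory Policy" section above: the three listed steps (random orthogonal projection, scalar quantization, residual random projection sketch) can be illustrated in plain Python. This is only a minimal sketch under stated assumptions, not the contents of `env/quantizer.py`, which this diff does not show. The function names (`random_orthogonal`, `compress`, `reconstruct`), the clipping range [-1, 1], the Gaussian sketch matrix, and the sketch size are all hypothetical choices made for illustration.

```python
import math
import random


def random_orthogonal(dim, rng):
    """Return a dim x dim orthogonal matrix (Gram-Schmidt on Gaussian rows)."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove components along rows already accepted
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip (near-)degenerate draws
            rows.append([a / norm for a in v])
    return rows


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def quantize(x, bits, lo=-1.0, hi=1.0):
    """Uniform scalar quantization of each coordinate to 2**bits levels."""
    top = (1 << bits) - 1
    step = (hi - lo) / top
    return [min(top, max(0, round((v - lo) / step))) for v in x]


def dequantize(codes, bits, lo=-1.0, hi=1.0):
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + c * step for c in codes]


def compress(x, rotation, sketch, bits):
    rotated = matvec(rotation, x)            # step 1: random orthogonal projection
    codes = quantize(rotated, bits)          # step 2: coordinate-wise quantization
    approx = dequantize(codes, bits)
    residual = [a - b for a, b in zip(rotated, approx)]
    return codes, matvec(sketch, residual)   # step 3: sketch of the residual


def reconstruct(codes, rotation, bits):
    """Undo the rotation on the dequantized codes (the sketch is not used here)."""
    return matvec(transpose(rotation), dequantize(codes, bits))


if __name__ == "__main__":
    rng = random.Random(42)
    dim, k, bits = 8, 4, 3  # assumed sizes; bits=3 mirrors the README example
    rotation = random_orthogonal(dim, rng)
    sketch = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(dim)]
              for _ in range(k)]
    x = [1.0 / math.sqrt(dim)] * dim  # unit norm keeps rotated coords in [-1, 1]
    codes, residual_sketch = compress(x, rotation, sketch, bits)
    x_hat = reconstruct(codes, rotation, bits)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / dim
    print(codes, residual_sketch, mse)
```

Because the rotation is orthogonal, its transpose inverts it, so the reconstruction error equals the quantization error in the rotated space. The mean squared error printed at the end corresponds to the "distortion penalty (MSE)" the README mentions, and the Gaussian sketch scaled by `1/sqrt(k)` is one common way to approximately preserve inner products of the residual.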

{workflowTwin/baseline → baseline}/policy.py RENAMED

File without changes

{workflowTwin/baselines → baselines}/heuristics.py RENAMED

File without changes

{workflowTwin/baselines → baselines}/rl_agents.py RENAMED

File without changes

{workflowTwin/env → env}/__init__.py RENAMED

File without changes

{workflowTwin/env → env}/dynamics.py RENAMED

File without changes

{workflowTwin/env → env}/entities.py RENAMED

File without changes

{workflowTwin/env → env}/environment.py RENAMED

File without changes

{workflowTwin/env → env}/graders.py RENAMED

File without changes

{workflowTwin/env → env}/models.py RENAMED

File without changes

{workflowTwin/env → env}/quantizer.py RENAMED

File without changes

{workflowTwin/env → env}/reward.py RENAMED

File without changes

{workflowTwin/env → env}/runtime_config.py RENAMED

File without changes

{workflowTwin/env → env}/tasks.py RENAMED

File without changes

{workflowTwin/experiments → experiments}/ab_quantized_memory_eval.py RENAMED

File without changes

{workflowTwin/experiments → experiments}/ab_turboquant_eval.py RENAMED

File without changes

{workflowTwin/experiments → experiments}/figures/memory_budget_vs_compliance.svg RENAMED

File without changes

workflowTwin/inference.py → inference.py RENAMED

File without changes

workflowTwin/openenv.yaml → openenv.yaml RENAMED

File without changes

workflowTwin/requirements.txt → requirements.txt RENAMED

File without changes

{workflowTwin/server → server}/app.py RENAMED

File without changes

{workflowTwin/server → server}/routes.py RENAMED

File without changes

{workflowTwin/tasks → tasks}/easy.json RENAMED

File without changes

{workflowTwin/tasks → tasks}/hard.json RENAMED

File without changes

{workflowTwin/tasks → tasks}/level1/tasks.json RENAMED

File without changes

{workflowTwin/tasks → tasks}/level2/tasks.json RENAMED

File without changes

{workflowTwin/tasks → tasks}/level3/tasks.json RENAMED

File without changes

{workflowTwin/tasks → tasks}/level4/tasks.json RENAMED

File without changes

{workflowTwin/tasks → tasks}/level5/tasks.json RENAMED

File without changes

{workflowTwin/tasks → tasks}/medium.json RENAMED

File without changes

workflowTwin/.DS_Store DELETED

Binary file (10.2 kB)

workflowTwin/README.md DELETED

@@ -1,215 +0,0 @@
----
-sdk: docker
-app_port: 8000
----
-# WorkflowTwin
-An OpenEnv-compatible environment for training and evaluating agents under memory and resource constraints.
-This environment simulates multi-step ticket resolution pipelines with:
-- queueing, prioritization, and dependencies
-- stochastic arrivals and agent failures
-- strict memory budgets on agent state
-We introduce a **quantized memory policy** based on:
-- random orthogonal projection
-- scalar vector quantization
-- random projection residual sketching
-to study how compression affects agent performance under resource constraints.
-## Motivation
-Real-world agents must operate under limited memory and compute.
-Without compression:
-- state grows unbounded
-- agents violate system constraints
-With quantized memory:
-- state is compressed
-- agents remain feasible under tight budgets
-This environment enables controlled evaluation of this tradeoff.
-## Key Results
-We evaluate two modes:
-- **baseline**: no compression (truncation under pressure)
-- **quant**: rotated quantized memory compression
-This establishes a clear crossover point where compression transitions from unnecessary to essential.
-### Memory Budget vs Feasibility
-![Memory Budget vs Compliance Rate](experiments/figures/memory_budget_vs_compliance.svg)
-### Key Findings
-- **Feasibility threshold shift:**
-  Baseline requires ~6000 memory, while quantized memory achieves full compliance at ~3000.
-- **2× efficiency gain:**
-  Compression halves the memory required for feasible operation.
-- **No-regret behavior:**
-  Under no memory pressure, both methods perform identically.
-- **Constraint robustness:**
-  Under tight budgets, baseline fails (0% compliance) while quantized memory remains fully feasible (100%).
-**Conclusion:** Compression extends the feasible operating regime without degrading task performance.
-## Structure
-- `env/`: core environment logic, models, scoring, reward
-	- includes `quantizer.py` with rotated vector quantization primitives
-- `server/`: FastAPI app exposing `reset`, `step`, `state`
-- `tasks/`: JSON task definitions by difficulty
-- `baseline/`: non-LLM heuristic policy
-- `baselines/`: research evaluation baselines for `workflow_twin`
-- `inference.py`: local rollout entrypoint
-- `openenv.yaml`: environment spec
-## Quickstart
-```bash
-python -m venv .venv
-source .venv/bin/activate
-pip install -r requirements.txt
-uvicorn server.app:app --reload
-```
-Server endpoints:
-- `POST /reset`
-- `POST /step` with body `{ "action_type": "triage|respond|resolve|escalate", "note": "..." }`
-- `GET /state`
-- `GET /config` (resolved runtime config loaded from env vars)
-Run baseline inference:
-```bash
-python inference.py
-```
-Inference environment variables:
-- `API_BASE_URL`: OpenAI-compatible endpoint base URL
-- `HF_TOKEN`: API token (used as `api_key`)
-- `MODEL_NAME`: chat model name (default: `gpt-4o-mini`)
-If `API_BASE_URL` or `HF_TOKEN` is missing, inference automatically falls back to heuristic policy.
-`inference.py` result fields:
-- `score`: final reported score (`env_score` when available, otherwise `partial_score`)
-- `env_score`: environment-provided score from `env.state()`
-- `partial_score`: fallback score from normalized accumulated reward
-- `openai_client_configured`: `true` when both `API_BASE_URL` and `HF_TOKEN` are present
-## Method: Quantized Memory Policy
-We implement a rotated vector quantization pipeline:
-1. **Random Orthogonal Projection**
-   - decorrelates embedding dimensions
-2. **Scalar Quantization**
-   - coordinate-wise discretization
-3. **Residual Random Projection Sketch**
-   - preserves inner-product structure
-Reward shaping includes:
-- distortion penalty (MSE)
-- inner-product preservation penalty
-## Research-Grade WorkflowTwin (L1-L5)
-A new package `workflow_twin/` is now implemented to evolve the simulator from single-ticket MVP to multi-ticket workflow research environment.
-### Included
-- `workflow_twin/core/entities.py`: multi-ticket state, agents, time, SLA/resource fields
-- `workflow_twin/core/dynamics.py`: queue logic, SLA penalties, dependencies, stochastic arrivals/failures
-- `workflow_twin/core/config.py`: level configs (L1-L5)
-- `workflow_twin/environment.py`: main level-aware environment (`WorkflowTwinEnv`)
-- `workflow_twin/memory.py`: `MemoryBoundedEnv` wrapper using rotated quantized memory compression
-- `workflow_twin/levels/`: level hooks for L1 simple → L5 memory pressure
-- `baselines/heuristics.py`: simple queue baseline policy
-- `tasks/level1..level5/`: task scaffolding per level
-### Quick Example
-```bash
-python - <<'PY'
-from workflow_twin.environment import WorkflowTwinEnv
-from baselines.heuristics import greedy_queue_policy
-env = WorkflowTwinEnv(level=3, seed=42)
-obs = env.reset()
-for _ in range(10):
-	action = greedy_queue_policy(obs)
-	obs, reward, done, info = env.step(action)
-	print(info["step_count"], reward, info["queue"])
-	if done:
-		break
-PY
-```
-### Memory-Bounded Wrapper Example (L5)
-```bash
-python - <<'PY'
-from workflow_twin.environment import WorkflowTwinEnv
-from workflow_twin.memory import MemoryBoundedEnv
-base_env = WorkflowTwinEnv(level=5, seed=42)
-env = MemoryBoundedEnv(base_env, memory_budget=3500, bits=3)
-obs = env.reset()
-obs, reward, done, info = env.step({"action_type": "triage", "note": "memory-check"})
-print(info["memory"])
-PY
-```
-## Docker
-```bash
-docker build -t workflowtwin .
-docker run -p 8000:8000 workflowtwin
-```
-## Controlled A/B Quantized Memory Evaluation
-Run the controlled experiment suite:
-```bash
-python -m experiments.ab_quantized_memory_eval
-```
-This executes two tests with shared metrics:
-- control_no_memory_pressure (Level 1, large memory budget)
-- critical_memory_constrained_long_horizon (Level 5, tight memory budget)
-- memory_budget_sweep (budgets: 2000, 3000, 4000, 6000)
-Modes compared:
-- baseline: no compression, truncation under pressure
-- quant: rotated quantized memory compression under pressure
-Reported metrics:
-- avg_reward
-- success_rate (resolved/total)
-- avg_sla_violations
-- avg_memory_used vs avg_memory_budget
-- memory_compliance_rate
-- steps_per_sec
-Figure (generated by the experiment runner):
-![Memory Budget vs Compliance Rate](experiments/figures/memory_budget_vs_compliance.svg)

{workflowTwin/workflow_twin → workflow_twin}/.DS_Store RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/__init__.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/core/__init__.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/core/config.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/core/dynamics.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/core/entities.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/environment.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/levels/__init__.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/levels/level1_simple.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/levels/level2_sla.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/levels/level3_approval.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/levels/level4_stochastic.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/levels/level5_memory.py RENAMED

File without changes

{workflowTwin/workflow_twin → workflow_twin}/memory.py RENAMED

File without changes