nitinsaini08 committed on
Commit d7ced7d · verified · 1 Parent(s): 5d96982

Upload folder using huggingface_hub

.gitignore ADDED
@@ -0,0 +1,5 @@
+ venv/
+ .venv/
+ env/
+ __pycache__/
+ *.pyc
Dockerfile ADDED
@@ -0,0 +1,34 @@
+ # HarFeast OpenEnv - HF Spaces / Docker deployment
+ # Build: docker build -t harfeast-env .
+
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ # Copy project files
+ COPY harfeast_env /app/harfeast_env
+ COPY harfeast_openenv /app/harfeast_openenv
+ COPY harfeast_world /app/harfeast_world
+ COPY harfeast_synthetic_world_generator.py /app/
+
+ # Generate world if missing (e.g. harfeast_world not committed)
+ RUN python /app/harfeast_synthetic_world_generator.py --output-dir /app/harfeast_world 2>/dev/null || true
+
+ # Optional: generate augmented dataset (200+ task variations) for RL training
+ # Uncomment to enable HARFEAST_WORLDS_BASE:
+ # RUN python /app/harfeast_synthetic_world_generator.py --batch 40 --output-dir /app/harfeast_worlds
+
+ # Install dependencies (the specifier is quoted so the shell does not treat ">" as a redirection)
+ RUN pip install --no-cache-dir "openenv-core>=0.2.1" fastapi uvicorn
+
+ ENV HARFEAST_WORLD_PATH=/app/harfeast_world
+ # ENV HARFEAST_WORLDS_BASE=/app/harfeast_worlds
+ ENV PYTHONPATH=/app
+ ENV ENABLE_WEB_INTERFACE=true
+
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
+     CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
+
+ EXPOSE 8000
+ CMD ["uvicorn", "harfeast_env.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
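Note on the health check: the Dockerfile probes `/health` with a one-line `urllib` call. The following is a minimal sketch of the same probe as a reusable function with an explicit timeout. The name `is_healthy` and its default arguments are hypothetical and not part of this commit.

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:8000/health", timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

A caller could poll this function while waiting for the container to come up, instead of relying only on Docker's own retry settings.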
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ title: HarFeast Env
+ emoji: "\U0001F33E"
+ colorFrom: green
+ colorTo: blue
+ sdk: docker
+ app_port: 8000
+ base_path: /web
+ pinned: false
+ tags:
+   - openenv
+ ---
+
+ # HarFeast OpenEnv Environment
+
+ RL training environment for management consulting tasks, built for the [OpenEnv Hackathon](https://github.com/meta-pytorch/OpenEnv) (Mercor APEX-Agents sub-theme).
+
+ An LLM agent navigates files, spreadsheets, and data tools to solve 14 multi-step analytical tasks about a fictional food manufacturing company. Answers are scored against deterministic rubrics.
+
+ ## Actions
+
+ | Action | Description |
+ |--------|-------------|
+ | `files.list` | List files/directories |
+ | `files.read` | Read text documents |
+ | `spreadsheet.read_range` | Read CSV rows/columns |
+ | `data.filter` | Filter rows by condition |
+ | `data.group_by` | Group + aggregate |
+ | `data.add_columns` | Derived columns |
+ | `data.compute` | Math expression eval |
+ | `submit` | Submit final answer (scored against rubric) |
+
+ ## Links
+
+ - [APEX-Agents Dataset](https://huggingface.co/datasets/mercor/apex-agents)
+ - [Archipelago Eval](https://github.com/Mercor-Intelligence/archipelago)
+ - [APEX-Agents Paper](https://arxiv.org/abs/2601.14242)
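Note on the action table: each action travels to the environment as a JSON object with an `action` key plus its parameters. The sketch below builds such payloads. The helper `make_action` is hypothetical, and the dataset and column names are illustrative values, not guaranteed to exist in the generated world.

```python
import json


def make_action(name: str, **params) -> str:
    """Serialize an action call as a JSON string with an "action" key."""
    return json.dumps({"action": name, **params})


# One example per action family; file and column names are illustrative.
examples = [
    make_action("files.list", path="."),
    make_action("spreadsheet.read_range", file="employee_survey.csv", range="1:10"),
    make_action("data.filter", dataset="employee_survey.csv",
                column="training_received", operator="eq", value="Yes"),
    make_action("data.compute", expression="(1202 * 14) / 100"),
    make_action("submit", answer="The count is 1202."),
]
```

Keeping the serialization in one helper means a malformed payload can only originate from the parameters, which makes agent-side debugging simpler.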
harfeast_env/README.md ADDED
@@ -0,0 +1,53 @@
+ # HarFeast Environment
+
+ Management consulting RL environment for OpenEnv. Agents explore CSV data and text documents, run filters and aggregations, and submit answers that are scored against a rubric.
+
+ ## Actions (8)
+
+ - **files.list(path)** - List files in data/ or documents/
+ - **files.read(path)** - Read text documents
+ - **spreadsheet.read_range(file, range)** - Read CSV (columns, 1:10, all)
+ - **data.filter(dataset, column, operator, value)** - Filter rows
+ - **data.group_by(dataset, column, aggregation, target_column)** - Aggregate
+ - **data.add_columns(dataset, new_column, expression)** - Derived columns
+ - **data.compute(expression)** - Math calculator
+ - **submit(answer)** - Submit final answer; episode ends; rubric scores 0-100
+
+ ## Action format (JSON)
+
+ ```json
+ {"action": "files.list", "path": "."}
+ {"action": "data.filter", "dataset": "employee_survey.csv", "column": "training_received", "operator": "eq", "value": "Yes"}
+ {"action": "submit", "answer": "The count is 1202. Excellent: 14%, Good: 41%..."}
+ ```
+
+ ## Usage
+
+ ```python
+ from harfeast_env import HarFeastEnv, HarFeastAction
+ import json
+
+ # Connect to HF Space
+ client = HarFeastEnv(base_url="https://YOUR-USERNAME-harfeast-env.hf.space")
+
+ # Reset (load task)
+ result = client.reset()
+ print(result.observation.observation)  # Task prompt
+
+ # Step - send action as JSON string
+ action = HarFeastAction(action_json=json.dumps({"action": "files.list", "path": "."}))
+ result = client.step(action)
+ print(result.observation.observation)
+ print(result.reward, result.done)
+
+ client.close()
+ ```
+
+ ## Local run
+
+ ```bash
+ cd /path/to/harfeast_apex_openenv_hackathon
+ python -m uvicorn harfeast_env.server.app:app --host 0.0.0.0 --port 8000
+ ```
+
+ Then: `HarFeastEnv(base_url="http://localhost:8000")`
harfeast_env/__init__.py ADDED
@@ -0,0 +1,9 @@
+ """
+ HarFeast OpenEnv - Management consulting RL environment.
+ Compatible with OpenEnv 0.2.1 for HF Spaces deployment.
+ """
+
+ from harfeast_env.models import HarFeastAction, HarFeastObservation
+ from harfeast_env.client import HarFeastEnv
+
+ __all__ = ["HarFeastAction", "HarFeastObservation", "HarFeastEnv"]
harfeast_env/client.py ADDED
@@ -0,0 +1,46 @@
+ """
+ HarFeast Environment Client.
+ Connects to HarFeast OpenEnv server via WebSocket/HTTP.
+ """
+
+ from typing import Any, Dict
+
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+ from openenv.core.env_client import EnvClient
+ from harfeast_env.models import HarFeastAction, HarFeastObservation
+
+
+ class HarFeastEnv(EnvClient[HarFeastAction, HarFeastObservation, State]):
+     """
+     Client for the HarFeast management consulting environment.
+     """
+
+     def _step_payload(self, action: HarFeastAction) -> Dict[str, Any]:
+         """Convert HarFeastAction to JSON payload."""
+         return {"action_json": action.action_json}
+
+     def _parse_result(self, payload: Dict) -> StepResult[HarFeastObservation]:
+         """Parse server response into StepResult."""
+         obs_data = payload.get("observation", {})
+         observation = HarFeastObservation(
+             observation=obs_data.get("observation", ""),
+             prompt=obs_data.get("prompt", ""),
+             step_count=obs_data.get("step_count", 0),
+             datasets_available=obs_data.get("datasets_available", "[]"),
+             done=payload.get("done", False),
+             reward=payload.get("reward") or 0.0,  # model field is a float; map a missing reward to 0.0
+             metadata=obs_data.get("metadata", {}),
+         )
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> State:
+         """Parse state from server."""
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
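Note on response parsing: the client splits the server payload into a nested `observation` object and top-level `reward` and `done` fields. The sketch below shows that unpacking with a plain dataclass so it runs without OpenEnv installed. `Obs` and `parse_payload` are hypothetical stand-ins, and the sample payload shape is an assumption based on the keys the client reads.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Obs:
    """Hypothetical stand-in for HarFeastObservation (subset of fields)."""
    observation: str = ""
    step_count: int = 0
    done: bool = False
    reward: float = 0.0
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_payload(payload: Dict[str, Any]) -> Obs:
    """Unpack a step/reset payload, tolerating missing or null fields."""
    obs_data = payload.get("observation") or {}
    reward: Optional[float] = payload.get("reward")
    return Obs(
        observation=obs_data.get("observation", ""),
        step_count=obs_data.get("step_count", 0),
        done=bool(payload.get("done", False)),
        reward=0.0 if reward is None else float(reward),
        metadata=obs_data.get("metadata") or {},
    )


sample = {"observation": {"observation": "ok", "step_count": 3}, "reward": None, "done": False}
parsed = parse_payload(sample)
```

Coalescing a null reward to `0.0` keeps the observation's numeric field valid on intermediate steps, where no rubric score exists yet.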
harfeast_env/models.py ADDED
@@ -0,0 +1,57 @@
+ """
+ Data models for the HarFeast Environment.
+ Actions are JSON-serialized calls: {"action": "files.list", "path": "."}
+ """
+
+ from pydantic import Field
+
+ from openenv.core.env_server.types import Action, Observation
+
+
+ class HarFeastAction(Action):
+     """
+     Action for HarFeast - JSON string encoding the action call.
+     Example: '{"action": "files.list", "path": "."}'
+     """
+
+     action_json: str = Field(
+         ...,
+         min_length=2,
+         description="JSON action: {\"action\": \"<name>\", ...params}. "
+         "Actions: files.list, files.read, spreadsheet.read_range, "
+         "data.filter, data.group_by, data.add_columns, data.compute, submit",
+     )
+
+
+ class HarFeastObservation(Observation):
+     """Observation from HarFeast - text result + metadata."""
+
+     observation: str = Field(
+         ...,
+         description="Text output from the action (file list, table, confirmation, etc.)",
+     )
+     prompt: str = Field(
+         default="",
+         description="Current task prompt",
+     )
+     step_count: int = Field(
+         default=0,
+         ge=0,
+         description="Number of steps taken",
+     )
+     datasets_available: str = Field(
+         default="[]",
+         description="JSON list of filtered dataset names available for chaining",
+     )
+     done: bool = Field(
+         default=False,
+         description="Whether the episode has ended",
+     )
+     reward: float = Field(
+         default=0.0,
+         description="Rubric score (0-100) when done, else 0",
+     )
+     metadata: dict = Field(
+         default_factory=dict,
+         description="Extra info (action_taken, last_error, task_id)",
+     )
harfeast_env/openenv.yaml ADDED
@@ -0,0 +1,6 @@
+ spec_version: 1
+ name: harfeast_env
+ type: space
+ runtime: fastapi
+ app: harfeast_env.server.app:app
+ port: 8000
harfeast_env/pyproject.toml ADDED
@@ -0,0 +1,21 @@
+ [project]
+ name = "harfeast-env"
+ version = "0.1.0"
+ description = "HarFeast management consulting RL environment - OpenEnv compatible"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+     "openenv-core>=0.2.1",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=7.0",
+ ]
+
+ [build-system]
+ requires = ["hatchling"]
+ build-backend = "hatchling.build"
+
+ [tool.hatch.build.targets.wheel]
+ packages = ["harfeast_env", "harfeast_openenv"]
harfeast_env/server/__init__.py ADDED
@@ -0,0 +1 @@
+ # HarFeast server module
harfeast_env/server/app.py ADDED
@@ -0,0 +1,65 @@
+ """
+ FastAPI application for HarFeast Environment.
+ Exposes the environment over HTTP/WebSocket for OpenEnv clients.
+ """
+
+ import os
+ import sys
+
+ _project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+ if _project_root not in sys.path:
+     sys.path.insert(0, _project_root)
+
+ from openenv.core.env_server.http_server import create_app
+ from harfeast_env.models import HarFeastAction, HarFeastObservation
+ from harfeast_env.server.harfeast_environment import HarFeastEnvironment
+
+ WORLD_PATH = os.environ.get("HARFEAST_WORLD_PATH") or os.path.join(_project_root, "harfeast_world")
+ WORLDS_BASE = os.environ.get("HARFEAST_WORLDS_BASE")
+
+
+ def _env_factory():
+     return HarFeastEnvironment(world_path=WORLD_PATH, worlds_base=WORLDS_BASE)
+
+
+ app = create_app(
+     _env_factory,
+     HarFeastAction,
+     HarFeastObservation,
+     env_name="harfeast_env",
+ )
+
+
+ @app.get("/")
+ def root():
+     return {
+         "name": "HarFeast OpenEnv",
+         "description": "Management consulting RL environment with 14 APEX-style analytical tasks",
+         "version": "0.1.0",
+         "tasks": 14,
+         "tools": [
+             "files.list", "files.read", "spreadsheet.read_range",
+             "data.filter", "data.group_by", "data.add_columns",
+             "data.compute", "submit",
+         ],
+         "endpoints": {
+             "info": "/info",
+             "reset": "/reset",
+             "step": "/step",
+             "health": "/health",
+         },
+     }
+
+
+ @app.get("/health")
+ def health():
+     return {"status": "ok"}
+
+
+ def main():
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=8000)
+
+
+ if __name__ == "__main__":
+     main()
harfeast_env/server/harfeast_environment.py ADDED
@@ -0,0 +1,100 @@
+ """
+ HarFeast Environment - OpenEnv server implementation.
+ Management consulting tasks with file, spreadsheet, and data actions.
+ """
+
+ import json
+ import os
+ from uuid import uuid4
+
+ from openenv.core.env_server.interfaces import Environment
+ from openenv.core.env_server.types import State
+
+ # Import our core logic - use path relative to project root
+ import sys
+ _project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+ if _project_root not in sys.path:
+     sys.path.insert(0, _project_root)
+
+ from harfeast_openenv.environment import HarFeastOpenEnv
+ from harfeast_openenv.schemas import StepResult
+ from harfeast_env.models import HarFeastAction, HarFeastObservation
+
+
+ class HarFeastEnvironment(Environment[HarFeastAction, HarFeastObservation, State]):
+     """
+     OpenEnv wrapper for HarFeast management consulting environment.
+     Supports files.list, files.read, spreadsheet.read_range, data actions, submit.
+     """
+
+     SUPPORTS_CONCURRENT_SESSIONS: bool = False  # Session state (filtered datasets)
+
+     def __init__(self, world_path: str | None = None, worlds_base: str | None = None):
+         self._world_path = world_path or os.path.join(_project_root, "harfeast_world")
+         self._worlds_base = (worlds_base or os.environ.get("HARFEAST_WORLDS_BASE") or "").strip() or None
+         self._env = HarFeastOpenEnv(
+             world_path=self._world_path,
+             worlds_base=os.path.abspath(self._worlds_base) if self._worlds_base else None,
+         )
+         self._state = State(episode_id=str(uuid4()), step_count=0)
+
+     def reset(
+         self,
+         seed: int | None = None,
+         episode_id: str | None = None,
+         task_id: str | None = None,
+         **kwargs,
+     ) -> HarFeastObservation:
+         """Reset environment and load a task. Supports task_index for augmented dataset."""
+         self._state = State(episode_id=episode_id or str(uuid4()), step_count=0)  # honor a caller-supplied episode_id
+         result: StepResult = self._env.reset(
+             seed=seed,
+             task_id=task_id or kwargs.get("task_id"),
+             task_index=kwargs.get("task_index"),
+             **{k: v for k, v in kwargs.items() if k not in ("task_id", "task_index")},
+         )
+         return self._step_result_to_obs(result)
+
+     def step(
+         self,
+         action: HarFeastAction,
+         timeout_s: float | None = None,
+         **kwargs,
+     ) -> HarFeastObservation:
+         """Execute action (action_json) and return observation."""
+         try:
+             action_dict = json.loads(action.action_json)
+         except json.JSONDecodeError as e:
+             return HarFeastObservation(
+                 observation=f"Invalid action JSON: {e}",
+                 prompt=self._env._prompt,
+                 step_count=self._env._step_count,
+                 datasets_available=json.dumps(list(self._env._filtered_datasets.keys())),
+                 done=False,
+                 reward=0.0,
+                 metadata={"error": str(e)},
+             )
+         result: StepResult = self._env.step(action_dict)
+         self._state.step_count = result.step_count
+         return self._step_result_to_obs(result)
+
+     def _step_result_to_obs(self, r: StepResult) -> HarFeastObservation:
+         """Convert our StepResult to HarFeastObservation."""
+         return HarFeastObservation(
+             observation=r.observation,
+             prompt=r.prompt,
+             step_count=r.step_count,
+             datasets_available=json.dumps(r.info.get("datasets_available", [])),
+             done=r.done,
+             reward=r.reward,
+             metadata={
+                 "action_taken": r.info.get("action_taken"),
+                 "last_error": r.info.get("last_error"),
+                 "task_id": self._env.state.get("task_id"),
+             },
+         )
+
+     @property
+     def state(self) -> State:
+         """Current episode state."""
+         return self._state
harfeast_openenv/__init__.py ADDED
@@ -0,0 +1,5 @@
+ """HarFeast OpenEnv - Management consulting RL environment."""
+
+ from harfeast_openenv.environment import HarFeastOpenEnv
+
+ __all__ = ["HarFeastOpenEnv"]
harfeast_openenv/actions.py ADDED
@@ -0,0 +1,480 @@
+ """Action handlers for HarFeast OpenEnv."""
+
+ import ast
+ import csv
+ import json
+ import operator
+ import os
+ import re
+ from collections import defaultdict
+ from statistics import median as stat_median
+
+ from .schemas import ActionResult
+
+
+ # ── Observation size limits ──────────────────────────────────────
+ MAX_TABLE_ROWS = 20
+
+ # ── Safe arithmetic evaluator (replaces eval) ────────────────────
+ _SAFE_BINOPS = {
+     ast.Add: operator.add, ast.Sub: operator.sub,
+     ast.Mult: operator.mul, ast.Div: operator.truediv,
+ }
+
+ def _safe_eval_expr(node, namespace=None):
+     """Evaluate an AST node containing only arithmetic on numbers (and optionally named vars)."""
+     if isinstance(node, ast.Expression):
+         return _safe_eval_expr(node.body, namespace)
+     if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
+         return node.value
+     if isinstance(node, ast.BinOp) and type(node.op) in _SAFE_BINOPS:
+         left = _safe_eval_expr(node.left, namespace)
+         right = _safe_eval_expr(node.right, namespace)
+         return _SAFE_BINOPS[type(node.op)](left, right)
+     if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
+         return -_safe_eval_expr(node.operand, namespace)
+     if isinstance(node, ast.Name) and namespace is not None:
+         if node.id in namespace:
+             return namespace[node.id]
+         raise ValueError(f"Unknown variable: {node.id}")
+     raise ValueError(f"Unsupported expression element: {ast.dump(node)}")
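Note on the safe evaluator: the idea is to parse the expression into an AST and walk only an allowlist of node types, so that calls, attribute access, and anything else raise instead of executing. Below is a self-contained sketch of the same technique. The name `safe_eval` and the `names` parameter are hypothetical and chosen for illustration.

```python
import ast
import operator

_BINOPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}


def safe_eval(expression: str, names=None):
    """Evaluate +, -, *, / over numbers and (optionally) named variables."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Name) and names is not None and node.id in names:
            return names[node.id]
        # Calls, attributes, subscripts, etc. all land here and are rejected.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


total = safe_eval("a + b * 2", {"a": 1, "b": 3})
```

Because unknown node types fail closed, an input such as `__import__('os')` parses to an `ast.Call` and is rejected before anything runs.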
+ MAX_DOCUMENT_CHARS = 2000
+ def handle_files_list(world_path: str, path: str = ".") -> ActionResult:
+     """
+     List files and directories at the given path.
+     Path can be ".", "data", "documents", or a subpath like "documents".
+     """
+     base = os.path.normpath(os.path.join(world_path, path))
+     if not os.path.isdir(base):
+         return ActionResult(
+             observation=f"Path '{path}' does not exist or is not a directory.",
+             success=False,
+             error=f"Invalid path: {path}",
+         )
+
+     # Ensure we don't escape world_path (commonpath also rejects sibling dirs that share the prefix)
+     world_abs = os.path.abspath(world_path)
+     base_abs = os.path.abspath(base)
+     if os.path.commonpath([world_abs, base_abs]) != world_abs:
+         return ActionResult(
+             observation="Access denied: path outside world directory.",
+             success=False,
+             error="Path traversal not allowed",
+         )
+
+     items = sorted(os.listdir(base))
+     files = []
+     for name in items:
+         full = os.path.join(base, name)
+         if os.path.isfile(full):
+             files.append({"name": name, "type": "file"})
+         else:
+             files.append({"name": name + "/", "type": "directory"})
+
+     return ActionResult(
+         observation=json.dumps({"path": path, "items": files}, indent=2),
+     )
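Note on the traversal guard: a containment check of this kind can be sketched with `os.path.commonpath`. A plain string-prefix comparison on absolute paths would also accept a sibling directory whose name merely starts with the root's name. The helper `is_within` is hypothetical, the example paths are illustrative and assume POSIX-style separators, and symlinks are not resolved here (that would need `os.path.realpath`).

```python
import os


def is_within(root: str, candidate: str) -> bool:
    """True if `candidate`, joined onto `root`, still resolves inside `root`."""
    root_abs = os.path.abspath(root)
    cand_abs = os.path.abspath(os.path.join(root, candidate))
    # commonpath compares whole path components, not raw characters.
    return os.path.commonpath([root_abs, cand_abs]) == root_abs


root = "/app/harfeast_world"
inside = is_within(root, "documents")
sibling = is_within(root, "../harfeast_world_backup")
# The naive prefix test would wrongly accept the sibling directory:
naive_sibling = os.path.abspath("/app/harfeast_world_backup").startswith(root)
```

Comparing path components rather than characters is what separates `/app/harfeast_world` from `/app/harfeast_world_backup`.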
+
+
+ def handle_files_read(world_path: str, path: str) -> ActionResult:
+     """
+     Read a text document. Only allows .txt files in documents/.
+     Rejects CSV paths with a message to use spreadsheet.read_range.
+     """
+     # Normalize path: accept "scrap_rate_report.txt", "documents/scrap_rate_report.txt", etc.
+     path = path.strip().lstrip("/")
+     if not path.startswith("documents/"):
+         path = "documents/" + path
+
+     full_path = os.path.normpath(os.path.join(world_path, path))
+
+     # Security: ensure within world_path (component-wise, not by string prefix)
+     world_abs = os.path.abspath(world_path)
+     full_abs = os.path.abspath(full_path)
+     if os.path.commonpath([world_abs, full_abs]) != world_abs:
+         return ActionResult(
+             observation="Access denied: path outside world directory.",
+             success=False,
+             error="Path traversal not allowed",
+         )
+
+     # Reject CSV files
+     if path.endswith(".csv") or "data/" in path:
+         return ActionResult(
+             observation=(
+                 "CSV files cannot be read with files.read. "
+                 "Use spreadsheet.read_range(file, range) to read CSV data."
+             ),
+             success=False,
+             error="Use spreadsheet.read_range for CSV files",
+         )
+
+     if not os.path.isfile(full_path):
+         return ActionResult(
+             observation=f"File not found: {path}",
+             success=False,
+             error=f"File not found: {path}",
+         )
+
+     try:
+         with open(full_path, "r", encoding="utf-8") as f:
+             content = f.read()
+         if len(content) > MAX_DOCUMENT_CHARS:
+             total = len(content)
+             content = content[:MAX_DOCUMENT_CHARS] + (
+                 f"\n\n[Truncated - showing first {MAX_DOCUMENT_CHARS} of {total} characters.]"
+             )
+         return ActionResult(observation=content)
+     except Exception as e:
+         return ActionResult(
+             observation=f"Error reading file: {e}",
+             success=False,
+             error=str(e),
+         )
+
+
+ def _resolve_csv_path(world_path: str, file_or_dataset: str) -> str:
+     """Resolve file/dataset name to full CSV path. Reject path traversal."""
+     file_or_dataset = file_or_dataset.strip()
+     if not file_or_dataset.lower().endswith(".csv"):
+         file_or_dataset = file_or_dataset + ".csv"
+     if not file_or_dataset.lower().startswith("data/"):
+         file_or_dataset = "data/" + file_or_dataset.lstrip("/")
+     full = os.path.normpath(os.path.join(world_path, file_or_dataset))
+     world_abs = os.path.abspath(world_path)
+     full_abs = os.path.abspath(full)
+     if os.path.commonpath([world_abs, full_abs]) != world_abs or not full_abs.lower().endswith(".csv"):
+         raise ValueError(f"Invalid path: {file_or_dataset}")
+     return full
+
+
+ def _load_csv_rows(path: str) -> tuple[list[str], list[dict]]:
+     """Load CSV as (columns, rows)."""
+     with open(path, "r", encoding="utf-8") as f:
+         reader = csv.DictReader(f)
+         columns = reader.fieldnames or []
+         rows = list(reader)
+     return columns, rows
+
+
+ def _get_table(world_path: str, dataset: str, filtered_datasets: dict) -> tuple[list[str], list[dict]]:
+     """Load table (columns, rows) from CSV file or filtered dataset."""
+     if dataset in filtered_datasets:
+         stored = filtered_datasets[dataset]
+         cols = stored["columns"]
+         rows = [dict(r) for r in stored["rows"]]
+         return cols, rows
+     path = _resolve_csv_path(world_path, dataset)
+     return _load_csv_rows(path)
+
+
+ def _format_table(columns: list[str], rows: list[dict], max_rows: int | None = None) -> str:
+     """Format as text table. Defaults to MAX_TABLE_ROWS."""
+     if max_rows is None:
+         max_rows = MAX_TABLE_ROWS
+     if not rows:
+         return " | ".join(columns) + "\n(0 rows)"
+     lines = [" | ".join(columns)]
+     for r in rows[:max_rows]:
+         lines.append(" | ".join(str(r.get(c, "")) for c in columns))
+     if len(rows) > max_rows:
+         lines.append(f"\n[Showing {max_rows} of {len(rows)} rows. Use data.filter to narrow results.]")
+     return "\n".join(lines)
+
+
+ def handle_spreadsheet_read_range(
+     world_path: str,
+     file: str,
+     range_spec: str,
+ ) -> ActionResult:
+     """
+     Read rows from a CSV file.
+     range: "columns" (headers only), "1:10" (rows 1-10), "all" (every row, display capped at MAX_TABLE_ROWS).
+     """
+     try:
+         path = _resolve_csv_path(world_path, file)
+     except ValueError as e:
+         return ActionResult(observation=str(e), success=False, error=str(e))
+     if not os.path.isfile(path):
+         return ActionResult(
+             observation=f"File not found: {file}",
+             success=False,
+             error=f"File not found: {file}",
+         )
+     try:
+         columns, rows = _load_csv_rows(path)
+     except Exception as e:
+         return ActionResult(
+             observation=f"Error reading CSV: {e}",
+             success=False,
+             error=str(e),
+         )
+     range_spec = str(range_spec).strip().lower()
+     if range_spec == "columns":
+         obs = json.dumps({"columns": columns}, indent=2)
+         return ActionResult(observation=obs)
+     if range_spec == "all":
+         table = _format_table(columns, rows)
+         return ActionResult(observation=table)
+     # Parse "1:10" format (1-indexed inclusive); fullmatch rejects trailing junk such as "1:10abc"
+     m = re.fullmatch(r"(\d+)\s*:\s*(\d+)", range_spec)
+     if m:
+         start, end = int(m.group(1)), int(m.group(2))
+         start = max(1, start)
+         end = min(len(rows), end)
+         if start > end:
+             return ActionResult(
+                 observation="Invalid range: start > end",
+                 success=False,
+                 error="Invalid range",
+             )
+         subset = rows[start - 1 : end]
+         table = _format_table(columns, subset, max_rows=len(subset))
+         return ActionResult(observation=table)
+     return ActionResult(
+         observation=f"Invalid range: '{range_spec}'. Use 'columns', 'all', or 'start:end' (e.g. '1:10').",
+         success=False,
+         error="Invalid range",
+     )
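Note on range specs: `spreadsheet.read_range` accepts `columns`, `all`, or a 1-indexed inclusive `start:end` pair that is clamped to the available rows. The sketch below isolates that parsing step. The function `parse_range` and its return shape are hypothetical and chosen for illustration.

```python
import re


def parse_range(spec: str, n_rows: int):
    """Return ("columns", None), ("all", None), or ("rows", (start, end)).

    start and end are 1-indexed, inclusive, and clamped to [1, n_rows].
    """
    spec = spec.strip().lower()
    if spec in ("columns", "all"):
        return spec, None
    m = re.fullmatch(r"(\d+)\s*:\s*(\d+)", spec)
    if not m:
        raise ValueError(f"Invalid range: {spec!r}")
    start = max(1, int(m.group(1)))
    end = min(n_rows, int(m.group(2)))
    if start > end:
        raise ValueError("Invalid range: start > end")
    return "rows", (start, end)


rows = ["r1", "r2", "r3", "r4", "r5"]
kind, (start, end) = parse_range("2:10", len(rows))
subset = rows[start - 1:end]  # convert the 1-indexed inclusive bounds to a slice
```

The conversion `rows[start - 1:end]` works because Python slices are 0-indexed and end-exclusive, so subtracting one from `start` alone yields the inclusive window.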
+
+
+ def _try_float(x: str) -> float | str:
+     """Try to parse as float, else return string."""
+     try:
+         return float(x)
+     except (ValueError, TypeError):
+         return str(x).strip()
+
+
+ def _row_matches(row: dict, column: str, op: str, compare_val: float | str) -> bool:
+     """Check if row matches filter."""
+     raw = row.get(column, "")
+     is_numeric = isinstance(compare_val, (int, float))
+     if op == "contains":
+         return str(compare_val).lower() in str(raw).lower()
+     if is_numeric:
+         try:
+             cell = float(raw) if raw != "" else float("nan")
+         except (ValueError, TypeError):
+             return False
+     else:
+         cell = str(raw).strip()
+     if op == "eq":
+         return cell == compare_val
+     if op == "neq":
+         return cell != compare_val
+     if op == "gt":
+         return is_numeric and cell > compare_val
+     if op == "lt":
+         return is_numeric and cell < compare_val
+     if op == "gte":
+         return is_numeric and cell >= compare_val
+     if op == "lte":
+         return is_numeric and cell <= compare_val
+     return False
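Note on filter semantics: the filter value decides the comparison mode. A value that parses as a number triggers numeric comparison, and any other value is compared as a stripped string, with ordering operators reserved for numbers. The sketch below restates that rule in a compact form. `coerce` and `matches` are hypothetical helpers, and the sample rows are invented for illustration.

```python
import math
import operator

_OPS = {"eq": operator.eq, "neq": operator.ne, "gt": operator.gt,
        "lt": operator.lt, "gte": operator.ge, "lte": operator.le}


def coerce(value):
    """Parse as float when possible, otherwise return the stripped string."""
    try:
        return float(value)
    except (ValueError, TypeError):
        return str(value).strip()


def matches(cell, op, value):
    """Return True if a raw CSV cell satisfies `op` against `value`."""
    if op == "contains":
        return str(value).lower() in str(cell).lower()
    target = coerce(value)
    if isinstance(target, float):
        try:
            left = float(cell) if cell != "" else math.nan
        except (ValueError, TypeError):
            return False  # a non-numeric cell never matches a numeric filter
    else:
        if op not in ("eq", "neq"):
            return False  # ordering operators are numeric-only
        left = str(cell).strip()
    return _OPS[op](left, target)


rows = [
    {"plant": "Austin", "scrap_rate": "4.5"},
    {"plant": "Boise", "scrap_rate": "2.0"},
    {"plant": "Austin", "scrap_rate": ""},
]
high_scrap = [r for r in rows if matches(r["scrap_rate"], "gt", "3")]
austin = [r for r in rows if matches(r["plant"], "eq", "Austin")]
```

Mapping an empty cell to NaN means it fails every ordering comparison, so blank values drop out of `gt`/`lt` filters without a special case.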
+
+
+ def handle_data_filter(
+     world_path: str,
+     dataset: str,
+     column: str,
+     operator: str,
+     value: str,
+     filtered_datasets: dict,
+ ) -> ActionResult:
+     """
+     Filter rows. Operators: eq, neq, gt, lt, gte, lte, contains.
+     Stores result as filtered_0, filtered_1, ... in filtered_datasets.
+     """
+     try:
+         columns, rows = _get_table(world_path, dataset, filtered_datasets)
+     except Exception as e:
+         return ActionResult(observation=str(e), success=False, error=str(e))
+     if column not in columns:
+         return ActionResult(
+             observation=f"Column '{column}' not found. Available: {columns}",
+             success=False,
+             error=f"Column not found: {column}",
+         )
+     op = operator.strip().lower()
+     if op not in ("eq", "neq", "gt", "lt", "gte", "lte", "contains"):
+         return ActionResult(
+             observation=f"Unknown operator: {operator}. Use: eq, neq, gt, lt, gte, lte, contains.",
+             success=False,
+             error=f"Unknown operator: {operator}",
+         )
+     compare_val = str(value).strip() if op == "contains" else _try_float(value)
+     try:
+         filtered = [r for r in rows if _row_matches(r, column, op, compare_val)]
+     except Exception as e:
+         return ActionResult(
+             observation=f"Filter error: {e}",
+             success=False,
+             error=str(e),
+         )
+     next_idx = len([k for k in filtered_datasets if k.startswith("filtered_")])
+     store_name = f"filtered_{next_idx}"
+     filtered_datasets[store_name] = {"columns": columns, "rows": filtered}
+     return ActionResult(
+         observation=json.dumps({"rows": len(filtered), "stored_as": store_name}, indent=2),
+     )
+
+
+ def handle_data_group_by(
+     world_path: str,
+     dataset: str,
+     column: str,
+     aggregation: str,
+     target_column: str,
+     filtered_datasets: dict,
+ ) -> ActionResult:
+     """Group by column and aggregate target_column. Aggregations: sum, mean, median, count, min, max."""
+     try:
+         columns, rows = _get_table(world_path, dataset, filtered_datasets)
+     except Exception as e:
+         return ActionResult(observation=str(e), success=False, error=str(e))
+     if column not in columns:
+         return ActionResult(
+             observation=f"Column '{column}' not found. Available: {columns}",
+             success=False,
+             error=f"Column not found: {column}",
+         )
+     if target_column not in columns:
+         return ActionResult(
+             observation=f"Column '{target_column}' not found. Available: {columns}",
+             success=False,
+             error=f"Column not found: {target_column}",
+         )
+     agg = aggregation.strip().lower()
+     if agg not in ("sum", "mean", "median", "count", "min", "max"):
+         return ActionResult(
+             observation=f"Unknown aggregation: {aggregation}. Use: sum, mean, median, count, min, max.",
+             success=False,
+             error=f"Unknown aggregation: {aggregation}",
+         )
+     try:
+         groups: dict[str, list[float]] = defaultdict(list)
+         for r in rows:
+             key = str(r.get(column, ""))
+             raw = r.get(target_column, "")
+             try:
+                 val = float(raw)
+             except (ValueError, TypeError):
+                 if agg == "count":
+                     val = 1
+                 else:
+                     continue
+             groups[key].append(val)
+         result_rows = []
+         for key in sorted(groups.keys()):
+             vals = groups[key]
+             if agg == "sum":
+                 v = sum(vals)
+             elif agg == "mean":
+                 v = sum(vals) / len(vals) if vals else 0
+             elif agg == "median":
+                 v = stat_median(vals) if vals else 0
+             elif agg == "count":
+                 v = len(vals)
+             elif agg == "min":
+                 v = min(vals) if vals else 0
+             else:  # max
+                 v = max(vals) if vals else 0
+             result_rows.append({column: key, f"{agg}({target_column})": round(v, 2) if isinstance(v, float) else v})
+         table = _format_table([column, f"{agg}({target_column})"], result_rows, max_rows=1000)
+         return ActionResult(observation=table)
+     except Exception as e:
+         return ActionResult(
+             observation=f"Group-by error: {e}",
+             success=False,
+             error=str(e),
+         )
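Note on grouping: the aggregation collects the numeric values of the target column per group key, skips non-numeric cells (except for `count`, which still counts the row), and then reduces each group. A compact sketch of that flow follows. `group_by`, `AGGS`, and the sample rows are hypothetical and used only for illustration.

```python
from collections import defaultdict
from statistics import median

AGGS = {
    "sum": sum,
    "mean": lambda vals: sum(vals) / len(vals),
    "median": median,
    "count": len,
    "min": min,
    "max": max,
}


def group_by(rows, column, agg, target):
    """Group `rows` by `column` and reduce `target` with the named aggregation."""
    groups = defaultdict(list)
    for row in rows:
        try:
            val = float(row.get(target, ""))
        except (ValueError, TypeError):
            if agg != "count":
                continue  # non-numeric cells only contribute to counts
            val = 1.0
        groups[str(row.get(column, ""))].append(val)
    return {key: round(AGGS[agg](vals), 2) for key, vals in sorted(groups.items())}


rows = [
    {"plant": "A", "scrap": "2.0"},
    {"plant": "A", "scrap": "4.0"},
    {"plant": "B", "scrap": "n/a"},
    {"plant": "B", "scrap": "3"},
]
means = group_by(rows, "plant", "mean", "scrap")
counts = group_by(rows, "plant", "count", "scrap")
```

Parsing the value before touching the `defaultdict` avoids creating empty groups for rows that end up skipped, which would otherwise break `mean`, `min`, and `max`.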
+
+
+ def handle_data_add_columns(
+     world_path: str,
+     dataset: str,
+     new_column: str,
+     expression: str,
+     filtered_datasets: dict,
+ ) -> ActionResult:
+     """Create derived column from expression (e.g. 'a + b + c')."""
+     try:
+         columns, rows = _get_table(world_path, dataset, filtered_datasets)
+     except Exception as e:
+         return ActionResult(observation=str(e), success=False, error=str(e))
+     # Restrict expression to column names and arithmetic
+     allowed = set("abcdefghijklmnopqrstuvwxyz_0123456789.+-*/() ")
+     if not all(c in allowed for c in expression.lower().replace(" ", "")):
+         return ActionResult(
+             observation="Expression may only contain column names and +, -, *, /, (, ).",
+             success=False,
+             error="Invalid expression",
+         )
+     # Verify all names in expression are columns
+     try:
+         tree = ast.parse(expression, mode="eval")
+         names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
+         for n in names:
+             if n not in columns:
+                 return ActionResult(
+                     observation=f"Column '{n}' in expression not found. Available: {columns}",
+                     success=False,
+                     error=f"Column not found: {n}",
+                 )
+     except SyntaxError as e:
+         return ActionResult(
+             observation=f"Invalid expression syntax: {e}",
+             success=False,
+             error=str(e),
+         )
+     try:
+         new_rows = []
+         for r in rows:
+             row = dict(r)
+             ns = {}
+             for c in columns:
+                 v = _try_float(row.get(c, ""))
+                 ns[c] = v if isinstance(v, (int, float)) else 0
+             try:
+                 row[new_column] = round(_safe_eval_expr(tree, namespace=ns), 2)
+             except Exception:
+                 row[new_column] = 0
+             new_rows.append(row)
+         new_columns = columns + [new_column]
+         next_idx = len([k for k in filtered_datasets if k.startswith("filtered_")])
+         store_name = f"filtered_{next_idx}"
+         filtered_datasets[store_name] = {"columns": new_columns, "rows": new_rows}
+         return ActionResult(
+             observation=json.dumps({"rows": len(new_rows), "stored_as": store_name, "new_column": new_column}, indent=2),
+         )
+     except Exception as e:
+         return ActionResult(
+             observation=f"Expression error: {e}",
+             success=False,
+             error=str(e),
+         )
+
+
+ def handle_data_compute(expression: str) -> ActionResult:
+     """Evaluate a math expression. Only numbers and +, -, *, /, (, )."""
+     expr = expression.strip()
+     safe_pattern = re.compile(r"^[\d\s+\-*/().]+$")
+     if not safe_pattern.match(expr):
+         return ActionResult(
+             observation="Expression may only contain numbers and +, -, *, /, (, ).",
+             success=False,
+             error="Invalid expression",
+         )
+     try:
+         tree = ast.parse(expr, mode="eval")
+         result = _safe_eval_expr(tree)
472
+ if isinstance(result, float) and not result.is_integer():
473
+ return ActionResult(observation=str(round(result, 2)))
474
+ return ActionResult(observation=str(result))
475
+ except Exception as e:
476
+ return ActionResult(
477
+ observation=f"Compute error: {e}",
478
+ success=False,
479
+ error=str(e),
480
+ )
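The `_safe_eval_expr` helper called by `handle_data_add_columns` and `handle_data_compute` is defined outside this excerpt. As a rough illustration of the technique (walking a parsed `ast` tree and allowing only arithmetic nodes), the following is a minimal sketch. The name `safe_eval_sketch`, its string-based signature, and the operator tables are assumptions made for this example and are not the repository's implementation.

```python
import ast
import operator

# Hypothetical whitelist of arithmetic operators (an assumption for this sketch).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval_sketch(expression, namespace=None):
    """Evaluate +, -, *, / over numbers and names found in `namespace`."""
    namespace = namespace or {}

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Numeric literals only; bool is excluded even though it subclasses int.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.Name):
            return namespace[node.id]  # KeyError for unknown names
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Calls, attribute access, subscripts, etc. are rejected.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


total = safe_eval_sketch("2 + 3 * 4")
average = safe_eval_sketch("(a + b) / 2", {"a": 3, "b": 5})
```

Whitelisting node types means anything that is not plain arithmetic, such as a function call, raises instead of being executed, which is the property the character-level checks above are also trying to enforce.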
harfeast_openenv/environment.py ADDED
@@ -0,0 +1,351 @@
1
+ """HarFeast OpenEnv environment."""
2
+
3
+ import json
4
+ import os
5
+ import random
6
+ from .rubric import score_answer
7
+ from .schemas import ActionResult, StepResult, parse_action
8
+ from . import actions
9
+
10
+
11
+ class HarFeastOpenEnv:
12
+ """
13
+ OpenEnv environment for HarFeast management consulting tasks.
14
+ Phase 1-3: files, spreadsheet, data actions, submit with rubric scoring.
15
+ """
16
+
17
+ def __init__(self, world_path: str | None = None, worlds_base: str | None = None):
18
+ """
19
+ Args:
20
+ world_path: Single world directory (harfeast_world or world_XXXX).
21
+ worlds_base: Base dir with manifest.json + all_tasks.json for augmented dataset.
22
+ When set, reset() samples from all task instances.
23
+ """
24
+ self._worlds_base = os.path.abspath(worlds_base) if worlds_base else None
25
+ self._all_tasks: list[dict] = []
26
+ if self._worlds_base:
27
+ at_path = os.path.join(self._worlds_base, "all_tasks.json")
28
+ if os.path.isfile(at_path):
29
+ with open(at_path, "r", encoding="utf-8") as f:
30
+ self._all_tasks = json.load(f)
31
+
32
+ self.world_path = world_path or os.path.join(
33
+ os.path.dirname(__file__), "..", "harfeast_world"
34
+ )
35
+ self.world_path = os.path.abspath(self.world_path)
36
+
37
+ self._task: dict | None = None
38
+ self._tasks: list = []
39
+ self._prompt: str = ""
40
+ self._step_count: int = 0
41
+ self._done: bool = False
42
+ self._submitted_answer: str | None = None
43
+ self._rubric_score: float | None = None
44
+ self._filtered_datasets: dict = {}
45
+ self._rng: random.Random | None = None
46
+ self._history: list[dict] = []
47
+
48
+ self.CONTEXT_WINDOW_STEPS = 8
49
+ self.MAX_STEPS = 20
50
+
51
+ @property
52
+ def state(self) -> dict:
53
+ """Current environment state."""
54
+ return {
55
+ "task_id": self._task["task_id"] if self._task else None,
56
+ "task_name": self._task["task_name"] if self._task else None,
57
+ "prompt": self._prompt,
58
+ "step_count": self._step_count,
59
+ "done": self._done,
60
+ "submitted_answer": self._submitted_answer,
61
+ "rubric_score": self._rubric_score,
62
+ "filtered_datasets": list(self._filtered_datasets.keys()),
63
+ "history": self._history,
64
+ }
65
+
66
+ def reset(
67
+ self,
68
+ task_id: str | None = None,
69
+ seed: int | None = None,
70
+ **kwargs,
71
+ ) -> StepResult:
72
+ """
73
+ Reset environment and load a task.
74
+ If task_id is None, pick a random task.
75
+ """
76
+ self._step_count = 0
77
+ self._done = False
78
+ self._submitted_answer = None
79
+ self._rubric_score = None
80
+ self._filtered_datasets = {}
81
+ self._rng = random.Random(seed) if seed is not None else random.Random()
82
+ self._history = []
83
+ # Augmented dataset: sample from all_tasks or use specific task_index
84
+ task_index = kwargs.get("task_index")
85
+ if self._all_tasks:
86
+ if task_index is not None and 0 <= task_index < len(self._all_tasks):
87
+ entry = self._all_tasks[task_index]
88
+ else:
89
+ entry = self._rng.choice(self._all_tasks)
90
+ wp = entry["world_path"]
91
+ if not os.path.isabs(wp):
92
+ # e.g. "./harfeast_worlds/world_0000" -> world_0000
93
+ wp = os.path.join(self._worlds_base, os.path.basename(wp.rstrip("/")))
94
+ self.world_path = os.path.abspath(wp)
95
+ tasks_path = os.path.join(self.world_path, "tasks.json")
96
+ with open(tasks_path, "r", encoding="utf-8") as f:
97
+ self._tasks = json.load(f)
98
+ self._task = next(t for t in self._tasks if t["task_id"] == entry["task_id"])
99
+ else:
100
+ # Single world
101
+ tasks_path = os.path.join(self.world_path, "tasks.json")
102
+ if not os.path.isfile(tasks_path):
103
+ raise FileNotFoundError(f"Tasks not found: {tasks_path}. Run world generator first.")
104
+ with open(tasks_path, "r", encoding="utf-8") as f:
105
+ self._tasks = json.load(f)
106
+ if task_id:
107
+ matches = [t for t in self._tasks if t["task_id"] == task_id]
108
+ if not matches:
109
+ raise ValueError(f"Task not found: {task_id}")
110
+ self._task = matches[0]
111
+ else:
112
+ self._task = self._rng.choice(self._tasks)
113
+
114
+ self._prompt = self._task["prompt"]
115
+
116
+ return StepResult(
117
+ observation=f"Task: {self._task['task_name']}\n\nPrompt:\n{self._prompt}\n\nYou can use files.list(path), files.read(path), or other actions. What would you like to do?",
118
+ prompt=self._prompt,
119
+ step_count=0,
120
+ done=False,
121
+ reward=0.0,
122
+ info={"task_id": self._task["task_id"], "action_taken": "reset"},
123
+ )
124
+
125
+ def step(self, action: dict | str) -> StepResult:
126
+ """
127
+ Execute one action and return the result.
128
+ Action format: {"action": "files.list", "path": "."} or JSON string.
129
+ """
130
+ if self._task is None:
131
+ return StepResult(
132
+ observation="No task loaded. Call reset() before step().",
133
+ prompt="",
134
+ step_count=0,
135
+ done=True,
136
+ reward=0.0,
137
+ info={"action_taken": "none", "last_error": "reset() not called"},
138
+ )
139
+ if self._done:
140
+ return StepResult(
141
+ observation="Episode already ended. Call reset() to start a new episode.",
142
+ prompt=self._prompt,
143
+ step_count=self._step_count,
144
+ done=True,
145
+ reward=self._rubric_score or 0.0,
146
+ info={"action_taken": "none", "last_error": "Episode already ended"},
147
+ )
148
+ if self._step_count >= self.MAX_STEPS:
149
+ self._done = True
150
+ return self._make_step_result(
151
+ observation=f"Episode terminated: reached {self.MAX_STEPS} step limit without submitting.",
152
+ action_taken="timeout"
153
+ )
154
+
155
+
156
+ try:
157
+ name, params = parse_action(action)
158
+ except (ValueError, json.JSONDecodeError) as e:
159
+ return self._make_step_result(
160
+ observation=f"Invalid action format: {e}",
161
+ action_taken="parse_error",
162
+ success=False,
163
+ last_error=str(e),
164
+ )
165
+
166
+ # Dispatch to handler
167
+ result = self._dispatch(name, params)
168
+ self._step_count += 1
169
+ # Record in history for training context reconstruction
170
+ self._history.append({
171
+ "step": self._step_count,
172
+ "action": {"action": name, **params},
173
+ "observation": result.observation,
174
+ "success": result.success,
175
+ })
176
+ step_result = self._make_step_result(
177
+ observation=result.observation,
178
+ action_taken=name,
179
+ success=result.success,
180
+ last_error=result.error,
181
+ )
182
+ if name == "submit":
183
+ step_result.info["rubric_score"] = self._rubric_score
184
+ return step_result
185
+
186
+ def _dispatch(self, name: str, params: dict) -> ActionResult:
187
+ """Dispatch action to handler."""
188
+ if name == "files.list":
189
+ path = params.get("path", ".")
190
+ return actions.handle_files_list(self.world_path, path)
191
+
192
+ if name == "files.read":
193
+ path = params.get("path")
194
+ if path is None:
195
+ return ActionResult(
196
+ observation="files.read requires 'path' parameter.",
197
+ success=False,
198
+ error="Missing path",
199
+ )
200
+ return actions.handle_files_read(self.world_path, path)
201
+
202
+ # Phase 2: spreadsheet and data actions
203
+ if name == "spreadsheet.read_range":
204
+ file = params.get("file")
205
+ range_spec = params.get("range", "columns")
206
+ if file is None:
207
+ return ActionResult(
208
+ observation="spreadsheet.read_range requires 'file' parameter.",
209
+ success=False,
210
+ error="Missing file",
211
+ )
212
+ return actions.handle_spreadsheet_read_range(self.world_path, file, range_spec)
213
+
214
+ if name == "data.filter":
215
+ dataset = params.get("dataset")
216
+ column = params.get("column")
217
+ operator = params.get("operator")
218
+ value = params.get("value")
219
+ if None in (dataset, column, operator, value):
220
+ return ActionResult(
221
+ observation="data.filter requires dataset, column, operator, value.",
222
+ success=False,
223
+ error="Missing parameters",
224
+ )
225
+ return actions.handle_data_filter(
226
+ self.world_path, dataset, column, operator, str(value), self._filtered_datasets
227
+ )
228
+
229
+ if name == "data.group_by":
230
+ dataset = params.get("dataset")
231
+ column = params.get("column")
232
+ aggregation = params.get("aggregation")
233
+ target_column = params.get("target_column")
234
+ if None in (dataset, column, aggregation, target_column):
235
+ return ActionResult(
236
+ observation="data.group_by requires dataset, column, aggregation, target_column.",
237
+ success=False,
238
+ error="Missing parameters",
239
+ )
240
+ return actions.handle_data_group_by(
241
+ self.world_path, dataset, column, aggregation, target_column, self._filtered_datasets
242
+ )
243
+
244
+ if name == "data.add_columns":
245
+ dataset = params.get("dataset")
246
+ new_column = params.get("new_column")
247
+ expression = params.get("expression")
248
+ if None in (dataset, new_column, expression):
249
+ return ActionResult(
250
+ observation="data.add_columns requires dataset, new_column, expression.",
251
+ success=False,
252
+ error="Missing parameters",
253
+ )
254
+ return actions.handle_data_add_columns(
255
+ self.world_path, dataset, new_column, expression, self._filtered_datasets
256
+ )
257
+
258
+ if name == "data.compute":
259
+ expression = params.get("expression")
260
+ if expression is None:
261
+ return ActionResult(
262
+ observation="data.compute requires 'expression' parameter.",
263
+ success=False,
264
+ error="Missing expression",
265
+ )
266
+ return actions.handle_data_compute(str(expression))
267
+ if name == "submit":
268
+ answer = params.get("answer")
269
+ if answer is None or (isinstance(answer, str) and not answer.strip()):
270
+ return ActionResult(
271
+ observation="submit requires non-empty 'answer' parameter.",
272
+ success=False,
273
+ error="Missing answer",
274
+ )
275
+ answer_text = str(answer).strip()
276
+ rubric_list = self._task.get("rubric", [])
277
+ score, results = score_answer(answer_text, rubric_list)
278
+ self._submitted_answer = answer_text
279
+ self._rubric_score = score
280
+ self._done = True
281
+ passed = sum(1 for _, p in results if p)
282
+ total = len(results)
283
+ obs = (
284
+ f"Episode ended. Rubric score: {score:.1f}% ({passed}/{total} criteria met).\n"
285
+ f"Details:\n" + "\n".join(f" {'✓' if p else '✗'} {c[:70]}{'...' if len(c) > 70 else ''}" for c, p in results)
286
+ )
287
+ return ActionResult(observation=obs)
288
+
289
+ return ActionResult(
290
+ observation=f"Unknown action: {name}. Valid actions: files.list, files.read, spreadsheet.read_range, data.filter, data.group_by, data.add_columns, data.compute, submit.",
291
+ success=False,
292
+ error=f"Unknown action: {name}",
293
+ )
294
+
295
+
296
+ def _build_context_summary(self) -> str:
297
+ """Compact summary of the episode so far, prepended to every observation."""
298
+ if not self._history or not self._task:
299
+ return ""
300
+
301
+ lines = [f"=== Task: {self._task['task_name']} ==="]
302
+ prompt_short = self._prompt[:200] + "..." if len(self._prompt) > 200 else self._prompt
303
+ lines.append(prompt_short)
304
+
305
+ total = len(self._history)
306
+
307
+ if total > self.CONTEXT_WINDOW_STEPS:
308
+ older = total - self.CONTEXT_WINDOW_STEPS
309
+ lines.append(f"=== Context ({older} earlier steps omitted) ===")
310
+ recent = self._history[-self.CONTEXT_WINDOW_STEPS:]
311
+ else:
312
+ lines.append(f"=== Context (steps 1-{total}) ===")
313
+ recent = self._history
314
+
315
+ for entry in recent:
316
+ action = entry["action"]
317
+ action_name = action.get("action", "?")
318
+ obs = entry["observation"]
319
+ if len(obs) > 300:
320
+ obs_short = obs[:300] + "..."
321
+ else:
322
+ obs_short = obs
323
+ obs_short = " ".join(obs_short.split())
324
+ lines.append(f" Step {entry['step']}: {action_name} → {obs_short}")
325
+
326
+ ds = list(self._filtered_datasets.keys())
327
+ if ds:
328
+ lines.append(f" Available datasets: {', '.join(ds)}")
329
+
330
+ lines.append("=== Current ===")
331
+ return "\n".join(lines) + "\n"
332
+
333
+
334
+ def _make_step_result(self, observation, action_taken, success=True, last_error=None):
335
+ """Build StepResult from action outcome."""
336
+ # Prepend a compact summary of the most recent steps (see CONTEXT_WINDOW_STEPS) to the observation
337
+ context = self._build_context_summary()
338
+ full_observation = context + observation
339
+
340
+ return StepResult(
341
+ observation=full_observation,
342
+ prompt=self._prompt,
343
+ step_count=self._step_count,
344
+ done=self._done,
345
+ reward=(self._rubric_score or 0.0) if self._done else 0.0,
346
+ info={
347
+ "action_taken": action_taken,
348
+ "datasets_available": list(self._filtered_datasets.keys()),
349
+ "last_error": last_error,
350
+ },
351
+ )
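The windowing used by `_build_context_summary` can be sketched on its own. This is a simplified standalone version, not the class method itself: the function name `summarize_history` and the flat `action` string in each history entry are assumptions for illustration, while the window of 8 steps and the 300-character truncation mirror the values in the code above.

```python
def summarize_history(history, window=8, max_obs_chars=300):
    """Return a compact text summary of the last `window` history entries."""
    total = len(history)
    if total > window:
        lines = [f"=== Context ({total - window} earlier steps omitted) ==="]
        recent = history[-window:]
    else:
        lines = [f"=== Context (steps 1-{total}) ==="]
        recent = history
    for entry in recent:
        obs = entry["observation"]
        if len(obs) > max_obs_chars:
            obs = obs[:max_obs_chars] + "..."
        obs = " ".join(obs.split())  # collapse newlines and repeated spaces
        lines.append(f"  Step {entry['step']}: {entry['action']} -> {obs}")
    return "\n".join(lines)


# Ten fake steps: only the last eight should appear in the summary.
history = [
    {"step": i, "action": "files.list", "observation": f"listing {i}"}
    for i in range(1, 11)
]
summary = summarize_history(history)
```

With ten recorded steps and a window of eight, the header reports two omitted steps and the body starts at step 3, so the prepended context stays bounded however long the episode runs.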
harfeast_openenv/rewards.py ADDED
@@ -0,0 +1,102 @@
1
+ """
2
+ GDPO-style decomposed reward functions for HarFeast GRPO training.
3
+
4
+ Three independent reward signals, each normalized independently by TRL's
5
+ GRPOTrainer when passed as a list to reward_funcs. This is equivalent to
6
+ NVIDIA's GDPO (Jan 2026) multi-signal normalization.
7
+
8
+ Signature: reward_func(completions: list[list[dict]], **kwargs) -> list[float]
9
+ - completions[i] = [{"role": "assistant", "content": "..."}]
10
+ - kwargs include dataset columns: "rubric" (JSON-serialized list of criteria)
11
+ """
12
+
13
+ import json
14
+ import re
15
+ from .rubric import score_answer
16
+
17
+
18
+ def _extract_text(completions):
19
+ """Extract plain text from TRL chat-format completions."""
20
+ texts = []
21
+ for comp in completions:
22
+ if isinstance(comp, list) and comp:
23
+ texts.append(comp[-1].get("content", ""))
24
+ elif isinstance(comp, str):
25
+ texts.append(comp)
26
+ else:
27
+ texts.append("")
28
+ return texts
29
+
30
+
31
+ def _extract_answer(text):
32
+ """Pull the answer portion after 'Answer:' if present."""
33
+ if "Answer:" in text:
34
+ return text.split("Answer:")[-1].strip()
35
+ return text.strip()
36
+
37
+
38
+ def reward_correctness(completions, **kwargs):
39
+ """
40
+ Signal 1: Rubric correctness (0.0 - 1.0).
41
+ Scores each completion against task rubric criteria using deterministic
42
+ substring matching. This is the primary learning signal.
43
+ """
44
+ texts = _extract_text(completions)
45
+ rubric_strs = kwargs.get("rubric", [])
46
+ rewards = []
47
+ for i, text in enumerate(texts):
48
+ answer = _extract_answer(text)
49
+ try:
50
+ rubric = json.loads(rubric_strs[i]) if i < len(rubric_strs) else []
51
+ except (json.JSONDecodeError, TypeError):
52
+ rubric = []
53
+ if not rubric:
54
+ rewards.append(0.0)
55
+ continue
56
+ score, _ = score_answer(answer, rubric)
57
+ rewards.append(score / 100.0)
58
+ return rewards
59
+
60
+
61
+ def reward_format(completions, **kwargs):
62
+ """
63
+ Signal 2: Format compliance (0.0, 0.5, or 1.0).
64
+ Checks that the completion follows the expected output structure:
65
+ contains 'Answer:', includes at least one number, reasonable length (50-3000 characters). Scores 0.5 when only the 'Answer:' prefix is missing.
66
+ """
67
+ texts = _extract_text(completions)
68
+ rewards = []
69
+ for text in texts:
70
+ score = 0.0
71
+ has_answer_prefix = "answer:" in text.lower()
72
+ has_number = bool(re.search(r"\d+\.?\d*", text))
73
+ reasonable_length = 50 <= len(text) <= 3000
74
+ if has_answer_prefix and has_number and reasonable_length:
75
+ score = 1.0
76
+ elif has_number and reasonable_length:
77
+ score = 0.5
78
+ rewards.append(score)
79
+ return rewards
80
+
81
+
82
+ def reward_completeness(completions, **kwargs):
83
+ """
84
+ Signal 3: Numeric completeness (0.0 - 1.0).
85
+ Measures how many distinct numeric values appear in the answer relative
86
+ to the number of rubric criteria. Rewards specificity: an answer with
87
+ concrete numbers for every criterion scores higher.
88
+ """
89
+ texts = _extract_text(completions)
90
+ rubric_strs = kwargs.get("rubric", [])
91
+ rewards = []
92
+ for i, text in enumerate(texts):
93
+ answer = _extract_answer(text)
94
+ try:
95
+ rubric = json.loads(rubric_strs[i]) if i < len(rubric_strs) else []
96
+ except (json.JSONDecodeError, TypeError):
97
+ rubric = []
98
+ n_criteria = max(len(rubric), 1)
99
+ numbers = set(re.findall(r"\b\d[\d,.]*\d\b|\b\d+\b", answer))
100
+ ratio = min(len(numbers) / n_criteria, 1.0)
101
+ rewards.append(round(ratio, 3))
102
+ return rewards
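To make the completeness signal concrete, the sketch below isolates the number-counting step of `reward_completeness`. The regular expression is the one used above; the helper name `completeness_ratio` and the sample answers are assumptions made for illustration.

```python
import re

# Same pattern as in reward_completeness: multi-digit tokens that may contain
# commas or dots, or a run of plain digits.
NUMBER_PATTERN = re.compile(r"\b\d[\d,.]*\d\b|\b\d+\b")


def completeness_ratio(answer, n_criteria):
    """Distinct numeric tokens in `answer` divided by criteria count, capped at 1.0."""
    numbers = set(NUMBER_PATTERN.findall(answer))
    return round(min(len(numbers) / max(n_criteria, 1), 1.0), 3)


answer = "Revenue is 1,234.5 and the share is 14% across 3 plants."
ratio_four_criteria = completeness_ratio(answer, 4)
ratio_repeated = completeness_ratio("14 and 14", 2)
```

Because the tokens are collected into a set, repeating the same figure does not raise the score; only distinct numeric strings count toward the ratio.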
harfeast_openenv/rubric.py ADDED
@@ -0,0 +1,89 @@
1
+ """Rubric scoring for HarFeast OpenEnv."""
2
+
3
+ import re
4
+ from typing import Sequence
5
+
6
+
7
+ def _extract_expected_value(criterion: str) -> str | None:
8
+ """
9
+ Extract the expected value from a rubric criterion.
10
+ Pattern: "States that ... is VALUE" or "States that ... VALUE"
11
+ """
12
+ # Capture everything after the first " is " (e.g. "14%" or "$21,953,848,911")
13
+ m = re.search(r"\s+is\s+(.+)$", criterion)
14
+ if m:
15
+ return m.group(1).strip().strip('"')
16
+ return None
17
+
18
+
19
+ def _normalize_for_match(value: str) -> list[str]:
20
+ """
21
+ Return variants of the value to check against the answer.
22
+ Handles numbers with commas, percentages, etc.
23
+ """
24
+ value = value.strip()
25
+ variants = [value]
26
+ # Remove commas from numbers
27
+ no_commas = value.replace(",", "")
28
+ if no_commas != value:
29
+ variants.append(no_commas)
30
+ # For percentages: "14%" -> also accept "14" and "14 percent"
31
+ if value.endswith("%"):
32
+ num_part = value[:-1].strip()
33
+ variants.extend([num_part, f"{num_part}%", f"{num_part} percent"])
34
+ # Strip trailing zeros after the decimal point (e.g. "87.00" -> "87")
35
+ if "." in num_part and num_part.endswith("0"):
36
+ variants.append(num_part.rstrip("0").rstrip("."))
37
+ # For dollar amounts: "$21,953,848,911" -> also without $
38
+ if value.startswith("$"):
39
+ variants.append(value[1:].strip())
40
+ variants.append(value[1:].replace(",", ""))
41
+ # For decimals like 87.00% - accept 87
42
+ if "%" in value and "." in value:
43
+ num_part = value.replace("%", "").strip()
44
+ try:
45
+ f = float(num_part)
46
+ if f == int(f):
47
+ variants.append(str(int(f)))
48
+ except ValueError:
49
+ pass
50
+ return list(dict.fromkeys(variants)) # dedupe preserving order
51
+
52
+
53
+ def _answer_contains_value(answer: str, expected: str) -> bool:
54
+ """Check if answer contains the expected value (or a normalized variant)."""
55
+ answer_lower = answer.lower()
56
+ variants = _normalize_for_match(expected)
57
+ for v in variants:
58
+ if not v:
59
+ continue
60
+ # Case-insensitive substring match (case does not matter for numeric values)
61
+ if v.lower() in answer_lower:
62
+ return True
63
+ # For numbers, also check without leading zeros
64
+ if v.isdigit() and str(int(v)) in answer:
65
+ return True
66
+ return False
67
+
68
+
69
+ def score_answer(answer: str, rubric: Sequence[str]) -> tuple[float, list[tuple[str, bool]]]:
70
+ """
71
+ Score an answer against rubric criteria.
72
+ Returns (score_0_to_100, list of (criterion, passed)).
73
+ """
74
+ if not rubric:
75
+ return 100.0, []
76
+ results = []
77
+ for criterion in rubric:
78
+ expected = _extract_expected_value(criterion)
79
+ if expected is None:
80
+ # No " is X" pattern - fall back to substring of criterion
81
+ # e.g. "States that X" - check if key phrase appears
82
+ key = criterion.replace("States that ", "").strip()
83
+ passed = key.lower() in answer.lower()
84
+ else:
85
+ passed = _answer_contains_value(answer, expected)
86
+ results.append((criterion, passed))
87
+ passed_count = sum(1 for _, p in results if p)
88
+ score = (passed_count / len(rubric)) * 100.0
89
+ return round(score, 1), results
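The scoring flow above (extract the expected value, expand it into variants, then test for a substring) can be illustrated with a reduced sketch. The helper names and the sample rubric are assumptions for this example, and the variant list is deliberately shorter than the one in `_normalize_for_match`.

```python
import re


def expected_value(criterion):
    """Text after the first ' is ' in the criterion, or None if absent."""
    m = re.search(r"\s+is\s+(.+)$", criterion)
    return m.group(1).strip().strip('"') if m else None


def variants(value):
    """Reduced variant set: commas removed, percent and dollar forms."""
    out = [value, value.replace(",", "")]
    if value.endswith("%"):
        num = value[:-1].strip()
        out += [num, f"{num} percent"]
    if value.startswith("$"):
        out += [value[1:], value[1:].replace(",", "")]
    return list(dict.fromkeys(out))  # dedupe, preserving order


def score(answer, rubric):
    """Return (percentage of criteria met, per-criterion pass flags)."""
    answer_lower = answer.lower()
    passed = []
    for criterion in rubric:
        exp = expected_value(criterion)
        passed.append(
            exp is not None
            and any(v.lower() in answer_lower for v in variants(exp))
        )
    return round(100.0 * sum(passed) / len(rubric), 1), passed


# Hypothetical rubric and answer, for illustration only.
rubric = [
    "States that the scrap rate is 14%",
    "States that total revenue is $1,200,000",
    "States that the top plant is Toledo, Ohio",
]
answer = "Answer: scrap rate of 14 percent; revenue 1200000."
pct, flags = score(answer, rubric)
```

Since the check is a plain substring test, a short expected value such as `14` would also be found inside a longer number such as `214`; callers that need stricter matching would have to add word-boundary checks.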
harfeast_openenv/schemas.py ADDED
@@ -0,0 +1,40 @@
1
+ """Action and observation schemas for HarFeast OpenEnv."""
2
+
3
+ from dataclasses import dataclass, field
4
+ from typing import Any
5
+
6
+
7
+ @dataclass
8
+ class ActionResult:
9
+ """Result of executing an action."""
10
+ observation: str
11
+ success: bool = True
12
+ error: str | None = None
13
+
14
+
15
+ @dataclass
16
+ class StepResult:
17
+ """Result returned by environment.step()."""
18
+ observation: str
19
+ prompt: str
20
+ step_count: int
21
+ done: bool
22
+ reward: float
23
+ info: dict[str, Any] = field(default_factory=dict)
24
+
25
+
26
+ def parse_action(action: dict | str) -> tuple[str, dict]:
27
+ """
28
+ Parse action from dict or JSON string.
29
+ Returns (action_name, params).
30
+ """
31
+ if isinstance(action, str):
32
+ import json
33
+ action = json.loads(action)
34
+
35
+ if not isinstance(action, dict) or "action" not in action:
36
+ raise ValueError("Action must be a dict with 'action' key")
37
+
38
+ name = action["action"]
39
+ params = {k: v for k, v in action.items() if k != "action"}
40
+ return name, params
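The two accepted action forms (a dict, or the same dict serialized as JSON) can be tried in isolation. The function below restates `parse_action` under the hypothetical name `parse_action_sketch` so the snippet runs without the package installed; the action names and parameters are taken from the dispatcher in `environment.py`.

```python
import json


def parse_action_sketch(action):
    """Accept a dict or a JSON string and return (action_name, params)."""
    if isinstance(action, str):
        action = json.loads(action)
    if not isinstance(action, dict) or "action" not in action:
        raise ValueError("Action must be a dict with 'action' key")
    params = {k: v for k, v in action.items() if k != "action"}
    return action["action"], params


# Dict form and JSON-string form produce the same kind of result.
from_dict = parse_action_sketch({"action": "data.compute", "expression": "2 + 2"})
from_json = parse_action_sketch('{"action": "files.list", "path": "."}')
```

A JSON value that is valid but not an object, such as a list, is rejected with `ValueError`, which `step()` reports back as a parse error instead of raising.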
harfeast_synthetic_world_generator.py ADDED
@@ -0,0 +1,1454 @@
1
+ """
2
+ HarFeast Synthetic World Generator
3
+ Generates all data sources, computes ground truth, and produces task prompts + rubrics
4
+ for an APEX-style management consulting RL environment.
5
+
6
+ Supports parameterized generation for 200-500+ distinct task instances (RL scalability).
7
+
8
+ Usage:
9
+ python harfeast_synthetic_world_generator.py [--seed 42] [--output-dir ./world]
10
+ python harfeast_synthetic_world_generator.py --batch 40 --output-dir ./harfeast_worlds
11
+ """
12
+
13
+ import random
14
+ import csv
15
+ import json
16
+ import os
17
+ import math
18
+ from collections import defaultdict
19
+ from dataclasses import dataclass, field
20
+ from typing import Optional
21
+
22
+ # =============================================================================
23
+ # WORLD CONFIG - Parameterized variations
24
+ # =============================================================================
25
+
26
+ # Plant pool: (city, state) - one per state group. Order: IL, WI, IA, OH, MI.
27
+ # Tasks 5/9 need plants[0]=IL, plants[1]=WI, plants[2]=IA
28
+ PLANT_POOL_IL = ["Rockford", "Peoria", "Springfield", "Champaign", "Bloomington"]
29
+ PLANT_POOL_WI = ["Madison", "Milwaukee", "Green Bay", "Kenosha", "Racine"]
30
+ PLANT_POOL_IA = ["Cedar Rapids", "Des Moines", "Davenport", "Sioux City", "Iowa City"]
31
+ PLANT_POOL_OH = ["Toledo", "Columbus", "Cleveland", "Cincinnati", "Akron"]
32
+ PLANT_POOL_MI = ["Kalamazoo", "Lansing", "Detroit", "Grand Rapids", "Flint"]
33
+
34
+
35
+ @dataclass
36
+ class WorldConfig:
37
+ """Configuration for a single world variation."""
38
+ seed: int = 42
39
+ n_employees: int = 3000
40
+ plants: tuple = field(default_factory=lambda: (
41
+ "Rockford, Illinois", "Madison, Wisconsin", "Cedar Rapids, Iowa",
42
+ "Toledo, Ohio", "Kalamazoo, Michigan",
43
+ ))
44
+ target_scrap_pct: float = 4.0
45
+ scrap_range_max_pct: float = 7.0
46
+ training_received_weight: float = 0.4
47
+ frito_lay_reduction_pct: float = 30.0
48
+ wage_scale: float = 1.0
49
+ # Aptean report: add small noise to growth numbers
50
+ aptean_noise: float = 0.0
51
+
52
+
53
+ def sample_world_config(rng: random.Random, seed: int) -> WorldConfig:
54
+ """Sample a random world configuration for variation."""
55
+ il = rng.choice(PLANT_POOL_IL) + ", Illinois"
56
+ wi = rng.choice(PLANT_POOL_WI) + ", Wisconsin"
57
+ ia = rng.choice(PLANT_POOL_IA) + ", Iowa"
58
+ oh = rng.choice(PLANT_POOL_OH) + ", Ohio"
59
+ mi = rng.choice(PLANT_POOL_MI) + ", Michigan"
60
+ plants = (il, wi, ia, oh, mi)
61
+
62
+ target = rng.choice([3.5, 4.0, 4.5])
63
+ range_max = target + rng.choice([2.5, 3.0, 3.5])
64
+
65
+ return WorldConfig(
66
+ seed=seed,
67
+ n_employees=rng.randint(2000, 5000),
68
+ plants=plants,
69
+ target_scrap_pct=target,
70
+ scrap_range_max_pct=range_max,
71
+ training_received_weight=rng.uniform(0.35, 0.5),
72
+ frito_lay_reduction_pct=rng.choice([28.0, 30.0, 32.0]),
73
+ wage_scale=rng.uniform(0.95, 1.05),
74
+ aptean_noise=rng.uniform(0, 0.5),
75
+ )
76
+
77
+
78
+ def _plant_divisions(plants: tuple) -> dict:
79
+ """Build plant->census_division map. IA=West North Central, rest=East North Central."""
80
+ div = {}
81
+ for i, p in enumerate(plants):
82
+ div[p] = "West North Central" if i == 2 else "East North Central"
83
+ return div
84
+
85
+ ROLES = [
86
+ "Production/Manufacturing Operator",
87
+ "Quality Control/Quality Assurance",
88
+ "Maintenance Technician",
89
+ "Production Supervisor/Team Lead",
90
+ "Supply Chain/Logistics Coordinator",
91
+ "Demand Planning/Forecasting",
92
+ "Administrative/Support Staff",
93
+ "Plant Management",
94
+ ]
95
+
96
+ ROLE_TYPES = {
97
+ "Production/Manufacturing Operator": "Front-line",
98
+ "Quality Control/Quality Assurance": "Front-line",
99
+ "Maintenance Technician": "Front-line",
100
+ "Production Supervisor/Team Lead": "Supervisor/Team Lead",
101
+ "Supply Chain/Logistics Coordinator": "Back-office/Support",
102
+ "Demand Planning/Forecasting": "Back-office/Support",
103
+ "Administrative/Support Staff": "Back-office/Support",
104
+ "Plant Management": "Management",
105
+ }
106
+
107
+ PRODUCT_FAMILIES = ["Canned Vegetables", "Condiments", "Sauces"]
108
+ EQUIPMENT_TYPES = ["Mixer", "Filler", "Sealer", "Conveyor", "Boiler", "Pasteurizer", "Labeler"]
109
+ TRAINING_QUALITY_OPTIONS = [
110
+ "Excellent- comprehensive and very helpful",
111
+ "Good- adequate for most needs",
112
+ "Fair- some gaps or inconsistencies",
113
+ "Poor - insufficient or unhelpful",
114
+ ]
115
+
116
+ # Base hourly wages by role (used for wage data file)
117
+ BASE_WAGES = {
118
+ "Production/Manufacturing Operator": 18.50,
119
+ "Quality Control/Quality Assurance": 22.00,
120
+ "Maintenance Technician": 25.50,
121
+ "Production Supervisor/Team Lead": 28.00,
122
+ "Supply Chain/Logistics Coordinator": 24.00,
123
+ "Demand Planning/Forecasting": 30.00,
124
+ "Administrative/Support Staff": 20.00,
125
+ "Plant Management": 42.00,
126
+ }
127
+
128
+ # =============================================================================
129
+ # DATA GENERATORS
130
+ # =============================================================================
131
+
132
+ def generate_employee_survey(rng, cfg: WorldConfig):
133
+ """Generate the main employee workforce survey dataset."""
134
+ employees = []
135
+ n = cfg.n_employees
136
+ plants = list(cfg.plants)
137
+ high_inefficiency_plants = list(cfg.plants[3:5])
138
+ willing_high, willing_low = cfg.plants[1], cfg.plants[0]
139
+
140
+ for i in range(n):
141
+ plant = rng.choice(plants)
142
+ role = rng.choice(ROLES)
143
+ role_type = ROLE_TYPES[role]
144
+
145
+ # Inefficient hours - higher for the plants in high_inefficiency_plants (cfg.plants[3:5])
146
+ if plant in high_inefficiency_plants:
147
+ manual = round(rng.uniform(8, 30), 1)
148
+ searching = round(rng.uniform(4, 18), 1)
149
+ fixing = round(rng.uniform(3, 12), 1)
150
+ else:
151
+ manual = round(rng.uniform(0, 8), 1)
152
+ searching = round(rng.uniform(0, 5), 1)
153
+ fixing = round(rng.uniform(0, 4), 1)
154
+
155
+ # Digital readiness varies by role type
156
+ base_readiness = {"Front-line": 4, "Back-office/Support": 6,
157
+ "Supervisor/Team Lead": 5, "Management": 7}
158
+ readiness = round(rng.gauss(base_readiness[role_type], 2), 1)
159
+ readiness = max(1, min(10, readiness))
160
+
161
+ comfort = round(rng.gauss(5.5, 2), 1)
162
+ comfort = max(1, min(10, comfort))
163
+
164
+ willing_pilot = rng.choice(["Yes", "No"])
165
+ training_days = rng.choice(["<1 day", "1-2 days", ">2 days"])
166
+ dedicated_time = rng.choice(["Yes", "No"])
167
+
168
+ training_received = rng.choices(
169
+ ["Yes", "No"],
170
+ weights=[cfg.training_received_weight, 1 - cfg.training_received_weight],
171
+ )[0]
172
+ if training_received == "Yes":
173
+ quality = rng.choices(
174
+ TRAINING_QUALITY_OPTIONS,
175
+ weights=[0.16, 0.41, 0.33, 0.10]
176
+ )[0]
177
+ else:
178
+ quality = ""
179
+
180
+ # Willingness to adopt - varies by plant (highest/lowest for Task 12)
181
+ if plant == willing_high:
182
+ willingness = round(rng.gauss(3.8, 0.8), 1)
183
+ elif plant == willing_low:
184
+ willingness = round(rng.gauss(2.5, 0.8), 1)
185
+ else:
186
+ willingness = round(rng.gauss(3.2, 0.9), 1)
187
+ willingness = max(1, min(5, willingness))
188
+
189
+ base = BASE_WAGES[role] * cfg.wage_scale
190
+ hourly_wage = round(rng.gauss(base, 3), 2)
191
+ hourly_wage = max(12, hourly_wage)
192
+
193
+ union_status = rng.choice(["Union", "Non-Union"])
194
+
195
+ employees.append({
196
+ "employee_id": f"EMP-{i:04d}",
197
+ "plant": plant,
198
+ "role": role,
199
+ "role_type": role_type,
200
+ "digital_readiness_score": readiness,
201
+ "digital_comfort_score": comfort,
202
+ "willing_to_pilot": willing_pilot,
203
+ "training_days_willing": training_days,
204
+ "dedicated_training_time": dedicated_time,
205
+ "hours_manual_entry": manual,
206
+ "hours_searching_data": searching,
207
+ "hours_fixing_errors": fixing,
208
+ "hourly_wage": hourly_wage,
209
+ "training_received": training_received,
210
+ "training_quality": quality,
211
+ "willingness_to_adopt": willingness,
212
+ "union_status": union_status,
213
+ })
214
+
215
+ return employees
216
+
217
+
218
+ def generate_equipment_data(rng, cfg: WorldConfig):
219
+ """Generate plant equipment dataset. ~50 per plant."""
220
+ equipment = []
221
+ eq_id = 0
222
+ plants = cfg.plants
223
+ oee_base = {p: 0.78 - i * 0.02 + rng.uniform(-0.02, 0.02) for i, p in enumerate(plants)}
224
+ oee_base = {p: max(0.65, min(0.88, v)) for p, v in oee_base.items()}
225
+
226
+ for plant in plants:
227
+ n_equip = rng.randint(45, 55)
228
+ for j in range(n_equip):
229
+ pf = rng.choice(PRODUCT_FAMILIES)
230
+ et = rng.choice(EQUIPMENT_TYPES)
231
+
232
+ scheduled = round(rng.uniform(1500, 5000))
233
+ actual = round(scheduled * rng.uniform(0.7, 0.98))
234
+ standard = round(scheduled * rng.uniform(0.85, 1.0))
235
+ labor = round(rng.uniform(500, 3000))
236
+
237
+ # Scrap rates - drawn uniformly between 3% and 10%
238
+ scrap = round(rng.uniform(0.03, 0.10), 4)
239
+
240
+ oee = round(rng.gauss(oee_base[plant], 0.06), 4)
241
+ oee = max(0.45, min(0.95, oee))
242
+
243
+ downtime = round(rng.uniform(50, 500))
244
+ units = rng.randint(100000, 600000)
245
+ cogs = round(rng.uniform(800, 2000), 2)
246
+ failure_cost = round(rng.uniform(5000, 50000), 2)
247
+
248
+ equipment.append({
249
+ "equipment_id": f"EQ-{plant[:3].upper()}-{eq_id:03d}",
250
+ "plant": plant,
251
+ "product_family": pf,
252
+ "equipment_type": et,
253
+ "scheduled_hours": scheduled,
254
+ "actual_hours": actual,
255
+ "standard_hours": standard,
256
+ "labor_hours": labor,
257
+ "scrap_rate": scrap,
258
+ "oee": oee,
259
+ "unplanned_downtime_hours": downtime,
260
+ "units_produced": units,
261
+ "cogs_per_ton": cogs,
262
+ "failure_cost": failure_cost,
263
+ })
264
+ eq_id += 1
265
+
266
+ return equipment
267
+
268
+
269
+ def generate_quality_losses(rng, equipment):
270
+ """Generate quality losses data derived from equipment data."""
271
+ losses = []
272
+ for eq in equipment:
273
+ scrap_cost = round(eq["cogs_per_ton"] * eq["units_produced"] * eq["scrap_rate"] / 1000, 2)
274
+ failure = round(rng.uniform(2000, 30000), 2)
275
+ losses.append({
276
+ "equipment_id": eq["equipment_id"],
277
+ "plant": eq["plant"],
278
+ "product_family": eq["product_family"],
279
+ "scrap_cost": scrap_cost,
280
+ "unplanned_failure_cost": failure,
281
+ })
282
+ return losses
283
+
284
+
285
+ def generate_plant_labor(rng, cfg: WorldConfig):
286
+ """Generate per-employee plant labor data for Tasks 5 and 9."""
287
+ labor = []
288
+ lab_id = 0
289
+ plant_divs = _plant_divisions(cfg.plants)
290
+
291
+ production_roles = [
292
+ "Production Operator", "Quality Inspector", "Maintenance Tech",
293
+ "Production Supervisor", "Line Lead", "Packaging Operator"
294
+ ]
295
+
296
+ for plant in cfg.plants[:3]: # Tasks 5 and 9 only use IL, WI, IA plants
297
+ n_workers = rng.randint(15, 25)
298
+ for j in range(n_workers):
299
+ role = rng.choice(production_roles)
300
+ is_supervisor = "Supervisor" in role or "Lead" in role
301
+ wage = round(rng.gauss(22 if is_supervisor else 18, 2), 2)
302
+ wage = max(14, wage)
303
+
304
+ labor.append({
305
+ "employee_id": f"LAB-{plant[:3].upper()}-{lab_id:03d}",
306
+ "plant": plant,
307
+ "role": role,
308
+ "hourly_wage": wage,
309
+ "annual_hours": 2080,
310
+ "union_status": rng.choice(["Union", "Non-Union"]),
311
+ "supervisor_type": "production" if is_supervisor else "non-production",
312
+ "census_division": plant_divs[plant],
313
+ })
314
+ lab_id += 1
315
+
316
+ return labor
317
+
318
+
319
+ def generate_bls_wages(cfg: WorldConfig):
320
+ """BLS wage benchmark data."""
321
+ s = cfg.wage_scale
322
+ return [
323
+ {"occupation": "All Occupations", "industry": "Food Manufacturing", "median_hourly_wage": round(19.76 * s, 2)},
324
+ {"occupation": "Production Workers", "industry": "Food Manufacturing", "median_hourly_wage": round(17.85 * s, 2)},
325
+ {"occupation": "Supervisors", "industry": "Food Manufacturing", "median_hourly_wage": round(28.50 * s, 2)},
326
+ {"occupation": "Maintenance", "industry": "Food Manufacturing", "median_hourly_wage": round(24.30 * s, 2)},
327
+ {"occupation": "Quality Control", "industry": "Food Manufacturing", "median_hourly_wage": round(21.15 * s, 2)},
328
+ {"occupation": "Logistics", "industry": "Food Manufacturing", "median_hourly_wage": round(22.80 * s, 2)},
329
+ ]
330
+
331
+
332
+ def generate_attached_wages(cfg: WorldConfig):
333
+ """Client-provided updated wage data for Task 10."""
334
+ s = cfg.wage_scale
335
+ bases = [21.50, 25.80, 29.40, 33.20, 27.60, 35.10, 23.40, 48.50]
336
+ roles = list(ROLES)
337
+ return [{"role": r, "avg_hourly_salary": round(b * s, 2)} for r, b in zip(roles, bases)]
338
+
339
+
340
+ def generate_oee_assumptions(cfg: WorldConfig, rng: random.Random):
341
+ """OEE improvement assumptions for Task 4."""
342
+ plants = cfg.plants
343
+ base_oee = [0.78, 0.76, 0.80, 0.73, 0.71]
344
+ improvements = [0.030, 0.028, 0.032, 0.025, 0.024]
345
+ start_years = [2025, 2025, 2025, 2026, 2026]
346
+ return [
347
+ {
348
+ "plant": p,
349
+ "current_annual_oee": round(base_oee[i] + rng.uniform(-0.02, 0.02), 4),
350
+ "annual_oee_improvement": round(improvements[i] + rng.uniform(-0.002, 0.002), 4),
351
+ "investment_start_year": start_years[i],
352
+ "world_class_oee_target": 0.85,
353
+ }
354
+ for i, p in enumerate(plants)
355
+ ]
356
+
357
+
358
+ def generate_plant_sales(cfg: WorldConfig, rng: random.Random):
359
+ """Plant unit sales data for Task 11."""
360
+ plants = list(cfg.plants)
361
+ bases = [(16500000, 3.09), (20600000, 3.12), (4680000, 5.98), (4890000, 6.86), (6400000, 6.02)]
362
+ return [
363
+ {
364
+ "plant": p,
365
+ "current_unit_sales": int(b[0] * rng.uniform(0.85, 1.15)),
366
+ "price_per_unit": round(b[1] * rng.uniform(0.95, 1.05), 2),
367
+ }
368
+ for p, b in zip(plants, bases)
369
+ ]
370
+
371
+
372
+ def generate_aptean_report(cfg: WorldConfig, rng: random.Random):
373
+ """Aptean industry report data for Task 11."""
374
+ base = [
375
+ ("IoT Sensors", 12.5, 4.2, "Top Investment to Date"),
376
+ ("Predictive Maintenance", 11.8, 3.9, "Top Planned 2024"),
377
+ ("Cloud ERP", 9.2, 5.1, "Top Investment to Date"),
378
+ ("Robotic Automation", 10.4, 3.5, "Top Planned 2024"),
379
+ ("AI Quality Control", 8.7, 4.8, "Top Investment to Date"),
380
+ ("Digital Twin", 7.3, 4.0, "Other"),
381
+ ("Supply Chain AI", 6.9, 3.2, "Other"),
382
+ ("Automated Scheduling", 8.1, 5.5, "Top Planned 2024"),
383
+ ("Warehouse Robotics", 7.8, 5.0, "Other"),
384
+ ("Advanced Analytics", 9.8, 4.5, "Top Investment to Date"),
385
+ ]
386
+ noise = cfg.aptean_noise
387
+ return [
388
+ {
389
+ "technology": t,
390
+ "users_growth": round(u + rng.uniform(-noise, noise), 1),
391
+ "non_users_growth": round(n + rng.uniform(-noise, noise), 1),
392
+ "category": c,
393
+ }
394
+ for t, u, n, c in base
395
+ ]
396
+
397
+
398
+ # =============================================================================
399
+ # TEXT DOCUMENT GENERATORS
400
+ # =============================================================================
401
+
402
+ def generate_scrap_report(cfg: WorldConfig):
403
+ target = cfg.target_scrap_pct
404
+ rmax = cfg.scrap_range_max_pct
405
+ return f"""HarFeast Food Group - Quality Standards: Scrap Rate Report
406
+ ==========================================================
407
+
408
+ Acceptable scrap rate range: {target}% - {rmax}%
409
+ Target scrap rate (minimum of acceptable range): {target}%
410
+
411
+ Plants operating above {rmax}% require immediate corrective action and
412
+ must submit a remediation plan within 30 days. Quarterly reviews will
413
+ assess progress toward the target rate.
414
+
415
+ The target scrap rate represents the minimum of the acceptable range
416
+ and should be used as the baseline for all cost-of-quality calculations.
417
+ """
418
+
419
+
420
+ def generate_interviews():
421
+ interviews = {}
422
+
423
+ interviews["sarah_jenkins"] = """Expert Interview Transcript - Sarah Jenkins, VP Operations
424
+ Date: November 15, 2024
425
+
426
+ Q: Of the digital levers evaluated, which would deliver the fastest and
427
+ biggest boost to HarFeast's Gross Margin?
428
+
429
+ A: "We've evaluated several options including predictive maintenance,
430
+ automated scheduling, and IoT-based monitoring. In my assessment,
431
+ IoT Sensors for yield monitoring would deliver the fastest and most
432
+ significant boost to our Gross Margin. The real-time data on production
433
+ yield lets us catch quality issues at the source before they cascade
434
+ into scrap. I've seen it work at comparable food manufacturers with
435
+ measurable margin improvement within 6 months of deployment."
436
+ """
437
+
438
+ interviews["david_chen"] = """Expert Interview Transcript - David Chen, Director of Manufacturing
439
+ Date: November 16, 2024
440
+
441
+ Q: What digital investment would have the fastest and largest impact
442
+ on HarFeast's profitability?
443
+
444
+ A: "I've been looking at this from an operations standpoint. While
445
+ predictive maintenance is valuable long-term, the immediate winner
446
+ is IoT Sensors for yield optimization. The ability to monitor yield
447
+ in real-time across all product lines gives us immediate visibility
448
+ into where we're losing margin. Other levers like automated scheduling
449
+ help with throughput but don't directly attack gross margin the way
450
+ yield sensing does. IoT Sensors for yield is my top recommendation."
451
+ """
452
+
453
+ interviews["mike_russo"] = """Expert Interview Transcript - Mike Russo, Head of Digital Transformation
454
+ Date: November 17, 2024
455
+
456
+ Q: Which digital lever should HarFeast prioritize for the fastest
457
+ margin improvement?
458
+
459
+ A: "After analyzing all the options, I keep coming back to
460
+ IoT Sensors for yield. The ROI timeline is shortest — typically
461
+ 4-8 months to see measurable improvement. Predictive maintenance
462
+ is a close second but has a longer implementation cycle. Cloud ERP
463
+ is foundational but doesn't directly move gross margin in the near
464
+ term. IoT Sensors for yield monitoring is the clear priority if
465
+ we want the fastest and biggest boost to Gross Margin."
466
+ """
467
+
468
+ return interviews
469
+
470
+
471
+ def generate_frito_lay_case(cfg: WorldConfig):
472
+ pct = int(cfg.frito_lay_reduction_pct)
473
+ return f"""Frito-Lay Digital Transformation Case Study
474
+ =============================================
475
+
476
+ Background: Frito-Lay North America, a division of PepsiCo, operates
477
+ over 30 manufacturing facilities producing snack foods including
478
+ Doritos, Cheetos, and Lay's potato chips.
479
+
480
+ Initiative: In 2022, Frito-Lay deployed IoT-based predictive maintenance
481
+ sensors across their manufacturing network, focusing on high-throughput
482
+ production lines.
483
+
484
+ Results: After 18 months of deployment, Frito-Lay achieved a {pct}%
485
+ reduction in unplanned downtime across all monitored production lines.
486
+ The improvement was consistent across facilities regardless of size
487
+ or product type.
488
+
489
+ Key Success Factors:
490
+ - Phased rollout starting with highest-volume lines
491
+ - Integration with existing SCADA systems
492
+ - Dedicated data analytics team for sensor data interpretation
493
+ - Weekly review cadence with plant managers
494
+
495
+ The {pct}% unplanned downtime reduction translated to approximately
496
+ $45M in annual cost savings across the network.
497
+ """
498
+
499
+
500
+ def generate_aptean_report_text(aptean_data):
501
+ lines = ["Aptean Food & Beverage Manufacturing Technology Report 2024",
502
+ "=" * 60, "",
503
+ "Top Technology Investments and Revenue Impact Analysis", "",
504
+ f"{'Technology':<25} {'Users Growth':>15} {'Non-Users Growth':>18} {'Category':<28}",
505
+ "-" * 86]
506
+ for row in aptean_data:
507
+ lines.append(f"{row['technology']:<25} {row['users_growth']:>14.1f}% {row['non_users_growth']:>17.1f}% {row['category']:<28}")
508
+
509
+ lines.extend(["", "",
510
+ "Note: 'Top Investment to Date' and 'Top Planned 2024' represent",
511
+ "investments explicitly identified by surveyed manufacturers as",
512
+ "their highest-priority technology initiatives."])
513
+ return "\n".join(lines)
514
+
515
+
516
+ # =============================================================================
517
+ # GROUND TRUTH COMPUTATION
518
+ # =============================================================================
519
+
520
+ def median(values):
521
+ """Compute median of a list of numbers."""
522
+ s = sorted(values)
523
+ n = len(s)
524
+ if n == 0:
525
+ return 0
526
+ if n % 2 == 1:
527
+ return s[n // 2]
528
+ return (s[n // 2 - 1] + s[n // 2]) / 2
529
+
530
+
531
+ def percentile(values, p):
532
+ """Compute percentile using linear interpolation."""
533
+ s = sorted(values)
534
+ n = len(s)
535
+ if n == 0:
536
+ return 0
537
+ k = (n - 1) * p / 100
538
+ f = math.floor(k)
539
+ c = math.ceil(k)
540
+ if f == c:
541
+ return s[int(k)]
542
+ return s[f] * (c - k) + s[c] * (k - f)
543
+
544
+
545
+ def compute_ground_truth(employees, equipment, quality_losses, plant_labor,
546
+ bls_wages, attached_wages, oee_assumptions,
547
+ plant_sales, aptean_data, cfg: WorldConfig):
548
+ """Compute ground truth answers for all 14 tasks."""
549
+ truth = {}
550
+ PLANTS = list(cfg.plants)
551
+ target_scrap = cfg.target_scrap_pct / 100
552
+ frito_lay_mult = 1 - cfg.frito_lay_reduction_pct / 100
553
+
554
+ # =========================================================================
555
+ # TASK 1: High-priority employees for digital training rollout
556
+ # =========================================================================
557
+ # Conditions: above role_type median readiness, willing to pilot,
558
+ # >2 days training with dedicated time, above overall median comfort
559
+
560
+ overall_median_comfort = median([e["digital_comfort_score"] for e in employees])
561
+
562
+ role_type_readiness_medians = {}
563
+ by_rt = defaultdict(list)
564
+ for e in employees:
565
+ by_rt[e["role_type"]].append(e["digital_readiness_score"])
566
+ for rt, scores in by_rt.items():
567
+ role_type_readiness_medians[rt] = median(scores)
568
+
569
+ high_priority = []
570
+ for e in employees:
571
+ if (e["digital_readiness_score"] > role_type_readiness_medians[e["role_type"]]
572
+ and e["willing_to_pilot"] == "Yes"
573
+ and e["training_days_willing"] == ">2 days"
574
+ and e["dedicated_training_time"] == "Yes"
575
+ and e["digital_comfort_score"] > overall_median_comfort):
576
+ high_priority.append(e)
577
+
578
+ hp_count = len(high_priority)
579
+ hp_pct = round(hp_count / len(employees) * 100, 1)
580
+
581
+ hp_inefficient = sum(e["hours_manual_entry"] + e["hours_searching_data"] + e["hours_fixing_errors"] for e in high_priority)
582
+ total_inefficient = sum(e["hours_manual_entry"] + e["hours_searching_data"] + e["hours_fixing_errors"] for e in employees)
583
+ hp_inefficient_pct = round(hp_inefficient / total_inefficient * 100, 1) if total_inefficient > 0 else 0
584
+
585
+ hp_by_role_type = defaultdict(int)
586
+ for e in high_priority:
587
+ hp_by_role_type[e["role_type"]] += 1
588
+
589
+ truth["task1"] = {
590
+ "high_priority_count": hp_count,
591
+ "high_priority_pct": hp_pct,
592
+ "hp_inefficient_hours": round(hp_inefficient, 0),
593
+ "hp_inefficient_pct": hp_inefficient_pct,
594
+ "hp_frontline": hp_by_role_type.get("Front-line", 0),
595
+ "hp_backoffice": hp_by_role_type.get("Back-office/Support", 0),
596
+ "hp_supervisor": hp_by_role_type.get("Supervisor/Team Lead", 0),
597
+ "hp_management": hp_by_role_type.get("Management", 0),
598
+ }
599
+
600
+ # =========================================================================
601
+ # TASK 2: Adjusted Cost of Instability per plant
602
+ # =========================================================================
603
+ # Formula: Abnormal scrap cost / (unit-weighted plant scrap rate - target scrap rate)
604
+ plant_instability = {}
605
+ for plant in PLANTS:
606
+ plant_equip = [e for e in equipment if e["plant"] == plant]
607
+ total_abnormal_cost = 0
608
+ total_weighted_scrap = 0
609
+ total_units = 0
610
+
611
+ for eq in plant_equip:
612
+ if eq["scrap_rate"] > target_scrap:
613
+ abnormal = eq["cogs_per_ton"] * eq["units_produced"] * (eq["scrap_rate"] - target_scrap)
614
+ total_abnormal_cost += abnormal
615
+ total_weighted_scrap += eq["scrap_rate"] * eq["units_produced"]
616
+ total_units += eq["units_produced"]
617
+
618
+ avg_scrap = total_weighted_scrap / total_units if total_units > 0 else 0
619
+ denominator = avg_scrap - target_scrap
620
+
621
+ if denominator > 0:
622
+ adjusted_cost = round(total_abnormal_cost / denominator)
623
+ else:
624
+ adjusted_cost = 0
625
+
626
+ plant_instability[plant] = adjusted_cost
627
+
628
+ truth["task2"] = plant_instability
629
+
630
+ # =========================================================================
631
+ # TASK 3: Predictive maintenance impact on scrap rate
632
+ # =========================================================================
633
+ # Pilot on equipment where: scheduled_hours >= equipment_type median
634
+ # AND labor_hours >= plant median labor hours
635
+ # Apply 15% scrap reduction to qualifying equipment
636
+
637
+ # Equipment type median scheduled hours
638
+ by_type = defaultdict(list)
639
+ for eq in equipment:
640
+ by_type[eq["equipment_type"]].append(eq["scheduled_hours"])
641
+ type_median_scheduled = {t: median(hrs) for t, hrs in by_type.items()}
642
+
643
+ # Plant median labor hours
644
+ by_plant_labor = defaultdict(list)
645
+ for eq in equipment:
646
+ by_plant_labor[eq["plant"]].append(eq["labor_hours"])
647
+ plant_median_labor = {p: median(hrs) for p, hrs in by_plant_labor.items()}
648
+
649
+ scrap_reduction = 0.15 # 15% reduction for qualifying equipment
650
+
651
+ # Compute new scrap rates by product family
652
+ pf_data = defaultdict(lambda: {"total_units": 0, "total_scrap_units": 0})
653
+ for eq in equipment:
654
+ qualifies = (eq["scheduled_hours"] >= type_median_scheduled[eq["equipment_type"]]
655
+ and eq["labor_hours"] >= plant_median_labor[eq["plant"]])
656
+
657
+ scrap_units = eq["units_produced"] * eq["scrap_rate"]
658
+ if qualifies:
659
+ scrap_units *= (1 - scrap_reduction)
660
+
661
+ pf_data[eq["product_family"]]["total_units"] += eq["units_produced"]
662
+ pf_data[eq["product_family"]]["total_scrap_units"] += scrap_units
663
+
664
+ # Also compute original scrap units for avoidance calc
665
+ pf_original = defaultdict(lambda: {"total_units": 0, "total_scrap_units": 0})
666
+ for eq in equipment:
667
+ pf_original[eq["product_family"]]["total_units"] += eq["units_produced"]
668
+ pf_original[eq["product_family"]]["total_scrap_units"] += eq["units_produced"] * eq["scrap_rate"]
669
+
670
+ task3 = {}
671
+ for pf in PRODUCT_FAMILIES:
672
+ new_rate = round(pf_data[pf]["total_scrap_units"] / pf_data[pf]["total_units"] * 100, 1)
673
+ avoided = round(pf_original[pf]["total_scrap_units"] - pf_data[pf]["total_scrap_units"])
674
+ task3[pf] = {"new_scrap_rate_pct": new_rate, "units_avoided": avoided}
675
+
676
+ truth["task3"] = task3
677
+
678
+ # =========================================================================
679
+ # TASK 4: Digital lever agreement + OEE projections
680
+ # =========================================================================
681
+ # Lever is "IoT Sensors for yield" (from interviews)
682
+ # Project OEE per plant until it reaches or exceeds the world-class target (search capped at 2040)
683
+
684
+ task4 = {"digital_lever": "IoT Sensors for yield"}
685
+ for oee_row in oee_assumptions:
686
+ plant = oee_row["plant"]
687
+ oee = oee_row["current_annual_oee"]
688
+ improvement = oee_row["annual_oee_improvement"]
689
+ start_year = oee_row["investment_start_year"]
690
+ target = oee_row["world_class_oee_target"]
691
+
692
+ year = start_year
693
+ while oee < target and year < 2040:
694
+ oee += improvement
695
+ year += 1
696
+
697
+ if oee >= target:
698
+ task4[plant] = {
699
+ "first_year_exceeds": year,
700
+ "oee_at_that_year": round(oee, 4)
701
+ }
702
+
703
+ truth["task4"] = task4
704
+
705
+ # =========================================================================
706
+ # TASK 5: Total labor cost, efficiency gains, union demand
707
+ # =========================================================================
708
+
709
+ task5 = {}
710
+ for plant in PLANTS[:3]: # Only IL, WI, IA
711
+ plant_workers = [w for w in plant_labor if w["plant"] == plant]
712
+
713
+ # Total annual labor cost
714
+ total_cost = sum(w["hourly_wage"] * w["annual_hours"] for w in plant_workers)
715
+ total_cost = round(total_cost)
716
+
717
+ # Efficiency gains: 10% for West North Central, 20% for others
718
+ # But 5% for non-unionized production supervisors regardless
719
+ efficiency = 0
720
+ for w in plant_workers:
721
+ if w["union_status"] == "Non-Union" and w["supervisor_type"] == "production":
722
+ rate = 0.05
723
+ elif w["census_division"] == "West North Central":
724
+ rate = 0.10
725
+ else:
726
+ rate = 0.20
727
+ efficiency += w["hourly_wage"] * w["annual_hours"] * rate
728
+ efficiency = round(efficiency)
729
+
730
+ # Union demand: 5% increase for union workers
731
+ union_increase = sum(
732
+ w["hourly_wage"] * w["annual_hours"] * 0.05
733
+ for w in plant_workers if w["union_status"] == "Union"
734
+ )
735
+ union_increase = round(union_increase)
736
+
737
+ task5[plant] = {
738
+ "total_labor_cost": total_cost,
739
+ "efficiency_gains": efficiency,
740
+ "union_demand_increase": union_increase,
741
+ }
742
+
743
+ truth["task5"] = task5
744
+
745
+ # =========================================================================
746
+ # TASK 6: Average inefficient hours per plant
747
+ # =========================================================================
748
+
749
+ plant_inefficient = defaultdict(list)
750
+ for e in employees:
751
+ total = e["hours_manual_entry"] + e["hours_searching_data"] + e["hours_fixing_errors"]
752
+ plant_inefficient[e["plant"]].append(total)
753
+
754
+ task6 = {}
755
+ for plant in PLANTS:
756
+ avg = round(sum(plant_inefficient[plant]) / len(plant_inefficient[plant]), 1)
757
+ task6[plant] = avg
758
+
759
+ sorted_plants = sorted(task6.items(), key=lambda x: x[1])
760
+ most_efficient = [p for p, v in sorted_plants if v == sorted_plants[0][1]]
761
+ least_efficient = sorted_plants[-1][0]
762
+ least_val = sorted_plants[-1][1]
763
+ most_val = sorted_plants[0][1]
764
+ pct_diff = round((least_val - most_val) / most_val * 100) if most_val > 0 else 0
765
+
766
+ truth["task6"] = {
767
+ "avg_by_plant": task6,
768
+ "most_efficient": most_efficient,
769
+ "least_efficient": least_efficient,
770
+ "pct_difference": pct_diff,
771
+ }
772
+
773
+ # =========================================================================
774
+ # TASK 7: Average annual productivity loss per role
775
+ # =========================================================================
776
+ # Survey = 1 week. Annual = multiply by 52.
777
+
778
+ role_losses = defaultdict(list)
779
+ for e in employees:
780
+ weekly_inefficient = e["hours_manual_entry"] + e["hours_searching_data"] + e["hours_fixing_errors"]
781
+ annual_loss = weekly_inefficient * 52 * e["hourly_wage"]
782
+ role_losses[e["role"]].append(annual_loss)
783
+
784
+ task7 = {}
785
+ total_annual_loss = 0
786
+ for role in ROLES:
787
+ if role in role_losses:
788
+ avg = round(sum(role_losses[role]) / len(role_losses[role]))
789
+ task7[role] = avg
790
+ total_annual_loss += sum(role_losses[role])
791
+
792
+ truth["task7"] = {
793
+ "avg_loss_by_role": task7,
794
+ "total_annual_loss": round(total_annual_loss),
795
+ }
796
+
797
+ # =========================================================================
798
+ # TASK 8: High-priority canned vegetables equipment quality losses
799
+ # =========================================================================
800
+ # High-priority: canned veg with scrap_rate > 5% AND
801
+ # unplanned_downtime_hours > plant median for canned veg
802
+
803
+ # Plant median downtime for canned vegetables
804
+ cv_by_plant = defaultdict(list)
805
+ for eq in equipment:
806
+ if eq["product_family"] == "Canned Vegetables":
807
+ cv_by_plant[eq["plant"]].append(eq["unplanned_downtime_hours"])
808
+ cv_plant_median = {p: median(hrs) for p, hrs in cv_by_plant.items()}
809
+
810
+ hp_equip_ids = set()
811
+ for eq in equipment:
812
+ if (eq["product_family"] == "Canned Vegetables"
813
+ and eq["scrap_rate"] > 0.05
814
+ and eq["unplanned_downtime_hours"] > cv_plant_median.get(eq["plant"], 0)):
815
+ hp_equip_ids.add(eq["equipment_id"])
816
+
817
+ hp_quality_loss = 0
818
+ total_cv_quality_loss = 0
819
+ for ql in quality_losses:
820
+ if ql["product_family"] == "Canned Vegetables":
821
+ loss = ql["scrap_cost"] + ql["unplanned_failure_cost"]
822
+ total_cv_quality_loss += loss
823
+ if ql["equipment_id"] in hp_equip_ids:
824
+ hp_quality_loss += loss
825
+
826
+ hp_pct_of_cv = round(hp_quality_loss / total_cv_quality_loss * 100) if total_cv_quality_loss > 0 else 0
827
+
828
+ truth["task8"] = {
829
+ "hp_quality_losses": round(hp_quality_loss),
830
+ "hp_pct_of_cv_losses": hp_pct_of_cv,
831
+ }
832
+
833
+ # =========================================================================
834
+ # TASK 9: Labor variance for IL and WI plants
835
+ # =========================================================================
836
+ # Variance = Standard Hours - Actual Hours (positive = favorable)
837
+ # Dollar variance = Hours variance * BLS median wage
838
+ # Productivity Index = Actual Hours / Standard Hours
839
+
840
+ bls_all_occ_wage = next(w["median_hourly_wage"] for w in bls_wages if w["occupation"] == "All Occupations")
841
+
842
+ task9 = {}
843
+ for plant in PLANTS[:2]:
844
+ plant_equip = [eq for eq in equipment if eq["plant"] == plant]
845
+ total_standard = sum(eq["standard_hours"] for eq in plant_equip)
846
+ total_actual = sum(eq["actual_hours"] for eq in plant_equip)
847
+
848
+ variance_hours = round(total_standard - total_actual, 2)
849
+ variance_dollars = round(variance_hours * bls_all_occ_wage, 2)
850
+ productivity_index = round(total_actual / total_standard, 2) if total_standard > 0 else 0
851
+
852
+ task9[plant] = {
853
+ "variance_hours": variance_hours,
854
+ "variance_dollars": variance_dollars,
855
+ "productivity_index": productivity_index,
856
+ }
857
+
858
+ truth["task9"] = task9
859
+
860
+ # =========================================================================
861
+ # TASK 10: Updated productivity loss with attached wages
862
+ # =========================================================================
863
+ # Use attached wage data to get average hourly salary across all roles
864
+ # Then recompute annual productivity loss
865
+
866
+ avg_hourly = round(sum(w["avg_hourly_salary"] for w in attached_wages) / len(attached_wages), 2)
867
+
868
+ total_weekly_inefficient = sum(
869
+ e["hours_manual_entry"] + e["hours_searching_data"] + e["hours_fixing_errors"]
870
+ for e in employees
871
+ )
872
+ annual_loss = round(total_weekly_inefficient * 52 * avg_hourly / 1000) * 1000 # dollars, rounded to the nearest $1,000
873
+
874
+ truth["task10"] = {
875
+ "avg_hourly_wage": avg_hourly,
876
+ "annual_productivity_loss": annual_loss,
877
+ }
878
+
879
+ # =========================================================================
880
+ # TASK 11: Top 5 tech investments applied to plant sales
881
+ # =========================================================================
882
+ # Filter aptean: only "Top Investment to Date" or "Top Planned 2024"
883
+ # Compute difference: users_growth - non_users_growth
884
+ # Take top 5 by difference
885
+ # Apply cumulative growth to each plant's unit sales
886
+
887
+ eligible = [dict(a) for a in aptean_data if a["category"] in ["Top Investment to Date", "Top Planned 2024"]] # copy rows so growth_diff is not written back into aptean_data
888
+ for a in eligible:
889
+ a["growth_diff"] = a["users_growth"] - a["non_users_growth"]
890
+ eligible.sort(key=lambda x: x["growth_diff"], reverse=True)
891
+ top5 = eligible[:5]
892
+
893
+ # Total growth multiplier = product of (1 + diff/100) for all 5
894
+ total_growth = 1.0
895
+ for tech in top5:
896
+ total_growth *= (1 + tech["growth_diff"] / 100)
897
+
898
+ task11 = {}
899
+ for ps in plant_sales:
900
+ new_units = round(ps["current_unit_sales"] * total_growth)
901
+ new_revenue = round(new_units * ps["price_per_unit"])
902
+ task11[ps["plant"]] = {
903
+ "new_unit_sales": new_units,
904
+ "new_projected_sales": new_revenue,
905
+ }
906
+
907
+ truth["task11"] = {
908
+ "top5_technologies": [t["technology"] for t in top5],
909
+ "plant_results": task11,
910
+ }
911
+
912
+ # =========================================================================
913
+ # TASK 12: Willingness to adopt by plant and role, training costs
914
+ # =========================================================================
915
+
916
+ # Plant-level willingness
917
+ plant_willingness = {}
918
+ for plant in PLANTS:
919
+ scores = [e["willingness_to_adopt"] for e in employees if e["plant"] == plant]
920
+ plant_willingness[plant] = round(sum(scores) / len(scores), 2)
921
+
922
+ sorted_pw = sorted(plant_willingness.items(), key=lambda x: x[1])
923
+ lowest_plant = sorted_pw[0][0]
924
+ highest_plant = sorted_pw[-1][0]
925
+
926
+ # Role willingness within those plants
927
+ def role_willingness_in_plant(plant):
928
+ by_role = defaultdict(list)
929
+ for e in employees:
930
+ if e["plant"] == plant:
931
+ by_role[e["role"]].append(e["willingness_to_adopt"])
932
+ return {r: round(sum(s)/len(s), 2) for r, s in by_role.items()}
933
+
934
+ lowest_plant_roles = role_willingness_in_plant(lowest_plant)
935
+ highest_plant_roles = role_willingness_in_plant(highest_plant)
936
+
937
+ lowest_role_in_lowest = min(lowest_plant_roles.items(), key=lambda x: x[1])
938
+ highest_role_in_highest = max(highest_plant_roles.items(), key=lambda x: x[1])
939
+
940
+ # Training preferences and costs
941
+ # Preferred training: most common training_days_willing for each role in each plant
942
+ def training_info(plant, role):
943
+ emps = [e for e in employees if e["plant"] == plant and e["role"] == role]
944
+ if not emps:
945
+ return {"preferred_length": "N/A", "count_1_2_days": 0, "total_cost": 0}
946
+
947
+ prefs = defaultdict(int)
948
+ for e in emps:
949
+ prefs[e["training_days_willing"]] += 1
950
+
951
+ preferred = max(prefs.items(), key=lambda x: x[1])[0]
952
+ count_1_2 = prefs.get("1-2 days", 0)
953
+
954
+ # Training cost: $8/hour * hours based on preference
955
+ hours_map = {"<1 day": 4, "1-2 days": 12, ">2 days": 20}
956
+ cost_per_person = 8 * hours_map.get(preferred, 12) # $8/hr training cost
957
+ total_cost = round(cost_per_person * len(emps))
958
+
959
+ return {"preferred_length": preferred, "count_1_2_days": count_1_2, "total_cost": total_cost}
960
+
961
+ truth["task12"] = {
962
+ "lowest_willingness_plant": lowest_plant,
963
+ "highest_willingness_plant": highest_plant,
964
+ "lowest_role_in_lowest_plant": lowest_role_in_lowest,
965
+ "highest_role_in_highest_plant": highest_role_in_highest,
966
+ "training_details": {
967
+ "lowest_plant_lowest_role": training_info(lowest_plant, lowest_role_in_lowest[0]),
968
+ "highest_plant_highest_role": training_info(highest_plant, highest_role_in_highest[0]),
969
+ }
970
+ }
971
+
972
+ # =========================================================================
973
+ # TASK 13: Apply Frito-Lay downtime reduction
974
+ # =========================================================================
975
+ # 30% reduction in unplanned downtime per plant
976
+
977
+ task13 = {}
978
+ for plant in PLANTS:
979
+ plant_equip = [eq for eq in equipment if eq["plant"] == plant]
980
+ total_scheduled = sum(eq["scheduled_hours"] for eq in plant_equip)
981
+ total_downtime = sum(eq["unplanned_downtime_hours"] for eq in plant_equip)
982
+
983
+ current_ratio = total_downtime / total_scheduled if total_scheduled > 0 else 0
984
+ new_ratio = current_ratio * frito_lay_mult
985
+ task13[plant] = round(new_ratio * 100) # nearest full percentage point
986
+
987
+ truth["task13"] = task13
988
+
989
+ # =========================================================================
990
+ # TASK 14: Training quality breakdown
991
+ # =========================================================================
992
+
993
+ trained = [e for e in employees if e["training_received"] == "Yes"]
994
+ trained_count = len(trained)
995
+
996
+ quality_counts = defaultdict(int)
997
+ for e in trained:
998
+ quality_counts[e["training_quality"]] += 1
999
+
1000
+ quality_pcts = {}
1001
+ for q in TRAINING_QUALITY_OPTIONS:
1002
+ quality_pcts[q] = round(quality_counts[q] / trained_count * 100) if trained_count else 0 # guard against no trained respondents
1003
+
1004
+ truth["task14"] = {
1005
+ "trained_count": trained_count,
1006
+ "quality_pcts": quality_pcts,
1007
+ }
1008
+
1009
+ return truth
1010
+
1011
+
1012
+ # =============================================================================
1013
+ # TASK PROMPT GENERATION
1014
+ # =============================================================================
1015
+
1016
+ def generate_task_prompts(truth, cfg: WorldConfig):
1017
+ """Generate task prompts adapted to the synthetic world."""
1018
+ tasks = []
1019
+ PLANTS = list(cfg.plants)
1020
+ plants_il_wi_ia = ", ".join(PLANTS[:3])
1021
+
1022
+ # TASK 1
1023
+ tasks.append({
1024
+ "task_id": "task_01",
1025
+ "task_name": "High-Priority Digital Training Employees",
1026
+ "prompt": """I'm trying to get a sense of which HarFeast employees are most ready for the digital training rollout. Can you pull the workforce survey data and identify all employees who are above their role type's median readiness score, willing to pilot new tools, willing to spend >2 days in training with dedicated training time, and above the overall median digital comfort score?
1027
+
1028
+ Once you've identified that group, tell me:
1029
+ 1. How many "high-priority" employees are there, and what % of total employees do they represent?
1030
+ 2. How many total hours does this group spend weekly on manual entry, searching data, or fixing errors? What % of the company-wide total is that?
1031
+ 3. Break down the high-priority count by role type.
1032
+
1033
+ Report your answer here.""",
1034
+ "ground_truth": truth["task1"],
1035
+ "rubric": [
1036
+ f"States that the number of high-priority employees is {truth['task1']['high_priority_count']}",
1037
+ f"States that the percentage of all employees the high-priority employees represent is {truth['task1']['high_priority_pct']}%",
1038
+ f"States that the total hours high-priority employees spend on manual entry, searching data or fixing errors is {truth['task1']['hp_inefficient_hours']:.0f}",
1039
+ f"States that the percentage of all such hours from high-priority employees is {truth['task1']['hp_inefficient_pct']}%",
1040
+ f"States that the number of high-priority employees in the Front-line role type is {truth['task1']['hp_frontline']}",
1041
+ f"States that the number of high-priority employees in the Back-office/Support role type is {truth['task1']['hp_backoffice']}",
1042
+ f"States that the number of high-priority employees in the Supervisor/Team Lead role type is {truth['task1']['hp_supervisor']}",
1043
+ f"States that the number of high-priority employees in the Management role type is {truth['task1']['hp_management']}",
1044
+ ]
1045
+ })
1046
+
1047
+ # TASK 2
1048
+ rubric2 = [f"States that the adjusted cost of instability for {plant} is ${cost:,}" for plant, cost in truth["task2"].items()]
1049
+ tasks.append({
1050
+ "task_id": "task_02",
1051
+ "task_name": "Adjusted Cost of Instability",
1052
+ "prompt": """Calculate the Adjusted Cost of Instability for each site, defined as Abnormal scrap cost/(Actual Scrap % - Normal Scrap %) = adjusted cost of instability. The target scrap rate of HarFeast is the minimum in the range of acceptable scrap rate in the scrap rate report. Just use COGS per ton as your scrap cost for now.
1053
+
1054
+ Report your final answers to me in a message. Round values to the nearest dollar.""",
1055
+ "ground_truth": truth["task2"],
1056
+ "rubric": rubric2,
1057
+ })
1058
+
1059
+ # TASK 3
1060
+ rubric3 = []
1061
+ for pf in PRODUCT_FAMILIES:
1062
+ rubric3.append(f"States that the new overall scrap rate for {pf} is {truth['task3'][pf]['new_scrap_rate_pct']}%")
1063
+ rubric3.append(f"States that the scrap units {pf} avoids per year is {truth['task3'][pf]['units_avoided']}")
1064
+ tasks.append({
1065
+ "task_id": "task_03",
1066
+ "task_name": "Predictive Maintenance Scrap Impact",
1067
+ "prompt": """Using HarFeast's equipment data, assess the impact of predictive maintenance on HarFeast's scrap rate. We will pilot predictive maintenance only on equipment a) whose scheduled hours per year are at or above that equipment type's median scheduled hours and b) whose labor hours are at or above its plant's median labor hours. For all equipment qualifying for the pilot, apply a 15% reduction to their scrap rate.
1068
+
1069
+ Calculate:
1070
+ 1. The new overall scrap rate for each product family (as a %)
1071
+ 2. The total number of scrap units each product family avoids every year
1072
+
1073
+ Report rounded to 1 decimal place for rates and nearest whole number for units.""",
1074
+ "ground_truth": truth["task3"],
1075
+ "rubric": rubric3,
1076
+ })
1077
+
1078
+ # TASK 4
1079
+ rubric4 = ["States that the digital lever is IoT Sensors for yield"]
1080
+ for plant, data in truth["task4"].items():
1081
+ if plant == "digital_lever":
1082
+ continue
1083
+ rubric4.append(f"States that the OEE level for {plant} in the first year exceeding world-class target is {data['oee_at_that_year']:.2%}")
1084
+ rubric4.append(f"States that the first year {plant} exceeds world-class target is {data['first_year_exceeds']}")
1085
+ tasks.append({
1086
+ "task_id": "task_04",
1087
+ "task_name": "Digital Lever Agreement and OEE Projections",
1088
+ "prompt": """1. What is the digital lever that Sarah Jenkins, David Chen, and Mike Russo agree will deliver the fastest and biggest boost to HarFeast's Gross Margin?
1089
+
1090
+ 2. Assuming HarFeast adopts the chosen digital lever, determine the OEE level in the first full year in each plant location where the annual OEE value exceeds the world-class target. Use the OEE improvement assumptions file for growth rates and start dates.
1091
+
1092
+ Report OEE values to 2 decimal places as percentages.""",
1093
+ "ground_truth": truth["task4"],
1094
+ "rubric": rubric4,
1095
+ })
1096
+
1097
+ # TASK 5
1098
+ rubric5 = []
1099
+ for plant, data in truth["task5"].items():
1100
+ rubric5.append(f"States that the Total Annual Labor Cost for {plant} is ${data['total_labor_cost']:,}")
1101
+ rubric5.append(f"States that the Efficiency Gains for {plant} is ${data['efficiency_gains']:,}")
1102
+ rubric5.append(f"States that the Union Demand Increase for {plant} is ${data['union_demand_increase']:,}")
1103
+ tasks.append({
1104
+ "task_id": "task_05",
1105
+ "task_name": "Labor Cost Analysis",
1106
+ "prompt": f"""1. Give me the total labor cost for each plant location ({plants_il_wi_ia} only).
1107
+
1108
+ 2. Give me the efficiency gains for each plant location. West North Central division plant locations only have a 10% annual efficiency gain from labor cost. For other locations, the efficiency gain is 20%. However, the efficiency gain is 5% for non-unionized production supervisors no matter where they are located.
1109
+
1110
+ 3. Give me the forecasted labor cost increase from union demands, assuming a 5% increase for all union workers.
1111
+
1112
+ Round to the nearest dollar.""",
1113
+ "ground_truth": truth["task5"],
1114
+ "rubric": rubric5,
1115
+ })
1116
+
1117
+ # TASK 6
1118
+ rubric6 = [f"States the average inefficient time in {plant} is {val}" for plant, val in truth["task6"]["avg_by_plant"].items()]
1119
+ for p in truth["task6"]["most_efficient"]:
1120
+ rubric6.append(f"States that {p} is a plant with the lowest average inefficient time")
1121
+ rubric6.append(f"States that {truth['task6']['least_efficient']} is the plant with the highest average inefficient time")
1122
+ rubric6.append(f"States that the difference between highest and lowest average inefficient time is {truth['task6']['pct_difference']}%")
1123
+ tasks.append({
1124
+ "task_id": "task_06",
1125
+ "task_name": "Operational Efficiency Analysis",
1126
+ "prompt": """Analyze the operational efficiency at HarFeast and assess how many inefficient employee hours each plant is recording on average. Which plants have the most efficient operations and the least efficient operations? How much more efficient are the highest efficiency locations vs the lowest efficiency locations?
1127
+
1128
+ Assume the following activities are considered inefficient: (a) manual data entry, (b) searching for data, (c) fixing errors. Use the workforce survey data. Report averages to 1 decimal place.""",
1129
+ "ground_truth": truth["task6"],
1130
+ "rubric": rubric6,
1131
+ })
1132
+
1133
+ # TASK 7
1134
+ rubric7 = [f"States the average annual productivity loss cost of a {role} employee is ${loss:,}" for role, loss in truth["task7"]["avg_loss_by_role"].items()]
1135
+ rubric7.append(f"States the total annual productivity loss cost is ${truth['task7']['total_annual_loss']:,}")
1136
+ tasks.append({
1137
+ "task_id": "task_07",
1138
+ "task_name": "Productivity Loss Quantification",
1139
+ "prompt": """I want to quantify the average annual productivity loss at a cost level for each employee in each primary role based on the sum of average hours spent doing manual entry, searching data, and fixing errors. Then, I want to calculate the total productivity loss cost HarFeast faces every year, company-wide.
1140
+
1141
+ Note that the survey responses represent one week of work. Report your final answer as a message. Round to the nearest dollar.""",
1142
+ "ground_truth": truth["task7"],
1143
+ "rubric": rubric7,
1144
+ })
1145
+
1146
+ # TASK 8
1147
+ tasks.append({
1148
+ "task_id": "task_08",
1149
+ "task_name": "High-Priority Equipment Quality Losses",
1150
+ "prompt": """Using HarFeast's equipment data and quality losses dataset, consider all canned vegetables assets with a scrap rate > 5% and with unplanned downtime hours above the plant median for canned vegetables as "high-priority".
1151
+
1152
+ 1. For the "high-priority" group, calculate the total annual quality-related losses (scrap cost + unplanned failure cost).
1153
+ 2. What percentage of all canned-vegetable quality losses comes from these high-priority assets?
1154
+
1155
+ Report losses rounded to the nearest dollar and percentage to the nearest whole number.""",
1156
+ "ground_truth": truth["task8"],
1157
+ "rubric": [
1158
+ f"States that the total annual quality-related losses for the high-priority group is ${truth['task8']['hp_quality_losses']:,}",
1159
+ f"States that the percentage of all canned-vegetable quality losses from high-priority assets is {truth['task8']['hp_pct_of_cv_losses']}%",
1160
+ ]
1161
+ })
1162
+
1163
+ # TASK 9
1164
+ rubric9 = []
1165
+ for plant, data in truth["task9"].items():
1166
+ rubric9.append(f"States that the Labor Efficiency Variance (Hours) for {plant} is {data['variance_hours']} hours")
1167
+ rubric9.append(f"States that the Labor Cost Variance for {plant} is ${data['variance_dollars']}")
1168
+ rubric9.append(f"States that the Productivity Index for {plant} is {data['productivity_index']}")
1169
+ tasks.append({
1170
+ "task_id": "task_09",
1171
+ "task_name": "Labor Variance Analysis",
1172
+ "prompt": f"""Calculate the total labor variance in hours (favorable should be positive) and dollars for the Illinois and Wisconsin plants ({PLANTS[0]} and {PLANTS[1]}). A positive variance means Total Actual Hours are less than Total Standard Hours. Use the median wage for All Occupations in the food manufacturing industry from the BLS wage benchmark file to convert from hours to dollars.
1173
+
1174
+ Also give me the straight productivity index (Actual Hours / Standard Hours) for each plant.
1175
+
1176
+ Round hours to 2 decimal places, dollars to 2 decimal places, and the index to 2 decimal places.""",
1177
+ "ground_truth": truth["task9"],
1178
+ "rubric": rubric9,
1179
+ })
1180
+
1181
+ # TASK 10
1182
+ tasks.append({
1183
+ "task_id": "task_10",
1184
+ "task_name": "Updated Productivity Loss with New Wages",
1185
+ "prompt": """The client sent us employee wage data (attached), so we need to update our assumptions. Find the average hourly salary across all employee roles in the attached wage file and use that to calculate the updated annual productivity loss for the entire company.
1186
+
1187
+ Note that survey responses represent one week of work. Report the annual productivity loss in thousands (000s) rounded to the nearest thousand. Also state the average hourly wage used.
1188
+
1189
+ Report your answer here.""",
1190
+ "ground_truth": truth["task10"],
1191
+ "rubric": [
1192
+ f"States the updated annual productivity loss is ${truth['task10']['annual_productivity_loss']:,}",
1193
+ f"States the average hourly wage used is ${truth['task10']['avg_hourly_wage']}",
1194
+ ]
1195
+ })
1196
+
1197
+ # TASK 11
1198
+ rubric11 = []
1199
+ for plant, data in truth["task11"]["plant_results"].items():
1200
+ rubric11.append(f"States that the unit sales for {plant} after deploying initiatives is {data['new_unit_sales']:,}")
1201
+ rubric11.append(f"States that the Revised Projected Sales for {plant} is ${data['new_projected_sales']:,}")
1202
+ tasks.append({
1203
+ "task_id": "task_11",
1204
+ "task_name": "Technology Investment Impact",
1205
+ "prompt": """Identify the top five technology investments from the Aptean report with the largest positive difference in percentage revenue growth between users and non-users. Include only investments that the report explicitly identifies as either top technology investments to date or top investments planned for 2024.
1206
+
1207
+ Next, assume that HarFeast will deploy all five of these top initiatives at every plant location. Apply the cumulative growth impact to each plant's current unit sales and calculate the revised projected sales revenue.
1208
+
1209
+ Round unit sales to the nearest whole number and revenue to the nearest dollar.""",
1210
+ "ground_truth": truth["task11"],
1211
+ "rubric": rubric11,
1212
+ })
1213
+
1214
+ # TASK 12
1215
+ t12 = truth["task12"]
1216
+ tasks.append({
1217
+ "task_id": "task_12",
1218
+ "task_name": "Digital Adoption Willingness Analysis",
1219
+ "prompt": """To implement the required roadmap, we need to identify what roles and plants are most and least willing to go through a digital transformation.
1220
+
1221
+ Determine the plant with the highest and lowest average willingness to adopt digital tools. Within those plants, identify the roles with the highest and lowest willingness. For those specific role-plant combinations, determine the preferred training length, the count of employees preferring 1-2 days of training, and the total training cost (at $8/hour training rate).
1222
+
1223
+ Report your findings here.""",
1224
+ "ground_truth": truth["task12"],
1225
+ "rubric": [
1226
+ f"States that the plant with lowest willingness to adopt is {t12['lowest_willingness_plant']}",
1227
+ f"States that the plant with highest willingness to adopt is {t12['highest_willingness_plant']}",
1228
+ f"States the role with lowest willingness in {t12['lowest_willingness_plant']} is {t12['lowest_role_in_lowest_plant'][0]}",
1229
+ f"States the role with highest willingness in {t12['highest_willingness_plant']} is {t12['highest_role_in_highest_plant'][0]}",
1230
+ ]
1231
+ })
1232
+
1233
+ # TASK 13
1234
+ rubric13 = [f"States that the new unplanned downtime ratio for {plant} is {pct}%" for plant, pct in truth["task13"].items()]
1235
+ tasks.append({
1236
+ "task_id": "task_13",
1237
+ "task_name": "Frito-Lay Downtime Reduction Application",
1238
+ "prompt": """Can you look at the Frito-Lay case study and apply their downtime reduction to HarFeast's numbers in the equipment data? I want to estimate what the improvement would look like for us (rounded to the nearest full percentage point).
1239
+
1240
+ Calculate the current unplanned downtime ratio (unplanned downtime hours / scheduled hours) for each plant, apply the reduction from the case study, and report the new ratios.
1241
+
1242
+ Output the information in a message here.""",
1243
+ "ground_truth": truth["task13"],
1244
+ "rubric": rubric13,
1245
+ })
1246
+
1247
+ # TASK 14
1248
+ rubric14 = [f"States that the number of respondents who received training is {truth['task14']['trained_count']}"]
1249
+ for quality, pct in truth["task14"]["quality_pcts"].items():
1250
+ rubric14.append(f"States that the percentage of respondents who rated training as \"{quality}\" is {pct}%")
1251
+ tasks.append({
1252
+ "task_id": "task_14",
1253
+ "task_name": "Training Quality Assessment",
1254
+ "prompt": """Use the workforce survey responses to identify the number of respondents who received any kind of training on digital tools. Of those respondents, return the percentage of respondents for each training quality rating.
1255
+
1256
+ Reply back here to me.""",
1257
+ "ground_truth": truth["task14"],
1258
+ "rubric": rubric14,
1259
+ })
1260
+
1261
+ return tasks
1262
+
1263
+
1264
+ # =============================================================================
1265
+ # FILE WRITERS
1266
+ # =============================================================================
1267
+
1268
+ def write_csv(filepath, data, fieldnames=None):
1269
+ """Write a list of dicts to CSV."""
1270
+ if not data:
1271
+ return
1272
+ if fieldnames is None:
1273
+ fieldnames = list(data[0].keys())
1274
+ with open(filepath, "w", newline="", encoding="utf-8") as f:
1275
+ writer = csv.DictWriter(f, fieldnames=fieldnames)
1276
+ writer.writeheader()
1277
+ writer.writerows(data)
1278
+
1279
+
1280
+ def write_text(filepath, content):
1281
+ """Write text content to a file."""
1282
+ with open(filepath, "w", encoding="utf-8") as f:
1283
+ f.write(content)
1284
+
1285
+
1286
+ # =============================================================================
1287
+ # MAIN
1288
+ # =============================================================================
1289
+
1290
+ def generate_world(
1291
+ seed: int = 42,
1292
+ output_dir: str = "./harfeast_world",
1293
+ config: Optional[WorldConfig] = None,
1294
+ ) -> tuple:
1295
+ """Generate the complete HarFeast synthetic world."""
1296
+ rng = random.Random(seed)
1297
+ cfg = config or sample_world_config(rng, seed)
1298
+ cfg.seed = seed
1299
+
1300
+ os.makedirs(output_dir, exist_ok=True)
1301
+ os.makedirs(os.path.join(output_dir, "data"), exist_ok=True)
1302
+ os.makedirs(os.path.join(output_dir, "documents"), exist_ok=True)
1303
+
1304
+ print(f"Generating world (seed={seed}, n_employees={cfg.n_employees}, plants={cfg.plants[0][:15]}...)...")
1305
+
1306
+ # Generate all datasets
1307
+ employees = generate_employee_survey(rng, cfg)
1308
+ equipment = generate_equipment_data(rng, cfg)
1309
+ quality_losses = generate_quality_losses(rng, equipment)
1310
+ plant_labor = generate_plant_labor(rng, cfg)
1311
+ bls_wages = generate_bls_wages(cfg)
1312
+ attached_wages = generate_attached_wages(cfg)
1313
+ oee_assumptions = generate_oee_assumptions(cfg, rng)
1314
+ plant_sales = generate_plant_sales(cfg, rng)
1315
+ aptean_data = generate_aptean_report(cfg, rng)
1316
+
1317
+ # Write CSV files
1318
+ write_csv(os.path.join(output_dir, "data", "employee_survey.csv"), employees)
1319
+ write_csv(os.path.join(output_dir, "data", "equipment_data.csv"), equipment)
1320
+ write_csv(os.path.join(output_dir, "data", "quality_losses.csv"), quality_losses)
1321
+ write_csv(os.path.join(output_dir, "data", "plant_labor.csv"), plant_labor)
1322
+ write_csv(os.path.join(output_dir, "data", "bls_wage_benchmark.csv"), bls_wages)
1323
+ write_csv(os.path.join(output_dir, "data", "attached_wage_data.csv"), attached_wages)
1324
+ write_csv(os.path.join(output_dir, "data", "oee_assumptions.csv"), oee_assumptions)
1325
+ write_csv(os.path.join(output_dir, "data", "plant_unit_sales.csv"), plant_sales)
1326
+ write_csv(os.path.join(output_dir, "data", "aptean_report_data.csv"), aptean_data)
1327
+
1328
+ # Write text documents
1329
+ write_text(os.path.join(output_dir, "documents", "scrap_rate_report.txt"), generate_scrap_report(cfg))
1330
+
1331
+ interviews = generate_interviews()
1332
+ for name, text in interviews.items():
1333
+ write_text(os.path.join(output_dir, "documents", f"interview_{name}.txt"), text)
1334
+
1335
+ write_text(os.path.join(output_dir, "documents", "frito_lay_case_study.txt"), generate_frito_lay_case(cfg))
1336
+ write_text(os.path.join(output_dir, "documents", "aptean_report.txt"), generate_aptean_report_text(aptean_data))
1337
+
1338
+ # Compute ground truth
1339
+ print("Computing ground truth...")
1340
+ truth = compute_ground_truth(
1341
+ employees, equipment, quality_losses, plant_labor,
1342
+ bls_wages, attached_wages, oee_assumptions, plant_sales, aptean_data, cfg
1343
+ )
1344
+
1345
+ # Generate task prompts and rubrics
1346
+ print("Generating tasks...")
1347
+ tasks = generate_task_prompts(truth, cfg)
1348
+
1349
+ # Write tasks and ground truth
1350
+ with open(os.path.join(output_dir, "tasks.json"), "w") as f:
1351
+ json.dump(tasks, f, indent=2, default=str)
1352
+
1353
+ with open(os.path.join(output_dir, "ground_truth.json"), "w") as f:
1354
+ json.dump(truth, f, indent=2, default=str)
1355
+
1356
+ # Print summary
1357
+ print(f"\nWorld generated in {output_dir}/")
1358
+ print(f" Employees: {len(employees)}")
1359
+ print(f" Equipment: {len(equipment)}")
1360
+ print(f" Quality losses: {len(quality_losses)}")
1361
+ print(f" Plant labor: {len(plant_labor)}")
1362
+ print(f" Tasks: {len(tasks)}")
1363
+ print("\nGround truth summary:")
1364
+ for task in tasks:
1365
+ n_criteria = len(task["rubric"])
1366
+ print(f" {task['task_id']} ({task['task_name']}): {n_criteria} criteria")
1367
+
1368
+ # Print sample ground truth values for validation
1369
+ print("\nSample answers for validation:")
1370
+ print(f" Task 1 - High-priority count: {truth['task1']['high_priority_count']}")
1371
+ print(f" Task 6 - Avg inefficient hours: {truth['task6']['avg_by_plant']}")
1372
+ print(f" Task 14 - Trained count: {truth['task14']['trained_count']}")
1373
+ print(f" Task 13 - Downtime ratios: {truth['task13']}")
1374
+
1375
+ return employees, equipment, truth, tasks
1376
+
1377
+
1378
+ def generate_worlds_batch(
1379
+ n_worlds: int,
1380
+ output_base: str = "./harfeast_worlds",
1381
+ base_seed: int = 0,
1382
+ ) -> list[dict]:
1383
+ """
1384
+ Generate n_worlds distinct worlds for RL scalability.
1385
+ Returns manifest of (world_id, path, task_count) for each world.
1386
+ """
1387
+ os.makedirs(output_base, exist_ok=True)
1388
+ rng = random.Random(base_seed)
1389
+ manifest = []
1390
+
1391
+ for i in range(n_worlds):
1392
+ seed = base_seed + i * 10000 + rng.randint(0, 9999)
1393
+ world_dir = os.path.join(output_base, f"world_{i:04d}")
1394
+ try:
1395
+ generate_world(seed=seed, output_dir=world_dir)
1396
+ manifest.append({
1397
+ "world_id": i,
1398
+ "path": world_dir,
1399
+ "seed": seed,
1400
+ "task_count": 14,
1401
+ })
1402
+ except Exception as e:
1403
+ print(f"Warning: world {i} failed: {e}")
1404
+
1405
+ manifest_path = os.path.join(output_base, "manifest.json")
1406
+ with open(manifest_path, "w") as f:
1407
+ json.dump(manifest, f, indent=2)
1408
+
1409
+ # Build all_tasks.json: flat list for sampling (world_path, task_id, prompt)
1410
+ all_tasks = []
1411
+ for m in manifest:
1412
+ tasks_path = os.path.join(m["path"], "tasks.json")
1413
+ with open(tasks_path) as f:
1414
+ tasks = json.load(f)
1415
+ for t in tasks:
1416
+ all_tasks.append({
1417
+ "world_path": m["path"],
1418
+ "world_id": m["world_id"],
1419
+ "task_id": t["task_id"],
1420
+ "task_name": t["task_name"],
1421
+ "prompt": t["prompt"],
1422
+ })
1423
+ with open(os.path.join(output_base, "all_tasks.json"), "w") as f:
1424
+ json.dump(all_tasks, f, indent=2)
1425
+
1426
+ print(f"\nBatch complete: {len(manifest)} worlds, {len(all_tasks)} task instances")
1427
+ return manifest
1428
+
1429
+
1430
+ if __name__ == "__main__":
1431
+ import sys
1432
+ seed = 42
1433
+ output_dir = "./harfeast_world"
1434
+ batch_n = 0
1435
+
1436
+ args = sys.argv[1:]
1437
+ i = 0
1438
+ while i < len(args):
1439
+ if args[i] == "--seed" and i + 1 < len(args):
1440
+ seed = int(args[i + 1])
1441
+ i += 2
1442
+ elif args[i] == "--output-dir" and i + 1 < len(args):
1443
+ output_dir = args[i + 1]
1444
+ i += 2
1445
+ elif args[i] == "--batch" and i + 1 < len(args):
1446
+ batch_n = int(args[i + 1])
1447
+ i += 2
1448
+ else:
1449
+ i += 1
1450
+
1451
+ if batch_n > 0:
1452
+ generate_worlds_batch(n_worlds=batch_n, output_base=output_dir, base_seed=seed)
1453
+ else:
1454
+ generate_world(seed=seed, output_dir=output_dir)
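The seed arithmetic in `generate_worlds_batch` can be looked at in isolation. The block below is a minimal sketch and not part of the commit: `derive_world_seeds` is a hypothetical helper that repeats the loop's formula to show that each world's seed falls in its own 10,000-wide band and that the sequence is reproducible for a given `base_seed`.

```python
import random


def derive_world_seeds(n_worlds: int, base_seed: int = 0) -> list[int]:
    """Hypothetical helper mirroring the seed formula used in generate_worlds_batch."""
    rng = random.Random(base_seed)
    seeds = []
    for i in range(n_worlds):
        # World i draws from [base_seed + i*10000, base_seed + i*10000 + 9999],
        # so seeds belonging to different worlds cannot collide.
        seeds.append(base_seed + i * 10000 + rng.randint(0, 9999))
    return seeds


seeds = derive_world_seeds(5, base_seed=42)
print(seeds)
```

Because the per-world offset comes from a `random.Random` seeded with `base_seed`, rerunning a batch with the same `--seed` should regenerate the same set of worlds.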
harfeast_world/data/aptean_report_data.csv ADDED
@@ -0,0 +1,11 @@
1
+ technology,users_growth,non_users_growth,category
2
+ IoT Sensors,12.4,4.0,Top Investment to Date
3
+ Predictive Maintenance,11.8,3.9,Top Planned 2024
4
+ Cloud ERP,9.2,5.1,Top Investment to Date
5
+ Robotic Automation,10.5,3.6,Top Planned 2024
6
+ AI Quality Control,8.9,4.6,Top Investment to Date
7
+ Digital Twin,7.3,4.0,Other
8
+ Supply Chain AI,6.9,3.1,Other
9
+ Automated Scheduling,8.1,5.7,Top Planned 2024
10
+ Warehouse Robotics,7.8,5.2,Other
11
+ Advanced Analytics,9.6,4.4,Top Investment to Date
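As a sanity check on Task 11, the selection and compounding steps from `compute_ground_truth` can be replayed against the rows of this CSV. This is a sketch only: the rows are copied by hand from the file above, and `top5_growth` and `APTEAN_ROWS` are hypothetical names, not part of the generator.

```python
# Rows copied from aptean_report_data.csv:
# (technology, users_growth, non_users_growth, category)
APTEAN_ROWS = [
    ("IoT Sensors", 12.4, 4.0, "Top Investment to Date"),
    ("Predictive Maintenance", 11.8, 3.9, "Top Planned 2024"),
    ("Cloud ERP", 9.2, 5.1, "Top Investment to Date"),
    ("Robotic Automation", 10.5, 3.6, "Top Planned 2024"),
    ("AI Quality Control", 8.9, 4.6, "Top Investment to Date"),
    ("Digital Twin", 7.3, 4.0, "Other"),
    ("Supply Chain AI", 6.9, 3.1, "Other"),
    ("Automated Scheduling", 8.1, 5.7, "Top Planned 2024"),
    ("Warehouse Robotics", 7.8, 5.2, "Other"),
    ("Advanced Analytics", 9.6, 4.4, "Top Investment to Date"),
]
ELIGIBLE_CATEGORIES = {"Top Investment to Date", "Top Planned 2024"}


def top5_growth(rows):
    """Return the top-5 eligible technologies and their compounded growth multiplier."""
    # Keep only the eligible categories and compute users minus non-users growth.
    diffs = [
        (name, users - non_users)
        for name, users, non_users, category in rows
        if category in ELIGIBLE_CATEGORIES
    ]
    diffs.sort(key=lambda item: item[1], reverse=True)
    top5 = diffs[:5]
    # Compound the five percentage-point differences: product of (1 + diff/100).
    multiplier = 1.0
    for _, diff in top5:
        multiplier *= 1 + diff / 100
    return [name for name, _ in top5], multiplier


names, multiplier = top5_growth(APTEAN_ROWS)
print(names, round(multiplier, 4))
```

With these rows, Cloud ERP (4.1 points) narrowly misses the top five, and the compounded multiplier comes out at roughly 1.37, which is the factor applied to every plant's `current_unit_sales`.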
harfeast_world/data/attached_wage_data.csv ADDED
@@ -0,0 +1,9 @@
1
+ role,avg_hourly_salary
2
+ Production/Manufacturing Operator,20.61
3
+ Quality Control/Quality Assurance,24.73
4
+ Maintenance Technician,28.19
5
+ Production Supervisor/Team Lead,31.83
6
+ Supply Chain/Logistics Coordinator,26.46
7
+ Demand Planning/Forecasting,33.65
8
+ Administrative/Support Staff,22.43
9
+ Plant Management,46.5
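The Task 10 arithmetic can be sketched against this wage file. Two things here are assumptions rather than facts from the commit: that the average is a simple unweighted mean over the eight roles (the line that computes `avg_hourly` is not shown in this part of the diff), and the weekly total of 1,000 inefficient hours, which is a made-up figure for illustration. The rounding expression itself is the one used for `annual_loss`.

```python
# Hourly salaries copied from attached_wage_data.csv.
HOURLY_BY_ROLE = {
    "Production/Manufacturing Operator": 20.61,
    "Quality Control/Quality Assurance": 24.73,
    "Maintenance Technician": 28.19,
    "Production Supervisor/Team Lead": 31.83,
    "Supply Chain/Logistics Coordinator": 26.46,
    "Demand Planning/Forecasting": 33.65,
    "Administrative/Support Staff": 22.43,
    "Plant Management": 46.5,
}


def annual_productivity_loss(total_weekly_inefficient_hours, avg_hourly):
    """Annualize weekly hours (x52), price them, and round to the nearest $1,000."""
    return round(total_weekly_inefficient_hours * 52 * avg_hourly / 1000) * 1000


# Assumption: unweighted mean across roles.
avg_hourly = sum(HOURLY_BY_ROLE.values()) / len(HOURLY_BY_ROLE)

# Hypothetical weekly total, for illustration only.
loss = annual_productivity_loss(1000, avg_hourly)
print(round(avg_hourly, 2), loss)
```

Note that the result is a dollar amount rounded to the nearest thousand, not a count of thousands, which matches how the rubric formats it with `${...:,}`.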
harfeast_world/data/bls_wage_benchmark.csv ADDED
@@ -0,0 +1,7 @@
1
+ occupation,industry,median_hourly_wage
2
+ All Occupations,Food Manufacturing,18.94
3
+ Production Workers,Food Manufacturing,17.11
4
+ Supervisors,Food Manufacturing,27.32
5
+ Maintenance,Food Manufacturing,23.3
6
+ Quality Control,Food Manufacturing,20.28
7
+ Logistics,Food Manufacturing,21.86
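The Task 9 prompt converts hours to dollars using the "All Occupations" row of this file. The sketch below follows the prompt's definitions (favorable variance is positive, index is Actual Hours / Standard Hours). The function name `labor_variance` and the sample hours are hypothetical, and whether the generator rounds the hours before pricing them is an assumption, since the Task 9 ground-truth code is not shown in this part of the diff.

```python
# Median hourly wage for "All Occupations" in Food Manufacturing,
# copied from bls_wage_benchmark.csv.
ALL_OCCUPATIONS_MEDIAN_WAGE = 18.94


def labor_variance(standard_hours, actual_hours, wage=ALL_OCCUPATIONS_MEDIAN_WAGE):
    """Return hours variance, dollar variance, and productivity index for one plant."""
    # Favorable (positive) when actual hours come in below standard hours.
    variance_hours = round(standard_hours - actual_hours, 2)
    return {
        "variance_hours": variance_hours,
        "variance_dollars": round(variance_hours * wage, 2),
        "productivity_index": round(actual_hours / standard_hours, 2),
    }


# Hypothetical plant totals, for illustration only.
result = labor_variance(standard_hours=1000.0, actual_hours=950.0)
print(result)
```

The dictionary keys match the field names the Task 9 rubric reads (`variance_hours`, `variance_dollars`, `productivity_index`).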
harfeast_world/data/employee_survey.csv ADDED
The diff for this file is too large to render. See raw diff
 
harfeast_world/data/equipment_data.csv ADDED
@@ -0,0 +1,250 @@
1
+ equipment_id,plant,product_family,equipment_type,scheduled_hours,actual_hours,standard_hours,labor_hours,scrap_rate,oee,unplanned_downtime_hours,units_produced,cogs_per_ton,failure_cost
2
+ EQ-ROC-000,"Rockford, Illinois",Canned Vegetables,Pasteurizer,1733,1586,1627,944,0.0418,0.8117,297,579796,1311.74,13428.4
3
+ EQ-ROC-001,"Rockford, Illinois",Canned Vegetables,Filler,3678,3197,3621,2871,0.0368,0.7813,398,354781,1562.72,22948.38
4
+ EQ-ROC-002,"Rockford, Illinois",Sauces,Pasteurizer,3912,2747,3329,763,0.0707,0.8684,478,297246,1842.58,42930.29
5
+ EQ-ROC-003,"Rockford, Illinois",Canned Vegetables,Labeler,4152,3998,4059,1824,0.048,0.7672,148,552915,1412.38,40015.28
6
+ EQ-ROC-004,"Rockford, Illinois",Canned Vegetables,Sealer,4696,3698,4498,1261,0.0733,0.8928,76,475144,1627.94,25348.82
7
+ EQ-ROC-005,"Rockford, Illinois",Canned Vegetables,Conveyor,1992,1934,1964,2609,0.0337,0.8761,330,444346,1989.52,37354.69
8
+ EQ-ROC-006,"Rockford, Illinois",Canned Vegetables,Labeler,4543,4367,3999,2068,0.0698,0.8549,125,299459,1599.28,5567.85
9
+ EQ-ROC-007,"Rockford, Illinois",Condiments,Labeler,3601,3237,3157,2245,0.0682,0.8438,112,485890,1704.96,13410.6
10
+ EQ-ROC-008,"Rockford, Illinois",Condiments,Conveyor,2061,1528,2020,2710,0.0362,0.8177,304,480395,1148.16,28810.95
11
+ EQ-ROC-009,"Rockford, Illinois",Sauces,Filler,2086,1686,1986,754,0.083,0.8102,230,365046,932.98,41144.21
12
+ EQ-ROC-010,"Rockford, Illinois",Condiments,Labeler,4403,3744,4229,2883,0.0305,0.7457,278,558580,1107.91,12617.83
13
+ EQ-ROC-011,"Rockford, Illinois",Condiments,Filler,4229,3162,3621,1933,0.0359,0.8175,359,588148,1931.38,14392.21
14
+ EQ-ROC-012,"Rockford, Illinois",Canned Vegetables,Sealer,4809,3382,4659,694,0.0665,0.7725,204,334544,1186.82,19387.8
15
+ EQ-ROC-013,"Rockford, Illinois",Sauces,Sealer,3256,2429,2910,2823,0.0613,0.7478,155,360347,1405.05,24627.38
16
+ EQ-ROC-014,"Rockford, Illinois",Sauces,Conveyor,1673,1174,1586,1068,0.0317,0.7232,182,121148,1207.69,27947.16
17
+ EQ-ROC-015,"Rockford, Illinois",Condiments,Sealer,4013,2818,3711,909,0.0417,0.7836,339,332097,1090.85,37628.79
18
+ EQ-ROC-016,"Rockford, Illinois",Sauces,Filler,3476,2548,3314,2769,0.0391,0.7519,121,533280,1031.93,6245.98
19
+ EQ-ROC-017,"Rockford, Illinois",Canned Vegetables,Labeler,2842,2014,2793,1991,0.0555,0.7886,130,121735,1281.48,7889.72
20
+ EQ-ROC-018,"Rockford, Illinois",Sauces,Conveyor,3733,3483,3652,1687,0.0354,0.7848,227,383518,1221.97,41539.85
21
+ EQ-ROC-019,"Rockford, Illinois",Condiments,Boiler,3025,2550,2755,988,0.076,0.7787,124,511790,1751.8,42325.65
22
+ EQ-ROC-020,"Rockford, Illinois",Condiments,Pasteurizer,3811,2695,3602,1858,0.0356,0.7725,285,402802,874.93,10593.37
23
+ EQ-ROC-021,"Rockford, Illinois",Condiments,Labeler,4681,4192,4330,2238,0.0625,0.9395,237,137343,1501.37,20224.76
24
+ EQ-ROC-022,"Rockford, Illinois",Condiments,Boiler,2897,2051,2673,1921,0.0853,0.7199,84,571595,1379.81,47643.34
25
+ EQ-ROC-023,"Rockford, Illinois",Canned Vegetables,Filler,4595,4073,4288,1129,0.099,0.7846,481,260082,1342.54,44328.9
26
+ EQ-ROC-024,"Rockford, Illinois",Condiments,Conveyor,3800,3003,3338,872,0.0732,0.7203,360,378981,1121.33,30381.28
27
+ EQ-ROC-025,"Rockford, Illinois",Condiments,Labeler,2325,2089,2150,2946,0.097,0.8323,329,559489,1939.53,14673.21
28
+ EQ-ROC-026,"Rockford, Illinois",Sauces,Sealer,3952,3753,3731,909,0.0931,0.877,159,363732,1281.06,24065.96
29
+ EQ-ROC-027,"Rockford, Illinois",Sauces,Boiler,4300,3381,3903,1834,0.0487,0.7893,355,230265,1502.64,15463.34
30
+ EQ-ROC-028,"Rockford, Illinois",Sauces,Labeler,2875,2558,2500,1099,0.0354,0.762,254,187043,1746.84,11998.44
31
+ EQ-ROC-029,"Rockford, Illinois",Canned Vegetables,Sealer,4907,4625,4782,2642,0.0926,0.7067,473,430704,989.65,43344.48
32
+ EQ-ROC-030,"Rockford, Illinois",Condiments,Conveyor,2181,1692,1873,1155,0.0914,0.8193,138,562547,1626.84,20451.59
33
+ EQ-ROC-031,"Rockford, Illinois",Condiments,Filler,2904,2683,2698,1549,0.0921,0.8017,138,340842,838.59,17417.21
34
+ EQ-ROC-032,"Rockford, Illinois",Condiments,Sealer,4297,3149,4238,2131,0.0515,0.7621,116,140630,926.3,20279.03
35
+ EQ-ROC-033,"Rockford, Illinois",Sauces,Sealer,2526,2170,2328,1222,0.0836,0.8179,86,390567,1437.77,18467.72
36
+ EQ-ROC-034,"Rockford, Illinois",Sauces,Labeler,2551,1955,2210,1418,0.0413,0.8961,234,494490,1389.34,32470.28
37
+ EQ-ROC-035,"Rockford, Illinois",Condiments,Pasteurizer,2856,2142,2683,1753,0.061,0.8758,164,481912,1045.16,14648.35
38
+ EQ-ROC-036,"Rockford, Illinois",Sauces,Boiler,3349,2809,3231,1925,0.0383,0.7689,259,524422,943.08,21586.28
39
+ EQ-ROC-037,"Rockford, Illinois",Sauces,Mixer,4289,3267,3824,2709,0.0424,0.7725,359,223002,858.46,48103.73
40
+ EQ-ROC-038,"Rockford, Illinois",Sauces,Filler,2410,2186,2409,564,0.0652,0.8075,185,598148,1086.33,26528.04
41
+ EQ-ROC-039,"Rockford, Illinois",Sauces,Pasteurizer,4391,3546,3843,1014,0.0333,0.8158,418,137230,1905.1,33435.7
42
+ EQ-ROC-040,"Rockford, Illinois",Canned Vegetables,Conveyor,2973,2700,2701,2313,0.0564,0.8243,159,333766,1895.76,32321.12
43
+ EQ-ROC-041,"Rockford, Illinois",Canned Vegetables,Labeler,4504,3233,4445,1994,0.079,0.7054,67,277296,810.75,8488.7
44
+ EQ-ROC-042,"Rockford, Illinois",Condiments,Conveyor,1792,1546,1551,2751,0.0523,0.887,87,123121,801.98,16444.37
45
+ EQ-ROC-043,"Rockford, Illinois",Canned Vegetables,Filler,4686,3911,4375,2528,0.0679,0.9096,405,550914,1413.72,15207.6
46
+ EQ-ROC-044,"Rockford, Illinois",Canned Vegetables,Mixer,4739,3397,4306,700,0.0823,0.7314,217,545889,1354.51,22776.33
47
+ EQ-ROC-045,"Rockford, Illinois",Condiments,Boiler,3616,3176,3597,1763,0.0483,0.735,65,132324,1576.8,17866.29
48
+ EQ-ROC-046,"Rockford, Illinois",Condiments,Labeler,2444,1887,2082,1689,0.089,0.8141,290,321846,881.28,49089.0
49
+ EQ-ROC-047,"Rockford, Illinois",Sauces,Sealer,2094,1879,2045,2350,0.0796,0.8508,320,339577,1970.34,34686.41
50
+ EQ-ROC-048,"Rockford, Illinois",Sauces,Conveyor,2247,2047,2075,2412,0.0653,0.7566,191,350536,1489.74,34703.48
51
+ EQ-ROC-049,"Rockford, Illinois",Sauces,Pasteurizer,3371,3145,3063,1586,0.0628,0.7607,374,152274,848.43,31398.35
52
+ EQ-ROC-050,"Rockford, Illinois",Sauces,Sealer,4054,3446,3638,2430,0.0872,0.7712,156,523880,1320.38,6143.75
53
+ EQ-ROC-051,"Rockford, Illinois",Canned Vegetables,Boiler,2744,2345,2501,2082,0.0568,0.7888,373,548053,1540.88,14516.56
54
+ EQ-ROC-052,"Rockford, Illinois",Condiments,Pasteurizer,3138,3003,2675,1715,0.0892,0.7761,90,252548,955.44,12433.76
55
+ EQ-ROC-053,"Rockford, Illinois",Canned Vegetables,Conveyor,2588,2287,2555,1816,0.0914,0.7442,109,245449,1214.28,18202.75
56
+ EQ-ROC-054,"Rockford, Illinois",Sauces,Sealer,4939,4296,4372,1513,0.0468,0.7952,160,505213,1343.79,37845.33
57
+ EQ-MAD-055,"Madison, Wisconsin",Condiments,Mixer,4380,3952,3997,2205,0.0554,0.8175,398,340515,1875.26,26862.83
58
+ EQ-MAD-056,"Madison, Wisconsin",Condiments,Boiler,4343,4012,3921,2801,0.0408,0.8097,199,528746,1693.36,42325.24
59
+ EQ-MAD-057,"Madison, Wisconsin",Condiments,Conveyor,4276,4006,3668,2149,0.0864,0.794,398,546745,989.36,26613.26
60
+ EQ-MAD-058,"Madison, Wisconsin",Sauces,Filler,3674,3151,3542,2980,0.0529,0.8324,229,233586,1614.37,31316.03
61
+ EQ-MAD-059,"Madison, Wisconsin",Sauces,Boiler,1685,1208,1489,1793,0.0488,0.8669,247,130854,1472.32,39182.37
62
+ EQ-MAD-060,"Madison, Wisconsin",Condiments,Boiler,4001,3855,3571,1537,0.0818,0.7254,427,384875,1208.85,36822.47
63
+ EQ-MAD-061,"Madison, Wisconsin",Canned Vegetables,Pasteurizer,4796,4527,4346,1119,0.0413,0.7871,178,136869,1174.75,37535.73
64
+ EQ-MAD-062,"Madison, Wisconsin",Sauces,Filler,1576,1129,1423,556,0.0853,0.8123,84,376409,1518.58,28650.97
65
+ EQ-MAD-063,"Madison, Wisconsin",Canned Vegetables,Conveyor,2004,1504,1818,1373,0.0722,0.8969,424,445511,1062.21,27841.49
66
+ EQ-MAD-064,"Madison, Wisconsin",Condiments,Mixer,3992,3480,3681,2511,0.0864,0.8201,493,486916,1764.05,15413.32
67
+ EQ-MAD-065,"Madison, Wisconsin",Sauces,Labeler,2808,2187,2610,1253,0.0594,0.7906,412,325383,865.07,13133.05
68
+ EQ-MAD-066,"Madison, Wisconsin",Canned Vegetables,Pasteurizer,3617,2734,3195,2776,0.0678,0.7664,80,404271,1088.35,23697.69
69
+ EQ-MAD-067,"Madison, Wisconsin",Sauces,Pasteurizer,4761,3997,4374,1659,0.0468,0.6867,237,501208,1944.96,21986.38
70
+ EQ-MAD-068,"Madison, Wisconsin",Condiments,Sealer,2546,2222,2186,1125,0.0974,0.8184,437,231134,1042.45,39718.31
71
+ EQ-MAD-069,"Madison, Wisconsin",Sauces,Mixer,1878,1341,1704,1011,0.0521,0.8116,142,531271,859.18,32822.1
72
+ EQ-MAD-070,"Madison, Wisconsin",Canned Vegetables,Filler,3919,2950,3887,706,0.0556,0.7764,488,372641,1177.88,38602.04
73
+ EQ-MAD-071,"Madison, Wisconsin",Canned Vegetables,Conveyor,1580,1273,1347,847,0.0857,0.7543,56,330695,1137.73,9775.56
74
+ EQ-MAD-072,"Madison, Wisconsin",Sauces,Mixer,4112,3220,3998,1694,0.041,0.8957,239,249762,1474.98,16010.75
75
+ EQ-MAD-073,"Madison, Wisconsin",Condiments,Pasteurizer,1975,1768,1926,1947,0.0474,0.7237,128,247782,864.42,30505.35
76
+ EQ-MAD-074,"Madison, Wisconsin",Condiments,Filler,2707,2400,2365,2351,0.0842,0.7234,254,502868,933.99,21840.78
77
+ EQ-MAD-075,"Madison, Wisconsin",Canned Vegetables,Boiler,2426,2368,2266,2735,0.0714,0.841,451,476609,1507.36,48238.8
78
+ EQ-MAD-076,"Madison, Wisconsin",Sauces,Sealer,4831,3901,4632,2062,0.0347,0.7559,191,528080,1272.3,49989.79
79
+ EQ-MAD-077,"Madison, Wisconsin",Condiments,Conveyor,2380,1880,2108,1960,0.0321,0.7006,83,294196,1003.85,44286.01
80
+ EQ-MAD-078,"Madison, Wisconsin",Sauces,Boiler,2815,2395,2532,1734,0.0307,0.8496,393,414109,1624.99,39402.44
81
+ EQ-MAD-079,"Madison, Wisconsin",Canned Vegetables,Pasteurizer,2554,2111,2188,2833,0.0775,0.8107,153,356949,971.9,33875.59
82
+ EQ-MAD-080,"Madison, Wisconsin",Condiments,Mixer,4100,3912,3914,643,0.0784,0.7435,455,117014,1794.39,27287.92
83
+ EQ-MAD-081,"Madison, Wisconsin",Condiments,Pasteurizer,4477,3188,3972,1534,0.0455,0.7195,184,491395,1384.79,32813.73
84
+ EQ-MAD-082,"Madison, Wisconsin",Sauces,Pasteurizer,1807,1713,1631,1309,0.0424,0.7283,373,473099,1905.91,44328.81
85
+ EQ-MAD-083,"Madison, Wisconsin",Canned Vegetables,Labeler,3465,2813,3302,1755,0.0944,0.8487,482,456544,862.68,45827.13
86
+ EQ-MAD-084,"Madison, Wisconsin",Condiments,Boiler,4774,3436,4315,1670,0.0785,0.736,237,550680,1305.64,30185.51
87
+ EQ-MAD-085,"Madison, Wisconsin",Sauces,Sealer,4931,4233,4502,2366,0.0572,0.8238,233,107419,1209.96,44717.37
88
+ EQ-MAD-086,"Madison, Wisconsin",Condiments,Mixer,3935,2883,3880,1048,0.0711,0.7438,109,575161,1245.74,8358.79
89
+ EQ-MAD-087,"Madison, Wisconsin",Canned Vegetables,Sealer,2882,2343,2706,704,0.0402,0.7968,369,573715,1481.01,43054.91
90
+ EQ-MAD-088,"Madison, Wisconsin",Condiments,Boiler,4438,3976,4030,1715,0.0513,0.6774,278,417989,1988.48,38278.66
91
+ EQ-MAD-089,"Madison, Wisconsin",Canned Vegetables,Boiler,2987,2279,2695,1737,0.0336,0.7886,369,303780,1484.84,42385.02
92
+ EQ-MAD-090,"Madison, Wisconsin",Canned Vegetables,Filler,4167,3866,3999,1555,0.0393,0.8075,267,312528,1503.69,37332.05
93
+ EQ-MAD-091,"Madison, Wisconsin",Canned Vegetables,Mixer,3208,2429,2779,2015,0.0699,0.778,276,322952,1092.46,49828.79
94
+ EQ-MAD-092,"Madison, Wisconsin",Sauces,Mixer,4429,3550,3994,2531,0.0303,0.7859,55,198861,1217.96,26170.25
95
+ EQ-MAD-093,"Madison, Wisconsin",Sauces,Pasteurizer,2735,2263,2485,2329,0.0416,0.7411,324,262208,1760.47,5084.02
96
+ EQ-MAD-094,"Madison, Wisconsin",Condiments,Mixer,4433,3984,3912,2361,0.0628,0.8767,189,374601,1008.94,25002.02
97
+ EQ-MAD-095,"Madison, Wisconsin",Sauces,Conveyor,2225,2052,1934,2312,0.0851,0.7887,428,142395,939.12,28168.34
98
+ EQ-MAD-096,"Madison, Wisconsin",Canned Vegetables,Mixer,4879,4556,4542,2217,0.06,0.8192,312,365067,1599.44,23588.55
99
+ EQ-MAD-097,"Madison, Wisconsin",Canned Vegetables,Pasteurizer,1735,1377,1714,2919,0.0912,0.6936,370,149223,1587.03,12361.06
100
+ EQ-MAD-098,"Madison, Wisconsin",Canned Vegetables,Conveyor,3195,2776,3104,1023,0.0574,0.7447,343,592050,1540.01,26607.26
101
+ EQ-MAD-099,"Madison, Wisconsin",Sauces,Mixer,3775,2810,3748,1316,0.0661,0.8057,201,200922,1041.26,35774.08
102
+ EQ-MAD-100,"Madison, Wisconsin",Sauces,Mixer,4010,3557,3770,2970,0.0322,0.786,205,460969,1037.52,38401.97
103
+ EQ-MAD-101,"Madison, Wisconsin",Sauces,Filler,2044,1574,1843,2614,0.0614,0.8332,360,317344,1341.95,9416.46
104
+ EQ-MAD-102,"Madison, Wisconsin",Sauces,Sealer,1666,1542,1612,1124,0.0558,0.7007,255,135158,950.96,43615.56
105
+ EQ-MAD-103,"Madison, Wisconsin",Condiments,Sealer,2031,1905,1916,2489,0.0723,0.7765,275,183872,1759.19,31972.7
106
+ EQ-MAD-104,"Madison, Wisconsin",Sauces,Filler,3277,3119,3232,2767,0.0352,0.835,249,316874,1121.9,9016.41
107
+ EQ-MAD-105,"Madison, Wisconsin",Condiments,Pasteurizer,1797,1689,1646,2598,0.0396,0.8321,456,187306,1950.57,36641.22
108
+ EQ-MAD-106,"Madison, Wisconsin",Condiments,Mixer,4439,3922,4210,2271,0.0694,0.8268,481,395029,1165.91,43513.33
109
+ EQ-MAD-107,"Madison, Wisconsin",Sauces,Labeler,2538,2253,2441,1140,0.0901,0.6615,256,537075,1770.55,23417.57
110
+ EQ-MAD-108,"Madison, Wisconsin",Condiments,Mixer,4026,3086,3750,2611,0.054,0.7215,480,580054,840.75,45265.68
111
+ EQ-DAV-109,"Davenport, Iowa",Sauces,Sealer,3580,3228,3443,1789,0.0399,0.8175,144,465410,1581.41,38356.13
112
+ EQ-DAV-110,"Davenport, Iowa",Condiments,Labeler,3569,2686,3466,2809,0.0705,0.7222,257,317615,1259.43,30286.43
113
+ EQ-DAV-111,"Davenport, Iowa",Condiments,Pasteurizer,2304,1635,2142,1837,0.0997,0.7636,247,100375,1602.73,6308.05
114
+ EQ-DAV-112,"Davenport, Iowa",Canned Vegetables,Pasteurizer,3992,3410,3722,2859,0.0411,0.7338,242,546917,1372.08,49032.57
115
+ EQ-DAV-113,"Davenport, Iowa",Condiments,Filler,4805,3632,4404,1238,0.0475,0.651,151,569666,1250.4,23923.74
116
+ EQ-DAV-114,"Davenport, Iowa",Canned Vegetables,Conveyor,3742,3226,3572,1369,0.0762,0.6798,428,543816,1032.54,23361.73
117
+ EQ-DAV-115,"Davenport, Iowa",Condiments,Mixer,2612,2114,2397,2003,0.0353,0.8261,163,412995,806.77,47582.29
118
+ EQ-DAV-116,"Davenport, Iowa",Condiments,Sealer,4957,4711,4700,1738,0.0977,0.7259,209,363514,1314.51,10603.04
119
+ EQ-DAV-117,"Davenport, Iowa",Canned Vegetables,Sealer,2243,1629,2232,2502,0.0596,0.7108,447,166993,1179.48,49299.74
120
+ EQ-DAV-118,"Davenport, Iowa",Sauces,Mixer,3902,3347,3619,1265,0.0895,0.7406,424,418974,1744.95,39585.49
121
+ EQ-DAV-119,"Davenport, Iowa",Canned Vegetables,Mixer,1590,1521,1512,2915,0.0948,0.6714,253,372133,1407.76,19410.17
122
+ EQ-DAV-120,"Davenport, Iowa",Condiments,Conveyor,4712,4234,4451,2321,0.0498,0.7859,154,400803,1563.47,6459.48
123
+ EQ-DAV-121,"Davenport, Iowa",Sauces,Sealer,1866,1362,1735,2866,0.0565,0.6465,80,214837,972.14,14050.75
124
+ EQ-DAV-122,"Davenport, Iowa",Canned Vegetables,Sealer,2528,2430,2354,1133,0.0855,0.7681,97,573427,1499.51,6883.23
125
+ EQ-DAV-123,"Davenport, Iowa",Condiments,Boiler,2551,2320,2389,915,0.0704,0.8051,78,195127,1176.22,25181.63
126
+ EQ-DAV-124,"Davenport, Iowa",Sauces,Pasteurizer,4372,3105,4164,2401,0.0398,0.6415,356,345503,1267.03,24359.27
127
+ EQ-DAV-125,"Davenport, Iowa",Sauces,Conveyor,4410,4126,4095,2845,0.0927,0.6962,427,568841,1042.42,47704.43
128
+ EQ-DAV-126,"Davenport, Iowa",Condiments,Mixer,3767,3232,3664,1676,0.0453,0.6815,365,201934,825.62,9496.39
129
+ EQ-DAV-127,"Davenport, Iowa",Canned Vegetables,Conveyor,2336,2087,2185,1011,0.0736,0.7417,358,334732,1646.77,13116.22
130
+ EQ-DAV-128,"Davenport, Iowa",Condiments,Filler,3284,3106,3024,1371,0.0516,0.7764,150,532486,1840.55,8495.56
131
+ EQ-DAV-129,"Davenport, Iowa",Condiments,Boiler,2952,2348,2876,1515,0.0617,0.7553,82,171615,1985.05,40200.58
132
+ EQ-DAV-130,"Davenport, Iowa",Sauces,Boiler,3434,2533,2936,1069,0.0945,0.5621,250,521436,1325.42,6507.34
133
+ EQ-DAV-131,"Davenport, Iowa",Sauces,Boiler,4133,3108,3934,532,0.0592,0.697,220,356273,1435.84,9984.39
134
+ EQ-DAV-132,"Davenport, Iowa",Sauces,Filler,2432,2109,2309,937,0.0527,0.73,135,312271,1138.16,42868.74
135
+ EQ-DAV-133,"Davenport, Iowa",Condiments,Sealer,3943,2865,3792,1291,0.0617,0.6626,394,286355,1332.08,18520.18
136
+ EQ-DAV-134,"Davenport, Iowa",Condiments,Pasteurizer,4338,4037,4033,2958,0.0526,0.7135,477,400650,1960.57,36823.83
137
+ EQ-DAV-135,"Davenport, Iowa",Sauces,Sealer,4819,4401,4171,781,0.0815,0.6437,264,230891,1744.82,29455.07
138
+ EQ-DAV-136,"Davenport, Iowa",Condiments,Mixer,2619,2147,2515,1736,0.0326,0.67,388,543768,1987.89,5416.04
139
+ EQ-DAV-137,"Davenport, Iowa",Sauces,Boiler,2208,2000,2206,2649,0.0456,0.7947,245,468337,872.64,39173.1
140
+ EQ-DAV-138,"Davenport, Iowa",Condiments,Boiler,4836,3455,4779,1248,0.0822,0.7072,158,208301,1732.91,31013.77
141
+ EQ-DAV-139,"Davenport, Iowa",Canned Vegetables,Sealer,3683,3492,3207,2423,0.0616,0.7632,203,266090,1836.29,44320.54
142
+ EQ-DAV-140,"Davenport, Iowa",Sauces,Conveyor,4486,4319,4242,1506,0.0675,0.8153,412,241213,1994.44,38254.35
143
+ EQ-DAV-141,"Davenport, Iowa",Canned Vegetables,Conveyor,1810,1323,1681,2179,0.0841,0.7338,372,460720,1413.5,33437.18
144
+ EQ-DAV-142,"Davenport, Iowa",Sauces,Conveyor,4855,3740,4431,2514,0.0967,0.8146,61,458890,1467.81,34315.36
145
+ EQ-DAV-143,"Davenport, Iowa",Condiments,Conveyor,3461,2853,3394,964,0.0669,0.7284,307,554984,1027.05,10528.37
146
+ EQ-DAV-144,"Davenport, Iowa",Sauces,Filler,2126,1829,1897,2430,0.0799,0.7989,150,367611,1704.4,5043.94
147
+ EQ-DAV-145,"Davenport, Iowa",Condiments,Filler,2720,2426,2414,2331,0.0938,0.867,212,279162,1241.67,7899.09
148
+ EQ-DAV-146,"Davenport, Iowa",Sauces,Boiler,2075,1766,1929,2806,0.0467,0.7293,451,282009,1282.8,6259.55
149
+ EQ-DAV-147,"Davenport, Iowa",Canned Vegetables,Boiler,2027,1688,1822,1257,0.0729,0.7348,189,334239,1495.49,38975.4
150
+ EQ-DAV-148,"Davenport, Iowa",Condiments,Sealer,3051,2500,2956,1961,0.0606,0.755,401,516156,1240.02,28602.91
151
+ EQ-DAV-149,"Davenport, Iowa",Canned Vegetables,Labeler,3390,3234,3041,1230,0.0738,0.76,184,302690,860.65,35603.75
152
+ EQ-DAV-150,"Davenport, Iowa",Sauces,Filler,2238,1928,2188,1861,0.0424,0.6586,180,344753,1919.49,34897.08
153
+ EQ-DAV-151,"Davenport, Iowa",Condiments,Filler,1773,1342,1511,1885,0.0571,0.6794,96,182678,1693.06,48535.13
154
+ EQ-DAV-152,"Davenport, Iowa",Canned Vegetables,Boiler,4193,3212,3695,2726,0.0518,0.8365,178,564664,1600.99,17657.95
155
+ EQ-DAV-153,"Davenport, Iowa",Canned Vegetables,Pasteurizer,1816,1682,1609,2633,0.0795,0.7637,436,456417,841.48,47425.33
156
+ EQ-DAV-154,"Davenport, Iowa",Canned Vegetables,Pasteurizer,4828,3512,4402,2422,0.0602,0.8887,286,290229,1804.5,18771.22
157
+ EQ-COL-155,"Columbus, Ohio",Canned Vegetables,Boiler,3725,3081,3677,2634,0.0565,0.7177,123,401432,1222.39,39371.34
158
+ EQ-COL-156,"Columbus, Ohio",Canned Vegetables,Pasteurizer,1506,1186,1377,591,0.0812,0.7481,383,514387,872.16,43983.73
159
+ EQ-COL-157,"Columbus, Ohio",Canned Vegetables,Sealer,2966,2091,2567,1155,0.0762,0.6346,140,534797,1561.04,9767.85
160
+ EQ-COL-158,"Columbus, Ohio",Sauces,Conveyor,4619,3407,4143,681,0.0317,0.6796,295,267483,1984.1,44313.82
161
+ EQ-COL-159,"Columbus, Ohio",Canned Vegetables,Labeler,4397,4197,4021,640,0.0424,0.7779,357,180151,1876.29,39838.41
162
+ EQ-COL-160,"Columbus, Ohio",Sauces,Filler,2713,2463,2651,535,0.0353,0.8574,359,496335,1005.84,28211.44
163
+ EQ-COL-161,"Columbus, Ohio",Canned Vegetables,Filler,2354,2082,2351,2097,0.0824,0.7894,428,327394,1333.41,16366.17
164
+ EQ-COL-162,"Columbus, Ohio",Condiments,Filler,4755,4203,4595,1749,0.0356,0.6356,211,485636,1772.45,12681.71
165
+ EQ-COL-163,"Columbus, Ohio",Sauces,Pasteurizer,2080,1895,1949,892,0.0874,0.7613,131,452045,1880.56,8250.12
166
+ EQ-COL-164,"Columbus, Ohio",Sauces,Filler,1666,1440,1666,1386,0.0377,0.6773,52,527476,989.51,32327.48
167
+ EQ-COL-165,"Columbus, Ohio",Canned Vegetables,Sealer,4877,4576,4701,1693,0.046,0.7549,371,246044,1699.8,32006.43
168
+ EQ-COL-166,"Columbus, Ohio",Canned Vegetables,Boiler,2534,1812,2465,2220,0.0418,0.8296,269,414758,1276.92,11838.18
169
+ EQ-COL-167,"Columbus, Ohio",Canned Vegetables,Boiler,2306,2234,2062,2397,0.0952,0.7616,488,456080,1807.74,26682.41
170
+ EQ-COL-168,"Columbus, Ohio",Condiments,Sealer,2399,2080,2252,886,0.0812,0.7377,59,217841,1982.41,28523.16
171
+ EQ-COL-169,"Columbus, Ohio",Sauces,Sealer,4374,3414,3988,1114,0.039,0.635,342,370187,1183.19,5915.92
172
+ EQ-COL-170,"Columbus, Ohio",Canned Vegetables,Labeler,4405,4240,4082,1739,0.0374,0.6346,103,165670,1893.57,25998.1
173
+ EQ-COL-171,"Columbus, Ohio",Canned Vegetables,Mixer,1708,1220,1517,1143,0.0701,0.6752,154,156214,1630.18,26224.95
174
+ EQ-COL-172,"Columbus, Ohio",Condiments,Sealer,4489,3155,3951,1141,0.0576,0.7026,175,597063,1435.95,19829.93
175
+ EQ-COL-173,"Columbus, Ohio",Condiments,Labeler,2228,1743,1987,1446,0.0564,0.6688,52,170831,1062.95,22030.05
176
+ EQ-COL-174,"Columbus, Ohio",Canned Vegetables,Labeler,2523,2389,2322,759,0.0305,0.7357,246,411538,1052.85,33773.35
177
+ EQ-COL-175,"Columbus, Ohio",Condiments,Labeler,4502,4148,4285,1964,0.0497,0.7351,188,404540,1096.48,21425.7
178
+ EQ-COL-176,"Columbus, Ohio",Condiments,Mixer,1979,1722,1892,1708,0.0609,0.8234,238,251870,1936.58,47089.46
179
+ EQ-COL-177,"Columbus, Ohio",Canned Vegetables,Pasteurizer,4092,3354,3724,1808,0.0622,0.8322,117,537375,1577.03,8017.52
180
+ EQ-COL-178,"Columbus, Ohio",Condiments,Filler,3206,3040,2800,2575,0.0963,0.7434,55,427217,1715.45,21105.33
181
+ EQ-COL-179,"Columbus, Ohio",Sauces,Sealer,1975,1769,1930,604,0.0757,0.8151,492,198674,1017.01,41260.74
182
+ EQ-COL-180,"Columbus, Ohio",Condiments,Filler,2516,2371,2174,1135,0.0867,0.6376,255,272513,1126.8,27024.99
183
+ EQ-COL-181,"Columbus, Ohio",Sauces,Labeler,3790,3562,3243,2402,0.0967,0.7397,68,318633,1350.59,8053.35
184
+ EQ-COL-182,"Columbus, Ohio",Condiments,Sealer,4380,3544,4288,1937,0.0962,0.7393,401,116420,1595.99,39799.17
185
+ EQ-COL-183,"Columbus, Ohio",Condiments,Mixer,2701,2046,2659,1854,0.0894,0.7919,311,583662,861.41,43935.7
186
+ EQ-COL-184,"Columbus, Ohio",Canned Vegetables,Sealer,4336,3892,4009,2709,0.0601,0.6521,64,548521,1656.09,12269.62
187
+ EQ-COL-185,"Columbus, Ohio",Canned Vegetables,Pasteurizer,3260,3006,2994,1635,0.0348,0.6363,374,599587,1907.84,31468.71
188
+ EQ-COL-186,"Columbus, Ohio",Condiments,Sealer,3272,2456,2983,1271,0.0349,0.8061,199,391468,1900.64,23881.07
189
+ EQ-COL-187,"Columbus, Ohio",Canned Vegetables,Filler,2848,2639,2717,1873,0.0347,0.748,486,540934,1724.24,42802.98
190
+ EQ-COL-188,"Columbus, Ohio",Canned Vegetables,Labeler,3217,2758,3044,1835,0.0557,0.7054,480,405397,876.78,44563.36
191
+ EQ-COL-189,"Columbus, Ohio",Canned Vegetables,Pasteurizer,3354,2442,3201,2193,0.0692,0.6958,201,292890,1031.96,37096.8
192
+ EQ-COL-190,"Columbus, Ohio",Canned Vegetables,Pasteurizer,4282,3255,4217,1008,0.0671,0.6917,272,358500,1785.9,24584.32
193
+ EQ-COL-191,"Columbus, Ohio",Sauces,Labeler,2394,2043,2366,739,0.0825,0.7779,51,130990,1695.62,17227.79
194
+ EQ-COL-192,"Columbus, Ohio",Condiments,Sealer,2936,2178,2837,2987,0.0336,0.5931,68,503036,1990.22,26882.03
195
+ EQ-COL-193,"Columbus, Ohio",Sauces,Conveyor,3334,2484,3200,1758,0.0929,0.8068,387,260172,984.64,46576.9
196
+ EQ-COL-194,"Columbus, Ohio",Sauces,Sealer,2405,2080,2219,2557,0.0719,0.7792,409,436633,1903.64,26619.83
197
+ EQ-COL-195,"Columbus, Ohio",Canned Vegetables,Labeler,2939,2577,2885,2979,0.0612,0.6977,473,366720,1881.6,25764.88
198
+ EQ-COL-196,"Columbus, Ohio",Sauces,Pasteurizer,4924,4366,4722,2887,0.0639,0.7523,365,198311,1968.7,38518.52
199
+ EQ-COL-197,"Columbus, Ohio",Canned Vegetables,Sealer,3766,3258,3651,2233,0.0647,0.7416,104,269771,1034.05,23058.6
200
+ EQ-COL-198,"Columbus, Ohio",Sauces,Mixer,3164,3037,3022,1143,0.0974,0.765,407,124505,1152.22,14044.02
201
+ EQ-COL-199,"Columbus, Ohio",Canned Vegetables,Conveyor,3887,3642,3575,869,0.0843,0.7819,50,157933,1988.74,44660.59
202
+ EQ-COL-200,"Columbus, Ohio",Sauces,Pasteurizer,4072,3385,3729,1032,0.097,0.8548,392,193558,1999.5,25633.36
203
+ EQ-COL-201,"Columbus, Ohio",Sauces,Pasteurizer,3626,2562,3242,1779,0.074,0.7401,269,477114,1288.64,22893.77
204
+ EQ-COL-202,"Columbus, Ohio",Canned Vegetables,Labeler,4593,3260,4413,1280,0.0966,0.5958,154,134993,1182.28,18904.13
205
+ EQ-COL-203,"Columbus, Ohio",Condiments,Sealer,4600,3382,4199,2565,0.0569,0.8377,260,567313,1414.85,49148.72
206
+ EQ-LAN-204,"Lansing, Michigan",Sauces,Filler,3025,2136,2886,769,0.0797,0.701,416,305977,1892.34,15690.12
207
+ EQ-LAN-205,"Lansing, Michigan",Canned Vegetables,Pasteurizer,3683,3600,3256,2663,0.0408,0.6907,397,129057,1778.64,37049.59
208
+ EQ-LAN-206,"Lansing, Michigan",Condiments,Conveyor,3024,2938,2700,511,0.0464,0.6735,332,441798,936.7,43809.58
209
+ EQ-LAN-207,"Lansing, Michigan",Sauces,Sealer,2274,1734,1990,2916,0.0822,0.6692,461,344507,1460.18,5541.2
210
+ EQ-LAN-208,"Lansing, Michigan",Sauces,Mixer,2339,2271,1999,1332,0.0457,0.7492,413,549449,1469.51,47091.78
211
+ EQ-LAN-209,"Lansing, Michigan",Sauces,Boiler,2847,2277,2775,833,0.0377,0.7444,104,330931,813.36,38078.29
212
+ EQ-LAN-210,"Lansing, Michigan",Sauces,Sealer,4936,3467,4909,891,0.0962,0.7644,272,518299,1120.32,9167.44
213
+ EQ-LAN-211,"Lansing, Michigan",Canned Vegetables,Sealer,4749,4104,4633,2918,0.0682,0.7191,245,190877,1919.33,19308.37
214
+ EQ-LAN-212,"Lansing, Michigan",Condiments,Boiler,2027,1645,1818,562,0.0568,0.7397,78,145818,1379.08,39261.4
215
+ EQ-LAN-213,"Lansing, Michigan",Sauces,Labeler,4144,3707,3748,2665,0.0441,0.7874,445,231315,1505.55,14804.58
216
+ EQ-LAN-214,"Lansing, Michigan",Canned Vegetables,Conveyor,4309,3339,4224,2518,0.0534,0.6514,331,358218,1307.73,12177.66
217
+ EQ-LAN-215,"Lansing, Michigan",Canned Vegetables,Sealer,2959,2145,2638,1442,0.0524,0.6345,472,521806,1899.13,39650.96
218
+ EQ-LAN-216,"Lansing, Michigan",Canned Vegetables,Filler,4343,3195,4131,1154,0.0915,0.6117,280,101264,928.65,28761.15
219
+ EQ-LAN-217,"Lansing, Michigan",Canned Vegetables,Filler,3540,2917,3336,1838,0.033,0.7362,428,415144,1855.67,34623.32
220
+ EQ-LAN-218,"Lansing, Michigan",Canned Vegetables,Conveyor,4188,4044,3908,2411,0.0905,0.6917,104,171177,1873.09,28189.69
221
+ EQ-LAN-219,"Lansing, Michigan",Condiments,Labeler,4741,3650,4547,1803,0.0313,0.6089,282,522800,1511.32,44268.05
222
+ EQ-LAN-220,"Lansing, Michigan",Condiments,Labeler,3790,3703,3245,802,0.0311,0.7252,61,117475,1463.44,38348.38
223
+ EQ-LAN-221,"Lansing, Michigan",Sauces,Pasteurizer,4626,3677,4613,2276,0.0539,0.7981,339,477832,1788.32,24445.26
224
+ EQ-LAN-222,"Lansing, Michigan",Sauces,Boiler,1633,1227,1414,1090,0.0348,0.6528,187,111555,1867.65,48236.09
225
+ EQ-LAN-223,"Lansing, Michigan",Canned Vegetables,Filler,4694,4443,4628,2907,0.0881,0.7385,399,175595,1406.86,11706.46
226
+ EQ-LAN-224,"Lansing, Michigan",Sauces,Pasteurizer,4045,3473,3962,1491,0.0411,0.6003,330,531795,1472.0,42763.8
227
+ EQ-LAN-225,"Lansing, Michigan",Sauces,Pasteurizer,4035,3501,3943,1472,0.0903,0.7346,117,228288,1068.91,37579.72
228
+ EQ-LAN-226,"Lansing, Michigan",Condiments,Pasteurizer,3015,2463,2873,1739,0.0931,0.7511,209,545970,1832.75,28053.08
229
+ EQ-LAN-227,"Lansing, Michigan",Canned Vegetables,Filler,1579,1539,1490,2623,0.0512,0.7483,467,398744,1610.49,22824.18
230
+ EQ-LAN-228,"Lansing, Michigan",Sauces,Mixer,2893,2310,2688,860,0.0807,0.8174,136,565403,956.22,10718.52
231
+ EQ-LAN-229,"Lansing, Michigan",Canned Vegetables,Mixer,3151,2336,2807,2319,0.0987,0.6546,463,302138,1105.61,46258.43
232
+ EQ-LAN-230,"Lansing, Michigan",Condiments,Mixer,4624,4104,4325,2537,0.0953,0.6757,295,162164,1404.78,5599.45
233
+ EQ-LAN-231,"Lansing, Michigan",Condiments,Labeler,3264,2812,2927,2911,0.0663,0.7401,216,432130,1778.69,17566.41
234
+ EQ-LAN-232,"Lansing, Michigan",Canned Vegetables,Pasteurizer,3737,3155,3374,1864,0.0657,0.6966,325,295501,1257.07,13826.5
235
+ EQ-LAN-233,"Lansing, Michigan",Condiments,Pasteurizer,4704,3839,4099,1646,0.0782,0.6999,112,317221,1446.65,8730.85
236
+ EQ-LAN-234,"Lansing, Michigan",Canned Vegetables,Conveyor,2692,2567,2338,1015,0.0818,0.6388,484,432807,1988.15,42576.24
237
+ EQ-LAN-235,"Lansing, Michigan",Sauces,Labeler,4828,3597,4800,2821,0.0895,0.4827,110,369443,821.3,43740.22
238
+ EQ-LAN-236,"Lansing, Michigan",Condiments,Mixer,3873,2740,3368,1241,0.0907,0.8046,347,399573,1143.52,25467.41
239
+ EQ-LAN-237,"Lansing, Michigan",Condiments,Pasteurizer,2247,1633,2207,2106,0.0788,0.7166,330,116069,1691.58,13382.57
240
+ EQ-LAN-238,"Lansing, Michigan",Canned Vegetables,Pasteurizer,3951,3681,3614,2157,0.0599,0.6287,72,322586,1306.43,26885.86
241
+ EQ-LAN-239,"Lansing, Michigan",Sauces,Sealer,3147,2772,2739,629,0.0949,0.7208,86,557432,873.82,5069.16
242
+ EQ-LAN-240,"Lansing, Michigan",Canned Vegetables,Boiler,2376,2157,2321,934,0.0312,0.7704,261,349392,1743.02,38882.08
243
+ EQ-LAN-241,"Lansing, Michigan",Sauces,Filler,2159,2075,2137,856,0.0527,0.7439,316,593447,1539.16,11026.96
244
+ EQ-LAN-242,"Lansing, Michigan",Canned Vegetables,Sealer,4598,3853,4219,954,0.0619,0.6473,72,155915,1590.04,28452.46
245
+ EQ-LAN-243,"Lansing, Michigan",Condiments,Filler,1826,1757,1629,1273,0.0997,0.774,152,298422,1027.53,15363.18
246
+ EQ-LAN-244,"Lansing, Michigan",Condiments,Pasteurizer,4181,3499,4076,2683,0.0423,0.7268,408,249611,1354.55,45509.7
247
+ EQ-LAN-245,"Lansing, Michigan",Sauces,Pasteurizer,4595,4210,4076,2158,0.0854,0.6067,190,494692,1369.76,44620.7
248
+ EQ-LAN-246,"Lansing, Michigan",Sauces,Pasteurizer,4833,4505,4603,944,0.0538,0.7638,118,589533,1180.68,23328.61
249
+ EQ-LAN-247,"Lansing, Michigan",Condiments,Pasteurizer,3828,3371,3594,1225,0.0822,0.7682,250,337112,1430.41,40689.78
250
+ EQ-LAN-248,"Lansing, Michigan",Sauces,Labeler,3493,3215,3082,1536,0.0944,0.7901,197,476777,1148.69,33674.61
harfeast_world/data/oee_assumptions.csv ADDED
@@ -0,0 +1,6 @@
1
+ plant,current_annual_oee,annual_oee_improvement,investment_start_year,world_class_oee_target
2
+ "Rockford, Illinois",0.7961,0.0315,2025,0.85
3
+ "Madison, Wisconsin",0.7581,0.027,2025,0.85
4
+ "Davenport, Iowa",0.812,0.0324,2025,0.85
5
+ "Columbus, Ohio",0.7201,0.024,2026,0.85
6
+ "Lansing, Michigan",0.7283,0.0249,2026,0.85
harfeast_world/data/plant_labor.csv ADDED
@@ -0,0 +1,65 @@
1
+ employee_id,plant,role,hourly_wage,annual_hours,union_status,supervisor_type,census_division
2
+ LAB-ROC-000,"Rockford, Illinois",Production Supervisor,19.11,2080,Union,production,East North Central
3
+ LAB-ROC-001,"Rockford, Illinois",Production Operator,16.56,2080,Union,non-production,East North Central
4
+ LAB-ROC-002,"Rockford, Illinois",Packaging Operator,15.82,2080,Union,non-production,East North Central
5
+ LAB-ROC-003,"Rockford, Illinois",Quality Inspector,19.98,2080,Union,non-production,East North Central
6
+ LAB-ROC-004,"Rockford, Illinois",Production Supervisor,21.76,2080,Non-Union,production,East North Central
7
+ LAB-ROC-005,"Rockford, Illinois",Production Supervisor,22.72,2080,Non-Union,production,East North Central
8
+ LAB-ROC-006,"Rockford, Illinois",Line Lead,22.09,2080,Non-Union,production,East North Central
9
+ LAB-ROC-007,"Rockford, Illinois",Production Operator,16.54,2080,Union,non-production,East North Central
10
+ LAB-ROC-008,"Rockford, Illinois",Packaging Operator,19.77,2080,Union,non-production,East North Central
11
+ LAB-ROC-009,"Rockford, Illinois",Line Lead,21.69,2080,Non-Union,production,East North Central
12
+ LAB-ROC-010,"Rockford, Illinois",Production Operator,21.09,2080,Non-Union,non-production,East North Central
13
+ LAB-ROC-011,"Rockford, Illinois",Production Supervisor,22.04,2080,Non-Union,production,East North Central
14
+ LAB-ROC-012,"Rockford, Illinois",Maintenance Tech,15.74,2080,Union,non-production,East North Central
15
+ LAB-ROC-013,"Rockford, Illinois",Quality Inspector,18.36,2080,Union,non-production,East North Central
16
+ LAB-ROC-014,"Rockford, Illinois",Packaging Operator,19.88,2080,Non-Union,non-production,East North Central
17
+ LAB-ROC-015,"Rockford, Illinois",Line Lead,21.54,2080,Non-Union,production,East North Central
18
+ LAB-ROC-016,"Rockford, Illinois",Quality Inspector,18.03,2080,Union,non-production,East North Central
19
+ LAB-MAD-017,"Madison, Wisconsin",Quality Inspector,18.28,2080,Union,non-production,East North Central
20
+ LAB-MAD-018,"Madison, Wisconsin",Line Lead,19.5,2080,Union,production,East North Central
21
+ LAB-MAD-019,"Madison, Wisconsin",Quality Inspector,17.31,2080,Non-Union,non-production,East North Central
22
+ LAB-MAD-020,"Madison, Wisconsin",Line Lead,22.87,2080,Non-Union,production,East North Central
23
+ LAB-MAD-021,"Madison, Wisconsin",Production Operator,15.01,2080,Non-Union,non-production,East North Central
24
+ LAB-MAD-022,"Madison, Wisconsin",Line Lead,25.0,2080,Non-Union,production,East North Central
25
+ LAB-MAD-023,"Madison, Wisconsin",Maintenance Tech,16.84,2080,Non-Union,non-production,East North Central
26
+ LAB-MAD-024,"Madison, Wisconsin",Packaging Operator,18.11,2080,Non-Union,non-production,East North Central
27
+ LAB-MAD-025,"Madison, Wisconsin",Maintenance Tech,15.44,2080,Non-Union,non-production,East North Central
28
+ LAB-MAD-026,"Madison, Wisconsin",Production Operator,19.57,2080,Non-Union,non-production,East North Central
29
+ LAB-MAD-027,"Madison, Wisconsin",Packaging Operator,19.41,2080,Union,non-production,East North Central
30
+ LAB-MAD-028,"Madison, Wisconsin",Production Operator,19.97,2080,Union,non-production,East North Central
31
+ LAB-MAD-029,"Madison, Wisconsin",Line Lead,25.06,2080,Union,production,East North Central
32
+ LAB-MAD-030,"Madison, Wisconsin",Line Lead,20.99,2080,Union,production,East North Central
33
+ LAB-MAD-031,"Madison, Wisconsin",Line Lead,19.98,2080,Non-Union,production,East North Central
34
+ LAB-MAD-032,"Madison, Wisconsin",Maintenance Tech,18.25,2080,Union,non-production,East North Central
35
+ LAB-MAD-033,"Madison, Wisconsin",Packaging Operator,16.9,2080,Union,non-production,East North Central
36
+ LAB-MAD-034,"Madison, Wisconsin",Maintenance Tech,15.75,2080,Union,non-production,East North Central
37
+ LAB-MAD-035,"Madison, Wisconsin",Production Operator,22.33,2080,Non-Union,non-production,East North Central
38
+ LAB-MAD-036,"Madison, Wisconsin",Production Supervisor,22.04,2080,Non-Union,production,East North Central
39
+ LAB-MAD-037,"Madison, Wisconsin",Maintenance Tech,17.95,2080,Union,non-production,East North Central
40
+ LAB-MAD-038,"Madison, Wisconsin",Production Supervisor,24.14,2080,Non-Union,production,East North Central
41
+ LAB-MAD-039,"Madison, Wisconsin",Production Supervisor,22.68,2080,Non-Union,production,East North Central
42
+ LAB-MAD-040,"Madison, Wisconsin",Production Supervisor,23.85,2080,Non-Union,production,East North Central
43
+ LAB-MAD-041,"Madison, Wisconsin",Quality Inspector,19.34,2080,Non-Union,non-production,East North Central
44
+ LAB-DAV-042,"Davenport, Iowa",Maintenance Tech,17.43,2080,Non-Union,non-production,West North Central
45
+ LAB-DAV-043,"Davenport, Iowa",Quality Inspector,18.72,2080,Union,non-production,West North Central
46
+ LAB-DAV-044,"Davenport, Iowa",Quality Inspector,20.17,2080,Non-Union,non-production,West North Central
47
+ LAB-DAV-045,"Davenport, Iowa",Quality Inspector,15.23,2080,Union,non-production,West North Central
48
+ LAB-DAV-046,"Davenport, Iowa",Maintenance Tech,17.97,2080,Union,non-production,West North Central
49
+ LAB-DAV-047,"Davenport, Iowa",Quality Inspector,18.32,2080,Non-Union,non-production,West North Central
50
+ LAB-DAV-048,"Davenport, Iowa",Quality Inspector,19.62,2080,Non-Union,non-production,West North Central
51
+ LAB-DAV-049,"Davenport, Iowa",Maintenance Tech,22.89,2080,Union,non-production,West North Central
52
+ LAB-DAV-050,"Davenport, Iowa",Line Lead,23.6,2080,Non-Union,production,West North Central
53
+ LAB-DAV-051,"Davenport, Iowa",Production Operator,16.8,2080,Non-Union,non-production,West North Central
54
+ LAB-DAV-052,"Davenport, Iowa",Production Supervisor,24.57,2080,Union,production,West North Central
55
+ LAB-DAV-053,"Davenport, Iowa",Production Supervisor,24.25,2080,Non-Union,production,West North Central
56
+ LAB-DAV-054,"Davenport, Iowa",Quality Inspector,18.38,2080,Union,non-production,West North Central
57
+ LAB-DAV-055,"Davenport, Iowa",Maintenance Tech,16.36,2080,Non-Union,non-production,West North Central
58
+ LAB-DAV-056,"Davenport, Iowa",Quality Inspector,20.1,2080,Non-Union,non-production,West North Central
59
+ LAB-DAV-057,"Davenport, Iowa",Production Supervisor,24.45,2080,Non-Union,production,West North Central
60
+ LAB-DAV-058,"Davenport, Iowa",Line Lead,22.25,2080,Union,production,West North Central
61
+ LAB-DAV-059,"Davenport, Iowa",Line Lead,19.98,2080,Non-Union,production,West North Central
62
+ LAB-DAV-060,"Davenport, Iowa",Production Supervisor,21.25,2080,Union,production,West North Central
63
+ LAB-DAV-061,"Davenport, Iowa",Production Supervisor,20.75,2080,Non-Union,production,West North Central
64
+ LAB-DAV-062,"Davenport, Iowa",Packaging Operator,18.53,2080,Non-Union,non-production,West North Central
65
+ LAB-DAV-063,"Davenport, Iowa",Line Lead,20.39,2080,Non-Union,production,West North Central
harfeast_world/data/plant_unit_sales.csv ADDED
@@ -0,0 +1,6 @@
1
+ plant,current_unit_sales,price_per_unit
2
+ "Rockford, Illinois",18641894,2.96
3
+ "Madison, Wisconsin",20199169,3.23
4
+ "Davenport, Iowa",4427466,5.88
5
+ "Columbus, Ohio",4524679,6.84
6
+ "Lansing, Michigan",5973122,5.98
harfeast_world/data/quality_losses.csv ADDED
@@ -0,0 +1,250 @@
1
+ equipment_id,plant,product_family,scrap_cost,unplanned_failure_cost
2
+ EQ-ROC-000,"Rockford, Illinois",Canned Vegetables,31790.64,29387.98
3
+ EQ-ROC-001,"Rockford, Illinois",Canned Vegetables,20402.78,18401.71
4
+ EQ-ROC-002,"Rockford, Illinois",Sauces,38722.36,21719.37
5
+ EQ-ROC-003,"Rockford, Illinois",Canned Vegetables,37484.45,3810.01
6
+ EQ-ROC-004,"Rockford, Illinois",Canned Vegetables,56697.98,21945.58
7
+ EQ-ROC-005,"Rockford, Illinois",Canned Vegetables,29791.99,11388.85
8
+ EQ-ROC-006,"Rockford, Illinois",Canned Vegetables,33428.53,14270.37
9
+ EQ-ROC-007,"Rockford, Illinois",Condiments,56498.45,8430.38
10
+ EQ-ROC-008,"Rockford, Illinois",Condiments,19966.85,19711.6
11
+ EQ-ROC-009,"Rockford, Illinois",Sauces,28268.19,28914.53
12
+ EQ-ROC-010,"Rockford, Illinois",Condiments,18875.12,3446.45
13
+ EQ-ROC-011,"Rockford, Illinois",Condiments,40780.15,19325.64
14
+ EQ-ROC-012,"Rockford, Illinois",Canned Vegetables,26403.39,14897.54
15
+ EQ-ROC-013,"Rockford, Illinois",Sauces,31036.53,15841.22
16
+ EQ-ROC-014,"Rockford, Illinois",Sauces,4638.0,17986.52
17
+ EQ-ROC-015,"Rockford, Illinois",Condiments,15106.58,5853.9
18
+ EQ-ROC-016,"Rockford, Illinois",Sauces,21517.03,20891.14
19
+ EQ-ROC-017,"Rockford, Illinois",Canned Vegetables,8658.05,16145.08
20
+ EQ-ROC-018,"Rockford, Illinois",Sauces,16590.12,23894.26
21
+ EQ-ROC-019,"Rockford, Illinois",Condiments,68138.08,12614.04
22
+ EQ-ROC-020,"Rockford, Illinois",Condiments,12546.28,25257.7
23
+ EQ-ROC-021,"Rockford, Illinois",Condiments,12887.67,10177.67
24
+ EQ-ROC-022,"Rockford, Illinois",Condiments,67275.47,28878.23
25
+ EQ-ROC-023,"Rockford, Illinois",Canned Vegetables,34567.88,28422.34
26
+ EQ-ROC-024,"Rockford, Illinois",Condiments,31107.27,26580.05
27
+ EQ-ROC-025,"Rockford, Illinois",Condiments,105259.13,21011.06
28
+ EQ-ROC-026,"Rockford, Illinois",Sauces,43381.11,17965.38
29
+ EQ-ROC-027,"Rockford, Illinois",Sauces,16850.46,21824.0
30
+ EQ-ROC-028,"Rockford, Illinois",Sauces,11566.39,17510.75
31
+ EQ-ROC-029,"Rockford, Illinois",Canned Vegetables,39470.4,19686.56
32
+ EQ-ROC-030,"Rockford, Illinois",Condiments,83646.9,24682.54
33
+ EQ-ROC-031,"Rockford, Illinois",Condiments,26324.64,24240.34
34
+ EQ-ROC-032,"Rockford, Illinois",Condiments,6708.68,2569.11
35
+ EQ-ROC-033,"Rockford, Illinois",Sauces,46945.21,17403.94
36
+ EQ-ROC-034,"Rockford, Illinois",Sauces,28373.71,20054.52
37
+ EQ-ROC-035,"Rockford, Illinois",Condiments,30724.18,2650.76
38
+ EQ-ROC-036,"Rockford, Illinois",Sauces,18942.1,2844.43
39
+ EQ-ROC-037,"Rockford, Illinois",Sauces,8116.98,22785.94
40
+ EQ-ROC-038,"Rockford, Illinois",Sauces,42366.05,6981.36
41
+ EQ-ROC-039,"Rockford, Illinois",Sauces,8705.85,24875.92
42
+ EQ-ROC-040,"Rockford, Illinois",Canned Vegetables,35686.55,20698.54
43
+ EQ-ROC-041,"Rockford, Illinois",Canned Vegetables,17760.6,18692.22
44
+ EQ-ROC-042,"Rockford, Illinois",Condiments,5164.13,13063.15
45
+ EQ-ROC-043,"Rockford, Illinois",Canned Vegetables,52883.11,10952.63
46
+ EQ-ROC-044,"Rockford, Illinois",Canned Vegetables,60853.62,11010.72
47
+ EQ-ROC-045,"Rockford, Illinois",Condiments,10077.72,11054.16
48
+ EQ-ROC-046,"Rockford, Illinois",Condiments,25243.64,11525.45
49
+ EQ-ROC-047,"Rockford, Illinois",Sauces,53258.94,9094.03
50
+ EQ-ROC-048,"Rockford, Illinois",Sauces,34100.15,18315.35
51
+ EQ-ROC-049,"Rockford, Illinois",Sauces,8113.37,23133.16
52
+ EQ-ROC-050,"Rockford, Illinois",Sauces,60318.04,3196.85
53
+ EQ-ROC-051,"Rockford, Illinois",Canned Vegetables,47966.69,21579.97
54
+ EQ-ROC-052,"Rockford, Illinois",Condiments,21523.47,14468.54
55
+ EQ-ROC-053,"Rockford, Illinois",Canned Vegetables,27241.2,19709.06
56
+ EQ-ROC-054,"Rockford, Illinois",Sauces,31772.53,23966.07
57
+ EQ-MAD-055,"Madison, Wisconsin",Condiments,35375.9,29744.95
58
+ EQ-MAD-056,"Madison, Wisconsin",Condiments,36530.58,2392.57
59
+ EQ-MAD-057,"Madison, Wisconsin",Condiments,46736.15,16315.09
60
+ EQ-MAD-058,"Madison, Wisconsin",Sauces,19948.28,2043.82
61
+ EQ-MAD-059,"Madison, Wisconsin",Sauces,9401.76,9424.73
62
+ EQ-MAD-060,"Madison, Wisconsin",Condiments,38057.95,28321.99
63
+ EQ-MAD-061,"Madison, Wisconsin",Canned Vegetables,6640.5,16415.64
64
+ EQ-MAD-062,"Madison, Wisconsin",Sauces,48758.09,15544.93
65
+ EQ-MAD-063,"Madison, Wisconsin",Canned Vegetables,34166.93,7135.0
66
+ EQ-MAD-064,"Madison, Wisconsin",Condiments,74212.78,6385.54
67
+ EQ-MAD-065,"Madison, Wisconsin",Sauces,16719.86,6885.87
68
+ EQ-MAD-066,"Madison, Wisconsin",Canned Vegetables,29831.21,26018.63
69
+ EQ-MAD-067,"Madison, Wisconsin",Sauces,45622.02,27194.6
70
+ EQ-MAD-068,"Madison, Wisconsin",Condiments,23468.11,10203.95
71
+ EQ-MAD-069,"Madison, Wisconsin",Sauces,23781.43,29326.74
72
+ EQ-MAD-070,"Madison, Wisconsin",Canned Vegetables,24404.31,21383.4
73
+ EQ-MAD-071,"Madison, Wisconsin",Canned Vegetables,32243.91,28840.93
74
+ EQ-MAD-072,"Madison, Wisconsin",Sauces,15104.15,13859.1
75
+ EQ-MAD-073,"Madison, Wisconsin",Condiments,10152.5,21705.57
76
+ EQ-MAD-074,"Madison, Wisconsin",Condiments,39546.52,5598.24
77
+ EQ-MAD-075,"Madison, Wisconsin",Canned Vegetables,51295.28,10327.51
78
+ EQ-MAD-076,"Madison, Wisconsin",Sauces,23314.1,16455.68
79
+ EQ-MAD-077,"Madison, Wisconsin",Condiments,9480.05,25897.46
80
+ EQ-MAD-078,"Madison, Wisconsin",Sauces,20658.74,28595.82
81
+ EQ-MAD-079,"Madison, Wisconsin",Canned Vegetables,26886.2,11669.27
82
+ EQ-MAD-080,"Madison, Wisconsin",Condiments,16461.55,14785.87
83
+ EQ-MAD-081,"Madison, Wisconsin",Condiments,30961.79,18417.06
84
+ EQ-MAD-082,"Madison, Wisconsin",Sauces,38231.41,6348.08
85
+ EQ-MAD-083,"Madison, Wisconsin",Canned Vegetables,37179.57,11292.85
86
+ EQ-MAD-084,"Madison, Wisconsin",Condiments,56440.7,7041.16
87
+ EQ-MAD-085,"Madison, Wisconsin",Sauces,7434.44,8080.01
88
+ EQ-MAD-086,"Madison, Wisconsin",Condiments,50943.23,27482.39
89
+ EQ-MAD-087,"Madison, Wisconsin",Canned Vegetables,34157.04,7758.81
90
+ EQ-MAD-088,"Madison, Wisconsin",Condiments,42638.65,28404.21
91
+ EQ-MAD-089,"Madison, Wisconsin",Canned Vegetables,15155.77,11171.31
92
+ EQ-MAD-090,"Madison, Wisconsin",Canned Vegetables,18468.85,28447.31
93
+ EQ-MAD-091,"Madison, Wisconsin",Canned Vegetables,24661.57,14972.28
94
+ EQ-MAD-092,"Madison, Wisconsin",Sauces,7338.8,20914.37
95
+ EQ-MAD-093,"Madison, Wisconsin",Sauces,19202.95,21035.68
96
+ EQ-MAD-094,"Madison, Wisconsin",Condiments,23735.26,15935.78
97
+ EQ-MAD-095,"Madison, Wisconsin",Sauces,11380.08,7954.92
98
+ EQ-MAD-096,"Madison, Wisconsin",Canned Vegetables,35034.17,27289.62
99
+ EQ-MAD-097,"Madison, Wisconsin",Canned Vegetables,21598.11,24036.45
100
+ EQ-MAD-098,"Madison, Wisconsin",Canned Vegetables,52335.19,5642.27
101
+ EQ-MAD-099,"Madison, Wisconsin",Sauces,13828.92,24376.65
102
+ EQ-MAD-100,"Madison, Wisconsin",Sauces,15400.12,23533.11
103
+ EQ-MAD-101,"Madison, Wisconsin",Sauces,26147.79,16896.91
104
+ EQ-MAD-102,"Madison, Wisconsin",Sauces,7171.97,15498.98
105
+ EQ-MAD-103,"Madison, Wisconsin",Condiments,23386.58,3346.47
106
+ EQ-MAD-104,"Madison, Wisconsin",Sauces,12513.63,20637.24
107
+ EQ-MAD-105,"Madison, Wisconsin",Condiments,14468.0,2254.1
108
+ EQ-MAD-106,"Madison, Wisconsin",Condiments,31963.44,21848.06
109
+ EQ-MAD-107,"Madison, Wisconsin",Sauces,85677.72,18528.09
110
+ EQ-MAD-108,"Madison, Wisconsin",Condiments,26334.74,18779.43
111
+ EQ-DAV-109,"Davenport, Iowa",Sauces,29366.56,9189.3
112
+ EQ-DAV-110,"Davenport, Iowa",Condiments,28200.98,15168.75
113
+ EQ-DAV-111,"Davenport, Iowa",Condiments,16039.14,15366.73
114
+ EQ-DAV-112,"Davenport, Iowa",Canned Vegetables,30842.01,19627.35
115
+ EQ-DAV-113,"Davenport, Iowa",Condiments,33834.74,21200.25
116
+ EQ-DAV-114,"Davenport, Iowa",Canned Vegetables,42787.2,7105.18
117
+ EQ-DAV-115,"Davenport, Iowa",Condiments,11761.68,16280.13
118
+ EQ-DAV-116,"Davenport, Iowa",Condiments,46685.24,28755.99
119
+ EQ-DAV-117,"Davenport, Iowa",Canned Vegetables,11739.11,13562.2
120
+ EQ-DAV-118,"Davenport, Iowa",Sauces,65432.44,28330.38
121
+ EQ-DAV-119,"Davenport, Iowa",Canned Vegetables,49663.25,29755.97
122
+ EQ-DAV-120,"Davenport, Iowa",Condiments,31206.84,18697.24
123
+ EQ-DAV-121,"Davenport, Iowa",Sauces,11800.12,15478.34
124
+ EQ-DAV-122,"Davenport, Iowa",Canned Vegetables,73517.99,9397.61
125
+ EQ-DAV-123,"Davenport, Iowa",Condiments,16157.66,13041.19
126
+ EQ-DAV-124,"Davenport, Iowa",Sauces,17422.95,26846.91
127
+ EQ-DAV-125,"Davenport, Iowa",Sauces,54968.43,29421.73
128
+ EQ-DAV-126,"Davenport, Iowa",Condiments,7552.45,18918.94
129
+ EQ-DAV-127,"Davenport, Iowa",Canned Vegetables,40570.28,15931.01
130
+ EQ-DAV-128,"Davenport, Iowa",Condiments,50571.46,20718.17
131
+ EQ-DAV-129,"Davenport, Iowa",Condiments,21018.99,15287.88
132
+ EQ-DAV-130,"Davenport, Iowa",Sauces,65311.0,24222.33
133
+ EQ-DAV-131,"Davenport, Iowa",Sauces,30283.82,27141.86
134
+ EQ-DAV-132,"Davenport, Iowa",Sauces,18730.34,22999.45
135
+ EQ-DAV-133,"Davenport, Iowa",Condiments,23535.33,14864.98
136
+ EQ-DAV-134,"Davenport, Iowa",Condiments,41317.42,23019.56
137
+ EQ-DAV-135,"Davenport, Iowa",Sauces,32833.35,6767.79
138
+ EQ-DAV-136,"Davenport, Iowa",Condiments,35239.0,23505.3
139
+ EQ-DAV-137,"Davenport, Iowa",Sauces,18636.25,12311.06
140
+ EQ-DAV-138,"Davenport, Iowa",Condiments,29671.48,13849.47
141
+ EQ-DAV-139,"Davenport, Iowa",Canned Vegetables,30098.89,20380.14
142
+ EQ-DAV-140,"Davenport, Iowa",Sauces,32473.23,8859.44
143
+ EQ-DAV-141,"Davenport, Iowa",Canned Vegetables,54768.25,29707.94
144
+ EQ-DAV-142,"Davenport, Iowa",Sauces,65133.57,3411.7
145
+ EQ-DAV-143,"Davenport, Iowa",Condiments,38132.75,6301.9
146
+ EQ-DAV-144,"Davenport, Iowa",Sauces,50061.84,23200.72
147
+ EQ-DAV-145,"Davenport, Iowa",Condiments,32513.62,12832.01
148
+ EQ-DAV-146,"Davenport, Iowa",Sauces,16894.25,29689.81
149
+ EQ-DAV-147,"Davenport, Iowa",Canned Vegetables,36439.14,27680.31
150
+ EQ-DAV-148,"Davenport, Iowa",Condiments,38786.65,12408.02
151
+ EQ-DAV-149,"Davenport, Iowa",Canned Vegetables,19225.65,6043.42
152
+ EQ-DAV-150,"Davenport, Iowa",Sauces,28058.2,12147.4
153
+ EQ-DAV-151,"Davenport, Iowa",Condiments,17660.16,16896.53
154
+ EQ-DAV-152,"Davenport, Iowa",Canned Vegetables,46828.31,22374.11
155
+ EQ-DAV-153,"Davenport, Iowa",Canned Vegetables,30533.23,26904.7
156
+ EQ-DAV-154,"Davenport, Iowa",Canned Vegetables,31527.84,15709.37
157
+ EQ-COL-155,"Columbus, Ohio",Canned Vegetables,27724.92,10964.93
158
+ EQ-COL-156,"Columbus, Ohio",Canned Vegetables,36428.57,2294.56
159
+ EQ-COL-157,"Columbus, Ohio",Canned Vegetables,63614.77,29148.39
160
+ EQ-COL-158,"Columbus, Ohio",Sauces,16823.6,23259.57
161
+ EQ-COL-159,"Columbus, Ohio",Canned Vegetables,14331.86,22891.29
162
+ EQ-COL-160,"Columbus, Ohio",Sauces,17622.95,20513.22
163
+ EQ-COL-161,"Columbus, Ohio",Canned Vegetables,35971.76,6924.01
164
+ EQ-COL-162,"Columbus, Ohio",Condiments,30643.25,21280.04
165
+ EQ-COL-163,"Columbus, Ohio",Sauces,74298.54,20268.84
166
+ EQ-COL-164,"Columbus, Ohio",Sauces,19677.24,10330.98
167
+ EQ-COL-165,"Columbus, Ohio",Canned Vegetables,19238.38,14915.05
168
+ EQ-COL-166,"Columbus, Ohio",Canned Vegetables,22137.81,12808.2
169
+ EQ-COL-167,"Columbus, Ohio",Canned Vegetables,78489.93,28524.04
170
+ EQ-COL-168,"Columbus, Ohio",Condiments,35066.23,13625.57
171
+ EQ-COL-169,"Columbus, Ohio",Sauces,17082.06,20996.01
172
+ EQ-COL-170,"Columbus, Ohio",Canned Vegetables,11732.67,28309.23
173
+ EQ-COL-171,"Columbus, Ohio",Canned Vegetables,17851.45,7135.59
174
+ EQ-COL-172,"Columbus, Ohio",Condiments,49383.51,12608.57
175
+ EQ-COL-173,"Columbus, Ohio",Condiments,10241.38,9288.63
176
+ EQ-COL-174,"Columbus, Ohio",Canned Vegetables,13215.28,21534.74
177
+ EQ-COL-175,"Columbus, Ohio",Condiments,22045.43,24082.9
178
+ EQ-COL-176,"Columbus, Ohio",Condiments,29704.97,7729.07
179
+ EQ-COL-177,"Columbus, Ohio",Canned Vegetables,52711.79,13336.74
180
+ EQ-COL-178,"Columbus, Ohio",Condiments,70575.32,28097.0
181
+ EQ-COL-179,"Columbus, Ohio",Sauces,15295.45,5004.87
182
+ EQ-COL-180,"Columbus, Ohio",Condiments,26622.77,18047.1
183
+ EQ-COL-181,"Columbus, Ohio",Sauces,41614.12,6929.53
184
+ EQ-COL-182,"Columbus, Ohio",Condiments,17874.46,28937.99
185
+ EQ-COL-183,"Columbus, Ohio",Condiments,44947.84,14359.13
186
+ EQ-COL-184,"Columbus, Ohio",Canned Vegetables,54594.85,13044.96
187
+ EQ-COL-185,"Columbus, Ohio",Canned Vegetables,39808.28,10347.17
188
+ EQ-COL-186,"Columbus, Ohio",Condiments,25966.99,8243.02
189
+ EQ-COL-187,"Columbus, Ohio",Canned Vegetables,32364.69,26289.68
190
+ EQ-COL-188,"Columbus, Ohio",Canned Vegetables,19798.23,3071.0
191
+ EQ-COL-189,"Columbus, Ohio",Canned Vegetables,20915.75,10914.17
192
+ EQ-COL-190,"Columbus, Ohio",Canned Vegetables,42960.45,19627.32
193
+ EQ-COL-191,"Columbus, Ohio",Sauces,18324.01,25431.51
194
+ EQ-COL-192,"Columbus, Ohio",Condiments,33638.72,13010.96
195
+ EQ-COL-193,"Columbus, Ohio",Sauces,23798.73,29779.54
196
+ EQ-COL-194,"Columbus, Ohio",Sauces,59762.71,8095.94
197
+ EQ-COL-195,"Columbus, Ohio",Canned Vegetables,42229.25,2462.28
198
+ EQ-COL-196,"Columbus, Ohio",Sauces,24947.51,20254.48
199
+ EQ-COL-197,"Columbus, Ohio",Canned Vegetables,18048.5,7623.84
200
+ EQ-COL-198,"Columbus, Ohio",Sauces,13972.73,6062.61
201
+ EQ-COL-199,"Columbus, Ohio",Canned Vegetables,26477.59,27916.26
202
+ EQ-COL-200,"Columbus, Ohio",Sauces,37540.86,24191.18
203
+ EQ-COL-201,"Columbus, Ohio",Sauces,45497.29,3506.87
204
+ EQ-COL-202,"Columbus, Ohio",Canned Vegetables,15417.31,21007.04
205
+ EQ-COL-203,"Columbus, Ohio",Condiments,45671.51,12887.54
206
+ EQ-LAN-204,"Lansing, Michigan",Sauces,46147.3,3581.62
207
+ EQ-LAN-205,"Lansing, Michigan",Canned Vegetables,9365.47,16291.69
208
+ EQ-LAN-206,"Lansing, Michigan",Condiments,19201.81,26363.56
209
+ EQ-LAN-207,"Lansing, Michigan",Sauces,41350.07,29021.52
210
+ EQ-LAN-208,"Lansing, Michigan",Sauces,36899.13,24674.83
211
+ EQ-LAN-209,"Lansing, Michigan",Sauces,10147.56,25711.97
212
+ EQ-LAN-210,"Lansing, Michigan",Sauces,55859.56,3553.24
213
+ EQ-LAN-211,"Lansing, Michigan",Canned Vegetables,24985.48,10412.88
214
+ EQ-LAN-212,"Lansing, Michigan",Condiments,11422.18,27465.95
215
+ EQ-LAN-213,"Lansing, Michigan",Sauces,15358.1,5026.64
216
+ EQ-LAN-214,"Lansing, Michigan",Canned Vegetables,25015.36,7725.49
217
+ EQ-LAN-215,"Lansing, Michigan",Canned Vegetables,51927.22,20313.01
218
+ EQ-LAN-216,"Lansing, Michigan",Canned Vegetables,8604.55,23424.14
219
+ EQ-LAN-217,"Lansing, Michigan",Canned Vegetables,25422.22,26975.86
220
+ EQ-LAN-218,"Lansing, Michigan",Canned Vegetables,29017.01,7471.52
221
+ EQ-LAN-219,"Lansing, Michigan",Condiments,24730.7,28101.29
222
+ EQ-LAN-220,"Lansing, Michigan",Condiments,5346.64,22425.16
223
+ EQ-LAN-221,"Lansing, Michigan",Sauces,46058.44,22961.49
224
+ EQ-LAN-222,"Lansing, Michigan",Sauces,7250.43,29820.86
225
+ EQ-LAN-223,"Lansing, Michigan",Canned Vegetables,21764.01,19667.95
226
+ EQ-LAN-224,"Lansing, Michigan",Sauces,32173.17,18921.86
227
+ EQ-LAN-225,"Lansing, Michigan",Sauces,22034.95,12216.93
228
+ EQ-LAN-226,"Lansing, Michigan",Condiments,93158.33,6482.56
229
+ EQ-LAN-227,"Lansing, Michigan",Canned Vegetables,32879.27,27813.58
230
+ EQ-LAN-228,"Lansing, Michigan",Sauces,43630.43,26831.48
231
+ EQ-LAN-229,"Lansing, Michigan",Canned Vegetables,32970.42,23431.81
232
+ EQ-LAN-230,"Lansing, Michigan",Condiments,21709.79,18679.52
233
+ EQ-LAN-231,"Lansing, Michigan",Condiments,50959.86,18260.86
234
+ EQ-LAN-232,"Lansing, Michigan",Canned Vegetables,24405.28,17235.51
235
+ EQ-LAN-233,"Lansing, Michigan",Condiments,35886.59,14338.67
236
+ EQ-LAN-234,"Lansing, Michigan",Canned Vegetables,70387.69,15126.77
237
+ EQ-LAN-235,"Lansing, Michigan",Sauces,27156.41,3713.3
238
+ EQ-LAN-236,"Lansing, Michigan",Condiments,41442.62,12095.18
239
+ EQ-LAN-237,"Lansing, Michigan",Condiments,15471.59,16471.21
240
+ EQ-LAN-238,"Lansing, Michigan",Canned Vegetables,25244.02,18373.57
241
+ EQ-LAN-239,"Lansing, Michigan",Sauces,46225.34,23568.36
242
+ EQ-LAN-240,"Lansing, Michigan",Canned Vegetables,19000.71,15277.69
243
+ EQ-LAN-241,"Lansing, Michigan",Sauces,48136.7,29599.36
244
+ EQ-LAN-242,"Lansing, Michigan",Canned Vegetables,15345.7,21415.06
245
+ EQ-LAN-243,"Lansing, Michigan",Condiments,30571.76,27758.14
246
+ EQ-LAN-244,"Lansing, Michigan",Condiments,14302.08,12757.06
247
+ EQ-LAN-245,"Lansing, Michigan",Sauces,57867.84,23436.2
248
+ EQ-LAN-246,"Lansing, Michigan",Sauces,37447.48,28052.55
249
+ EQ-LAN-247,"Lansing, Michigan",Condiments,39637.53,11118.43
250
+ EQ-LAN-248,"Lansing, Michigan",Sauces,51699.95,21706.49
harfeast_world/documents/aptean_report.txt ADDED
@@ -0,0 +1,22 @@
1
+ Aptean Food & Beverage Manufacturing Technology Report 2024
2
+ ============================================================
3
+
4
+ Top Technology Investments and Revenue Impact Analysis
5
+
6
+ Technology                  Users Growth    Non-Users Growth    Category
7
+ --------------------------------------------------------------------------------------
8
+ IoT Sensors                 12.4%           4.0%                Top Investment to Date
9
+ Predictive Maintenance      11.8%           3.9%                Top Planned 2024
10
+ Cloud ERP                   9.2%            5.1%                Top Investment to Date
11
+ Robotic Automation          10.5%           3.6%                Top Planned 2024
12
+ AI Quality Control          8.9%            4.6%                Top Investment to Date
13
+ Digital Twin                7.3%            4.0%                Other
14
+ Supply Chain AI             6.9%            3.1%                Other
15
+ Automated Scheduling        8.1%            5.7%                Top Planned 2024
16
+ Warehouse Robotics          7.8%            5.2%                Other
17
+ Advanced Analytics          9.6%            4.4%                Top Investment to Date
18
+
19
+
20
+ Note: 'Top Investment to Date' and 'Top Planned 2024' represent
21
+ investments explicitly identified by surveyed manufacturers as
22
+ their highest-priority technology initiatives.
harfeast_world/documents/frito_lay_case_study.txt ADDED
@@ -0,0 +1,24 @@
1
+ Frito-Lay Digital Transformation Case Study
2
+ =============================================
3
+
4
+ Background: Frito-Lay North America, a division of PepsiCo, operates
5
+ over 30 manufacturing facilities producing snack foods including
6
+ Doritos, Cheetos, and Lay's potato chips.
7
+
8
+ Initiative: In 2022, Frito-Lay deployed IoT-based predictive maintenance
9
+ sensors across their manufacturing network, focusing on high-throughput
10
+ production lines.
11
+
12
+ Results: After 18 months of deployment, Frito-Lay achieved a 32%
13
+ reduction in unplanned downtime across all monitored production lines.
14
+ The improvement was consistent across facilities regardless of size
15
+ or product type.
16
+
17
+ Key Success Factors:
18
+ - Phased rollout starting with highest-volume lines
19
+ - Integration with existing SCADA systems
20
+ - Dedicated data analytics team for sensor data interpretation
21
+ - Weekly review cadence with plant managers
22
+
23
+ The 32% unplanned downtime reduction translated to approximately
24
+ $45M in annual cost savings across the network.
harfeast_world/documents/interview_david_chen.txt ADDED
@@ -0,0 +1,13 @@
1
+ Expert Interview Transcript - David Chen, Director of Manufacturing
2
+ Date: November 16, 2024
3
+
4
+ Q: What digital investment would have the fastest and largest impact
5
+ on HarFeast's profitability?
6
+
7
+ A: "I've been looking at this from an operations standpoint. While
8
+ predictive maintenance is valuable long-term, the immediate winner
9
+ is IoT Sensors for yield optimization. The ability to monitor yield
10
+ in real-time across all product lines gives us immediate visibility
11
+ into where we're losing margin. Other levers like automated scheduling
12
+ help with throughput but don't directly attack gross margin the way
13
+ yield sensing does. IoT Sensors for yield is my top recommendation."
harfeast_world/documents/interview_mike_russo.txt ADDED
@@ -0,0 +1,13 @@
1
+ Expert Interview Transcript - Mike Russo, Head of Digital Transformation
2
+ Date: November 17, 2024
3
+
4
+ Q: Which digital lever should HarFeast prioritize for the fastest
5
+ margin improvement?
6
+
7
+ A: "After analyzing all the options, I keep coming back to
8
+ IoT Sensors for yield. The ROI timeline is shortest — typically
9
+ 4-8 months to see measurable improvement. Predictive maintenance
10
+ is a close second but has a longer implementation cycle. Cloud ERP
11
+ is foundational but doesn't directly move gross margin in the near
12
+ term. IoT Sensors for yield monitoring is the clear priority if
13
+ we want the fastest and biggest boost to Gross Margin."
harfeast_world/documents/interview_sarah_jenkins.txt ADDED
@@ -0,0 +1,13 @@
1
+ Expert Interview Transcript - Sarah Jenkins, VP Operations
2
+ Date: November 15, 2024
3
+
4
+ Q: Of the digital levers evaluated, which would deliver the fastest and
5
+ biggest boost to HarFeast's Gross Margin?
6
+
7
+ A: "We've evaluated several options including predictive maintenance,
8
+ automated scheduling, and IoT-based monitoring. In my assessment,
9
+ IoT Sensors for yield monitoring would deliver the fastest and most
10
+ significant boost to our Gross Margin. The real-time data on production
11
+ yield lets us catch quality issues at the source before they cascade
12
+ into scrap. I've seen it work at comparable food manufacturers with
13
+ measurable margin improvement within 6 months of deployment."
harfeast_world/documents/scrap_rate_report.txt ADDED
@@ -0,0 +1,12 @@
1
+ HarFeast Food Group - Quality Standards: Scrap Rate Report
2
+ ==========================================================
3
+
4
+ Acceptable scrap rate range: 3.5% - 7.0%
5
+ Target scrap rate (minimum of acceptable range): 3.5%
6
+
7
+ Plants operating above 7.0% require immediate corrective action and
8
+ must submit a remediation plan within 30 days. Quarterly reviews will
9
+ assess progress toward the target rate.
10
+
11
+ The target scrap rate represents the minimum of the acceptable range
12
+ and should be used as the baseline for all cost-of-quality calculations.
harfeast_world/ground_truth.json ADDED
@@ -0,0 +1,193 @@
1
+ {
2
+ "task1": {
3
+ "high_priority_count": 52,
4
+ "high_priority_pct": 2.1,
5
+ "hp_inefficient_hours": 1068.0,
6
+ "hp_inefficient_pct": 2.2,
7
+ "hp_frontline": 19,
8
+ "hp_backoffice": 17,
9
+ "hp_supervisor": 8,
10
+ "hp_management": 8
11
+ },
12
+ "task2": {
13
+ "Rockford, Illinois": 28559265184,
14
+ "Madison, Wisconsin": 25208027893,
15
+ "Davenport, Iowa": 23644183501,
16
+ "Columbus, Ohio": 25448856089,
17
+ "Lansing, Michigan": 21243014839
18
+ },
19
+ "task3": {
20
+ "Canned Vegetables": {
21
+ "new_scrap_rate_pct": 6.1,
22
+ "units_avoided": 68948
23
+ },
24
+ "Condiments": {
25
+ "new_scrap_rate_pct": 6.1,
26
+ "units_avoided": 84913
27
+ },
28
+ "Sauces": {
29
+ "new_scrap_rate_pct": 6.0,
30
+ "units_avoided": 69887
31
+ }
32
+ },
33
+ "task4": {
34
+ "digital_lever": "IoT Sensors for yield",
35
+ "Rockford, Illinois": {
36
+ "first_year_exceeds": 2027,
37
+ "oee_at_that_year": 0.8591
38
+ },
39
+ "Madison, Wisconsin": {
40
+ "first_year_exceeds": 2029,
41
+ "oee_at_that_year": 0.8661
42
+ },
43
+ "Davenport, Iowa": {
44
+ "first_year_exceeds": 2027,
45
+ "oee_at_that_year": 0.8768
46
+ },
47
+ "Columbus, Ohio": {
48
+ "first_year_exceeds": 2032,
49
+ "oee_at_that_year": 0.8641
50
+ },
51
+ "Lansing, Michigan": {
52
+ "first_year_exceeds": 2031,
53
+ "oee_at_that_year": 0.8528
54
+ }
55
+ },
56
+ "task5": {
57
+ "Rockford, Illinois": {
58
+ "total_labor_cost": 692058,
59
+ "efficiency_gains": 97277,
60
+ "union_demand_increase": 16631
61
+ },
62
+ "Madison, Wisconsin": {
63
+ "total_labor_cost": 1032866,
64
+ "efficiency_gains": 156478,
65
+ "union_demand_increase": 19974
66
+ },
67
+ "Davenport, Iowa": {
68
+ "total_labor_cost": 919381,
69
+ "efficiency_gains": 78062,
70
+ "union_demand_increase": 16771
71
+ }
72
+ },
73
+ "task6": {
74
+ "avg_by_plant": {
75
+ "Rockford, Illinois": 8.7,
76
+ "Madison, Wisconsin": 8.7,
77
+ "Davenport, Iowa": 8.7,
78
+ "Columbus, Ohio": 37.2,
79
+ "Lansing, Michigan": 37.4
80
+ },
81
+ "most_efficient": [
82
+ "Rockford, Illinois",
83
+ "Madison, Wisconsin",
84
+ "Davenport, Iowa"
85
+ ],
86
+ "least_efficient": "Lansing, Michigan",
87
+ "pct_difference": 330
88
+ },
89
+ "task7": {
90
+ "avg_loss_by_role": {
91
+ "Production/Manufacturing Operator": 18964,
92
+ "Quality Control/Quality Assurance": 21948,
93
+ "Maintenance Technician": 25004,
94
+ "Production Supervisor/Team Lead": 28288,
95
+ "Supply Chain/Logistics Coordinator": 22701,
96
+ "Demand Planning/Forecasting": 28129,
97
+ "Administrative/Support Staff": 21637,
98
+ "Plant Management": 41416
99
+ },
100
+ "total_annual_loss": 62914681
101
+ },
102
+ "task8": {
103
+ "hp_quality_losses": 1526123,
104
+ "hp_pct_of_cv_losses": 38
105
+ },
106
+ "task9": {
107
+ "Rockford, Illinois": {
108
+ "variance_hours": 18506,
109
+ "variance_dollars": 350503.64,
110
+ "productivity_index": 0.89
111
+ },
112
+ "Madison, Wisconsin": {
113
+ "variance_hours": 13695,
114
+ "variance_dollars": 259383.3,
115
+ "productivity_index": 0.92
116
+ }
117
+ },
118
+ "task10": {
119
+ "avg_hourly_wage": 29.3,
120
+ "annual_productivity_loss": 73552000
121
+ },
122
+ "task11": {
123
+ "top5_technologies": [
124
+ "IoT Sensors",
125
+ "Predictive Maintenance",
126
+ "Robotic Automation",
127
+ "Advanced Analytics",
128
+ "AI Quality Control"
129
+ ],
130
+ "plant_results": {
131
+ "Rockford, Illinois": {
132
+ "new_unit_sales": 25575169,
133
+ "new_projected_sales": 75702500
134
+ },
135
+ "Madison, Wisconsin": {
136
+ "new_unit_sales": 27711624,
137
+ "new_projected_sales": 89508546
138
+ },
139
+ "Davenport, Iowa": {
140
+ "new_unit_sales": 6074125,
141
+ "new_projected_sales": 35715855
142
+ },
143
+ "Columbus, Ohio": {
144
+ "new_unit_sales": 6207493,
145
+ "new_projected_sales": 42459252
146
+ },
147
+ "Lansing, Michigan": {
148
+ "new_unit_sales": 8194640,
149
+ "new_projected_sales": 49003947
150
+ }
151
+ }
152
+ },
153
+ "task12": {
154
+ "lowest_willingness_plant": "Rockford, Illinois",
155
+ "highest_willingness_plant": "Madison, Wisconsin",
156
+ "lowest_role_in_lowest_plant": [
157
+ "Maintenance Technician",
158
+ 2.47
159
+ ],
160
+ "highest_role_in_highest_plant": [
161
+ "Quality Control/Quality Assurance",
162
+ 3.95
163
+ ],
164
+ "training_details": {
165
+ "lowest_plant_lowest_role": {
166
+ "preferred_length": ">2 days",
167
+ "count_1_2_days": 19,
168
+ "total_cost": 9920
169
+ },
170
+ "highest_plant_highest_role": {
171
+ "preferred_length": "<1 day",
172
+ "count_1_2_days": 21,
173
+ "total_cost": 2016
174
+ }
175
+ }
176
+ },
177
+ "task13": {
178
+ "Rockford, Illinois": 5,
179
+ "Madison, Wisconsin": 6,
180
+ "Davenport, Iowa": 5,
181
+ "Columbus, Ohio": 5,
182
+ "Lansing, Michigan": 5
183
+ },
184
+ "task14": {
185
+ "trained_count": 1099,
186
+ "quality_pcts": {
187
+ "Excellent- comprehensive and very helpful": 14,
188
+ "Good- adequate for most needs": 40,
189
+ "Fair- some gaps or inconsistencies": 36,
190
+ "Poor - insufficient or unhelpful": 9
191
+ }
192
+ }
193
+ }
harfeast_world/tasks.json ADDED
@@ -0,0 +1,383 @@
1
+ [
2
+ {
3
+ "task_id": "task_01",
4
+ "task_name": "High-Priority Digital Training Employees",
5
+ "prompt": "I'm trying to get a sense of which HarFeast employees are most ready for the digital training rollout. Can you pull the workforce survey data and identify all employees who are above their role type's median readiness score, willing to pilot new tools, willing to spend >2 days in training with dedicated training time, and above the overall median digital comfort score?\n\nOnce you've identified that group, tell me:\n1. How many \"high-priority\" employees are there, and what % of total employees do they represent?\n2. How many total hours does this group spend weekly on manual entry, searching data, or fixing errors? What % of the company-wide total is that?\n3. Break down the high-priority count by role type.\n\nReport your answer here.",
6
+ "ground_truth": {
7
+ "high_priority_count": 52,
8
+ "high_priority_pct": 2.1,
9
+ "hp_inefficient_hours": 1068.0,
10
+ "hp_inefficient_pct": 2.2,
11
+ "hp_frontline": 19,
12
+ "hp_backoffice": 17,
13
+ "hp_supervisor": 8,
14
+ "hp_management": 8
15
+ },
16
+ "rubric": [
17
+ "States that the number of high-priority employees is 52",
18
+ "States that the percentage of all employees the high-priority employees represent is 2.1%",
19
+ "States that the total hours high-priority employees spend on manual entry, searching data or fixing errors is 1068",
20
+ "States that the percentage of all such hours from high-priority employees is 2.2%",
21
+ "States that the number of high-priority employees in the Front-line role type is 19",
22
+ "States that the number of high-priority employees in the Back-office/Support role type is 17",
23
+ "States that the number of high-priority employees in the Supervisor/Team Lead role type is 8",
24
+ "States that the number of high-priority employees in the Management role type is 8"
25
+ ]
26
+ },
27
+ {
28
+ "task_id": "task_02",
29
+ "task_name": "Adjusted Cost of Instability",
30
+ "prompt": "Calculate the Adjusted Cost of Instability for each site, defined as Abnormal Scrap Cost / (Actual Scrap % - Normal Scrap %). HarFeast's target scrap rate is the minimum of the acceptable scrap rate range in the scrap rate report. For now, use COGS per ton as the scrap cost.\n\nReport your final answers to me in a message. Round values to the nearest dollar.",
31
+ "ground_truth": {
32
+ "Rockford, Illinois": 28559265184,
33
+ "Madison, Wisconsin": 25208027893,
34
+ "Davenport, Iowa": 23644183501,
35
+ "Columbus, Ohio": 25448856089,
36
+ "Lansing, Michigan": 21243014839
37
+ },
38
+ "rubric": [
39
+ "States that the adjusted cost of instability for Rockford, Illinois is $28,559,265,184",
40
+ "States that the adjusted cost of instability for Madison, Wisconsin is $25,208,027,893",
41
+ "States that the adjusted cost of instability for Davenport, Iowa is $23,644,183,501",
42
+ "States that the adjusted cost of instability for Columbus, Ohio is $25,448,856,089",
43
+ "States that the adjusted cost of instability for Lansing, Michigan is $21,243,014,839"
44
+ ]
45
+ },
46
+ {
47
+ "task_id": "task_03",
48
+ "task_name": "Predictive Maintenance Scrap Impact",
49
+ "prompt": "Using HarFeast's equipment data, assess the impact of predictive maintenance on HarFeast's scrap rate. We will pilot predictive maintenance only on equipment a) whose scheduled hours per year are at or above that equipment type's median scheduled hours and b) whose labor hours are at or above its plant's median labor hours. For all equipment qualifying for the pilot, apply a 15% reduction to their scrap rate.\n\nCalculate:\n1. The new overall scrap rate for each product family (as a %)\n2. The total number of scrap units each product family avoids every year\n\nReport rounded to 1 decimal place for rates and nearest whole number for units.",
50
+ "ground_truth": {
51
+ "Canned Vegetables": {
52
+ "new_scrap_rate_pct": 6.1,
53
+ "units_avoided": 68948
54
+ },
55
+ "Condiments": {
56
+ "new_scrap_rate_pct": 6.1,
57
+ "units_avoided": 84913
58
+ },
59
+ "Sauces": {
60
+ "new_scrap_rate_pct": 6.0,
61
+ "units_avoided": 69887
62
+ }
63
+ },
64
+ "rubric": [
65
+ "States that the new overall scrap rate for Canned Vegetables is 6.1%",
66
+ "States that the scrap units Canned Vegetables avoids per year is 68948",
67
+ "States that the new overall scrap rate for Condiments is 6.1%",
68
+ "States that the scrap units Condiments avoids per year is 84913",
69
+ "States that the new overall scrap rate for Sauces is 6.0%",
70
+ "States that the scrap units Sauces avoids per year is 69887"
71
+ ]
72
+ },
73
+ {
74
+ "task_id": "task_04",
75
+ "task_name": "Digital Lever Agreement and OEE Projections",
76
+ "prompt": "1. What is the digital lever that Sarah Jenkins, David Chen, and Mike Russo agree will deliver the fastest and biggest boost to HarFeast's Gross Margin?\n\n2. Assuming HarFeast adopts the chosen digital lever, determine the OEE level in the first full year in each plant location where the annual OEE value exceeds the world-class target. Use the OEE improvement assumptions file for growth rates and start dates.\n\nReport OEE values to 2 decimal places as percentages.",
77
+ "ground_truth": {
78
+ "digital_lever": "IoT Sensors for yield",
79
+ "Rockford, Illinois": {
80
+ "first_year_exceeds": 2027,
81
+ "oee_at_that_year": 0.8591
82
+ },
83
+ "Madison, Wisconsin": {
84
+ "first_year_exceeds": 2029,
85
+ "oee_at_that_year": 0.8661
86
+ },
87
+ "Davenport, Iowa": {
88
+ "first_year_exceeds": 2027,
89
+ "oee_at_that_year": 0.8768
90
+ },
91
+ "Columbus, Ohio": {
92
+ "first_year_exceeds": 2032,
93
+ "oee_at_that_year": 0.8641
94
+ },
95
+ "Lansing, Michigan": {
96
+ "first_year_exceeds": 2031,
97
+ "oee_at_that_year": 0.8528
98
+ }
99
+ },
100
+ "rubric": [
101
+ "States that the digital lever is IoT Sensors for yield",
102
+ "States that the OEE level for Rockford, Illinois in the first year exceeding world-class target is 85.91%",
103
+ "States that the first year Rockford, Illinois exceeds world-class target is 2027",
104
+ "States that the OEE level for Madison, Wisconsin in the first year exceeding world-class target is 86.61%",
105
+ "States that the first year Madison, Wisconsin exceeds world-class target is 2029",
106
+ "States that the OEE level for Davenport, Iowa in the first year exceeding world-class target is 87.68%",
107
+ "States that the first year Davenport, Iowa exceeds world-class target is 2027",
108
+ "States that the OEE level for Columbus, Ohio in the first year exceeding world-class target is 86.41%",
109
+ "States that the first year Columbus, Ohio exceeds world-class target is 2032",
110
+ "States that the OEE level for Lansing, Michigan in the first year exceeding world-class target is 85.28%",
111
+ "States that the first year Lansing, Michigan exceeds world-class target is 2031"
112
+ ]
113
+ },
114
+ {
115
+ "task_id": "task_05",
116
+ "task_name": "Labor Cost Analysis",
117
+ "prompt": "1. Give me the total labor cost for each plant location (Rockford, Illinois, Madison, Wisconsin, Davenport, Iowa only).\n\n2. Give me the efficiency gains for each plant location. West North Central division plant locations only have a 10% annual efficiency gain from labor cost. For other locations, the efficiency gain is 20%. However, the efficiency gain is 5% for non-unionized production supervisors no matter where they are located.\n\n3. Give me the forecasted labor cost increase from union demands, assuming a 5% increase for all union workers.\n\nRound to the nearest dollar.",
118
+ "ground_truth": {
119
+ "Rockford, Illinois": {
120
+ "total_labor_cost": 692058,
121
+ "efficiency_gains": 97277,
122
+ "union_demand_increase": 16631
123
+ },
124
+ "Madison, Wisconsin": {
125
+ "total_labor_cost": 1032866,
126
+ "efficiency_gains": 156478,
127
+ "union_demand_increase": 19974
128
+ },
129
+ "Davenport, Iowa": {
130
+ "total_labor_cost": 919381,
131
+ "efficiency_gains": 78062,
132
+ "union_demand_increase": 16771
133
+ }
134
+ },
135
+ "rubric": [
136
+ "States that the Total Annual Labor Cost for Rockford, Illinois is $692,058",
137
+ "States that the Efficiency Gains for Rockford, Illinois is $97,277",
138
+ "States that the Union Demand Increase for Rockford, Illinois is $16,631",
139
+ "States that the Total Annual Labor Cost for Madison, Wisconsin is $1,032,866",
140
+ "States that the Efficiency Gains for Madison, Wisconsin is $156,478",
141
+ "States that the Union Demand Increase for Madison, Wisconsin is $19,974",
142
+ "States that the Total Annual Labor Cost for Davenport, Iowa is $919,381",
143
+ "States that the Efficiency Gains for Davenport, Iowa is $78,062",
144
+ "States that the Union Demand Increase for Davenport, Iowa is $16,771"
145
+ ]
146
+ },
147
+ {
148
+ "task_id": "task_06",
149
+ "task_name": "Operational Efficiency Analysis",
150
+ "prompt": "Analyze the operational efficiency at HarFeast and assess how many inefficient employee hours each plant is recording on average. Which plants have the most efficient operations and the least efficient operations? How much more efficient are the highest efficiency locations vs the lowest efficiency locations?\n\nAssume the following activities are considered inefficient: (a) manual data entry, (b) searching for data, (c) fixing errors. Use the workforce survey data. Report averages to 1 decimal place.",
151
+ "ground_truth": {
152
+ "avg_by_plant": {
153
+ "Rockford, Illinois": 8.7,
154
+ "Madison, Wisconsin": 8.7,
155
+ "Davenport, Iowa": 8.7,
156
+ "Columbus, Ohio": 37.2,
157
+ "Lansing, Michigan": 37.4
158
+ },
159
+ "most_efficient": [
160
+ "Rockford, Illinois",
161
+ "Madison, Wisconsin",
162
+ "Davenport, Iowa"
163
+ ],
164
+ "least_efficient": "Lansing, Michigan",
165
+ "pct_difference": 330
166
+ },
167
+ "rubric": [
168
+ "States the average inefficient time in Rockford, Illinois is 8.7",
169
+ "States the average inefficient time in Madison, Wisconsin is 8.7",
170
+ "States the average inefficient time in Davenport, Iowa is 8.7",
171
+ "States the average inefficient time in Columbus, Ohio is 37.2",
172
+ "States the average inefficient time in Lansing, Michigan is 37.4",
173
+ "States that Rockford, Illinois is a plant with the lowest average inefficient time",
174
+ "States that Madison, Wisconsin is a plant with the lowest average inefficient time",
175
+ "States that Davenport, Iowa is a plant with the lowest average inefficient time",
176
+ "States that Lansing, Michigan is the plant with the highest average inefficient time",
177
+ "States that the difference between highest and lowest average inefficient time is 330%"
178
+ ]
179
+ },
180
+ {
181
+ "task_id": "task_07",
182
+ "task_name": "Productivity Loss Quantification",
183
+ "prompt": "I want to quantify the average annual productivity loss at a cost level for each employee in each primary role based on the sum of average hours spent doing manual entry, searching data, and fixing errors. Then, I want to calculate the total productivity loss cost HarFeast faces every year, company-wide.\n\nNote that the survey responses represent one week of work. Report your final answer as a message. Round to the nearest dollar.",
184
+ "ground_truth": {
185
+ "avg_loss_by_role": {
186
+ "Production/Manufacturing Operator": 18964,
187
+ "Quality Control/Quality Assurance": 21948,
188
+ "Maintenance Technician": 25004,
189
+ "Production Supervisor/Team Lead": 28288,
190
+ "Supply Chain/Logistics Coordinator": 22701,
191
+ "Demand Planning/Forecasting": 28129,
192
+ "Administrative/Support Staff": 21637,
193
+ "Plant Management": 41416
194
+ },
195
+ "total_annual_loss": 62914681
196
+ },
197
+ "rubric": [
198
+ "States the average annual productivity loss cost of a Production/Manufacturing Operator employee is $18,964",
199
+ "States the average annual productivity loss cost of a Quality Control/Quality Assurance employee is $21,948",
200
+ "States the average annual productivity loss cost of a Maintenance Technician employee is $25,004",
201
+ "States the average annual productivity loss cost of a Production Supervisor/Team Lead employee is $28,288",
202
+ "States the average annual productivity loss cost of a Supply Chain/Logistics Coordinator employee is $22,701",
203
+ "States the average annual productivity loss cost of a Demand Planning/Forecasting employee is $28,129",
204
+ "States the average annual productivity loss cost of an Administrative/Support Staff employee is $21,637",
205
+ "States the average annual productivity loss cost of a Plant Management employee is $41,416",
206
+ "States the total annual productivity loss cost is $62,914,681"
207
+ ]
208
+ },
209
+ {
210
+ "task_id": "task_08",
211
+ "task_name": "High-Priority Equipment Quality Losses",
212
+ "prompt": "Using HarFeast's equipment data and quality losses dataset, consider all canned vegetables assets with a scrap rate > 5% and with unplanned downtime hours above the plant median for canned vegetables as \"high-priority\".\n\n1. For the \"high-priority\" group, calculate the total annual quality-related losses (scrap cost + unplanned failure cost).\n2. What percentage of all canned-vegetable quality losses comes from these high-priority assets?\n\nReport losses rounded to the nearest dollar and percentage to the nearest whole number.",
213
+ "ground_truth": {
214
+ "hp_quality_losses": 1526123,
215
+ "hp_pct_of_cv_losses": 38
216
+ },
217
+ "rubric": [
218
+ "States that the total annual quality-related losses for the high-priority group is $1,526,123",
219
+ "States that the percentage of all canned-vegetable quality losses from high-priority assets is 38%"
220
+ ]
221
+ },
222
+ {
223
+ "task_id": "task_09",
224
+ "task_name": "Labor Variance Analysis",
225
+ "prompt": "Calculate the total labor variance in hours (favorable should be positive) and dollars for the Illinois and Wisconsin plants (Rockford, Illinois and Madison, Wisconsin). A positive variance means Total Actual Hours are less than Total Standard Hours. Use the median wage for All Occupations in the food manufacturing industry from the BLS wage benchmark file to convert from hours to dollars.\n\nAlso give me the straight productivity index (Actual Hours / Standard Hours) for each plant.\n\nRound hours to 2 decimal places, dollars to 2 decimal places, and the index to 2 decimal places.",
226
+ "ground_truth": {
227
+ "Rockford, Illinois": {
228
+ "variance_hours": 18506,
229
+ "variance_dollars": 350503.64,
230
+ "productivity_index": 0.89
231
+ },
232
+ "Madison, Wisconsin": {
233
+ "variance_hours": 13695,
234
+ "variance_dollars": 259383.3,
235
+ "productivity_index": 0.92
236
+ }
237
+ },
238
+ "rubric": [
239
+ "States that the Labor Efficiency Variance (Hours) for Rockford, Illinois is 18506 hours",
240
+ "States that the Labor Cost Variance for Rockford, Illinois is $350503.64",
241
+ "States that the Productivity Index for Rockford, Illinois is 0.89",
242
+ "States that the Labor Efficiency Variance (Hours) for Madison, Wisconsin is 13695 hours",
243
+ "States that the Labor Cost Variance for Madison, Wisconsin is $259383.30",
244
+ "States that the Productivity Index for Madison, Wisconsin is 0.92"
245
+ ]
246
+ },
247
+ {
248
+ "task_id": "task_10",
249
+ "task_name": "Updated Productivity Loss with New Wages",
250
+ "prompt": "The client sent us employee wage data (attached), so we need to update our assumptions. Find the average hourly salary across all employee roles in the attached wage file and use that to calculate the updated annual productivity loss for the entire company.\n\nNote that survey responses represent one week of work. Report the annual productivity loss in thousands (000s) rounded to the nearest thousand. Also state the average hourly wage used.\n\nReport your answer here.",
251
+ "ground_truth": {
252
+ "avg_hourly_wage": 29.3,
253
+ "annual_productivity_loss": 73552000
254
+ },
255
+ "rubric": [
256
+ "States the updated annual productivity loss is $73,552,000",
257
+ "States the average fully-loaded hourly wage is $29.3"
258
+ ]
259
+ },
260
+ {
261
+ "task_id": "task_11",
262
+ "task_name": "Technology Investment Impact",
263
+ "prompt": "Identify the top five technology investments from the Aptean report with the largest positive difference in percentage revenue growth between users and non-users. Include only investments that the report explicitly identifies as either top technology investments to date or top investments planned for 2024.\n\nNext, assume that HarFeast will deploy all five of these top initiatives at every plant location. Apply the cumulative growth impact to each plant's current unit sales and calculate the revised projected sales revenue.\n\nRound unit sales to the nearest whole number and revenue to the nearest dollar.",
264
+ "ground_truth": {
265
+ "top5_technologies": [
266
+ "IoT Sensors",
267
+ "Predictive Maintenance",
268
+ "Robotic Automation",
269
+ "Advanced Analytics",
270
+ "AI Quality Control"
271
+ ],
272
+ "plant_results": {
273
+ "Rockford, Illinois": {
274
+ "new_unit_sales": 25575169,
275
+ "new_projected_sales": 75702500
276
+ },
277
+ "Madison, Wisconsin": {
278
+ "new_unit_sales": 27711624,
279
+ "new_projected_sales": 89508546
280
+ },
281
+ "Davenport, Iowa": {
282
+ "new_unit_sales": 6074125,
283
+ "new_projected_sales": 35715855
284
+ },
285
+ "Columbus, Ohio": {
286
+ "new_unit_sales": 6207493,
287
+ "new_projected_sales": 42459252
288
+ },
289
+ "Lansing, Michigan": {
290
+ "new_unit_sales": 8194640,
291
+ "new_projected_sales": 49003947
292
+ }
293
+ }
294
+ },
295
+ "rubric": [
296
+ "States that the unit sales for Rockford, Illinois after deploying initiatives is 25,575,169",
297
+ "States that the Revised Projected Sales for Rockford, Illinois is $75,702,500",
298
+ "States that the unit sales for Madison, Wisconsin after deploying initiatives is 27,711,624",
299
+ "States that the Revised Projected Sales for Madison, Wisconsin is $89,508,546",
300
+ "States that the unit sales for Davenport, Iowa after deploying initiatives is 6,074,125",
301
+ "States that the Revised Projected Sales for Davenport, Iowa is $35,715,855",
302
+ "States that the unit sales for Columbus, Ohio after deploying initiatives is 6,207,493",
303
+ "States that the Revised Projected Sales for Columbus, Ohio is $42,459,252",
304
+ "States that the unit sales for Lansing, Michigan after deploying initiatives is 8,194,640",
305
+ "States that the Revised Projected Sales for Lansing, Michigan is $49,003,947"
306
+ ]
307
+ },
308
+ {
309
+ "task_id": "task_12",
310
+ "task_name": "Digital Adoption Willingness Analysis",
311
+ "prompt": "To implement the required roadmap, we need to identify what roles and plants are most and least willing to go through a digital transformation.\n\nDetermine the plant with the highest and lowest average willingness to adopt digital tools. Within those plants, identify the roles with the highest and lowest willingness. For those specific role-plant combinations, determine the preferred training length, the count of employees preferring 1-2 days of training, and the total training cost (at $8/hour training rate).\n\nReport your findings here.",
312
+ "ground_truth": {
313
+ "lowest_willingness_plant": "Rockford, Illinois",
314
+ "highest_willingness_plant": "Madison, Wisconsin",
315
+ "lowest_role_in_lowest_plant": [
316
+ "Maintenance Technician",
317
+ 2.47
318
+ ],
319
+ "highest_role_in_highest_plant": [
320
+ "Quality Control/Quality Assurance",
321
+ 3.95
322
+ ],
323
+ "training_details": {
324
+ "lowest_plant_lowest_role": {
325
+ "preferred_length": ">2 days",
326
+ "count_1_2_days": 19,
327
+ "total_cost": 9920
328
+ },
329
+ "highest_plant_highest_role": {
330
+ "preferred_length": "<1 day",
331
+ "count_1_2_days": 21,
332
+ "total_cost": 2016
333
+ }
334
+ }
335
+ },
336
+ "rubric": [
337
+ "States that the plant with lowest willingness to adopt is Rockford, Illinois",
338
+ "States that the plant with highest willingness to adopt is Madison, Wisconsin",
339
+ "States the role with lowest willingness in Rockford, Illinois is Maintenance Technician",
340
+ "States the role with highest willingness in Madison, Wisconsin is Quality Control/Quality Assurance"
341
+ ]
342
+ },
343
+ {
344
+ "task_id": "task_13",
345
+ "task_name": "Frito-Lay Downtime Reduction Application",
346
+ "prompt": "Can you look at the Frito-Lay case study and apply their downtime reduction to HarFeast's numbers in the equipment data? I want to estimate what the improvement would look like for us (rounded to the nearest full percentage point).\n\nCalculate the current unplanned downtime ratio (unplanned downtime hours / scheduled hours) for each plant, apply the reduction from the case study, and report the new ratios.\n\nOutput the information in a message here.",
347
+ "ground_truth": {
348
+ "Rockford, Illinois": 5,
349
+ "Madison, Wisconsin": 6,
350
+ "Davenport, Iowa": 5,
351
+ "Columbus, Ohio": 5,
352
+ "Lansing, Michigan": 5
353
+ },
354
+ "rubric": [
355
+ "States that the new unplanned downtime ratio for Rockford, Illinois is 5%",
356
+ "States that the new unplanned downtime ratio for Madison, Wisconsin is 6%",
357
+ "States that the new unplanned downtime ratio for Davenport, Iowa is 5%",
358
+ "States that the new unplanned downtime ratio for Columbus, Ohio is 5%",
359
+ "States that the new unplanned downtime ratio for Lansing, Michigan is 5%"
360
+ ]
361
+ },
362
+ {
363
+ "task_id": "task_14",
364
+ "task_name": "Training Quality Assessment",
365
+ "prompt": "Use the workforce survey responses to identify the number of respondents who received any kind of training on digital tools. Of those respondents, return the percentage of respondents for each training quality rating.\n\nReply back here to me.",
366
+ "ground_truth": {
367
+ "trained_count": 1099,
368
+ "quality_pcts": {
369
+ "Excellent- comprehensive and very helpful": 14,
370
+ "Good- adequate for most needs": 40,
371
+ "Fair- some gaps or inconsistencies": 36,
372
+ "Poor - insufficient or unhelpful": 9
373
+ }
374
+ },
375
+ "rubric": [
376
+ "States that the number of respondents who received training is 1099",
377
+ "States that the percentage of respondents who rated training as \"Excellent- comprehensive and very helpful\" is 14%",
378
+ "States that the percentage of respondents who rated training as \"Good- adequate for most needs\" is 40%",
379
+ "States that the percentage of respondents who rated training as \"Fair- some gaps or inconsistencies\" is 36%",
380
+ "States that the percentage of respondents who rated training as \"Poor - insufficient or unhelpful\" is 9%"
381
+ ]
382
+ }
383
+ ]