---
title: Productivity Copilot Env
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
# Productivity Copilot: OpenEnv Environment

An AI agent simulation environment in which an LLM agent acts as a **productivity coach** for a virtual human worker. The agent observes behaviour signals and takes corrective actions to prevent task failure. The worker's focus and failure signals are computed by trained machine learning models.

---

## Environment Description & Motivation

Modern knowledge workers face productivity challenges driven by distraction, stress, and poor time management. Instead of a toy environment, this simulation models **real-world task management** where an AI agent must intervene intelligently.

The agent is given a virtual human with observable state signals (stress level, distraction score, focus score, deadline pressure) and must apply targeted interventions. The simulation is grounded in real ML models trained on productivity behaviour data.

---

## Observation Space

Each observation is a `ProductivityObservation` Pydantic model:

| Field | Type | Description |
|---|---|---|
| `current_task` | str | The task the virtual human is working on |
| `deadline_days_remaining` | float | Days left until the task deadline |
| `stress_level` | float (0–10) | Current stress level |
| `motivation_level` | float (0–10) | Current motivation level |
| `distraction_events` | int | Count of distraction interruptions |
| `focus_score` | float (0–1) | Computed by the distraction scorer ML model |
| `failure_probability` | float (0–1) | Computed by the failure predictor ML model |
| `session_duration_minutes` | int | Minutes since last reset/break |
| `break_count` | int | Number of breaks taken |
| `social_media_minutes` | int | Minutes of social media use |
| `time_of_day_hour` | float | Current simulated hour of the day |
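
The fields above can be sketched as a typed record. This is a minimal, dependency-free illustration that uses a standard-library dataclass as a stand-in for the Pydantic model; the class name `ObservationSketch`, the range checks, and all example values are assumptions, not the contents of `models.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationSketch:
    """Hypothetical stand-in for ProductivityObservation (the real class is a Pydantic model)."""

    current_task: str
    deadline_days_remaining: float
    stress_level: float          # documented range: 0 to 10
    motivation_level: float      # documented range: 0 to 10
    distraction_events: int
    focus_score: float           # documented range: 0 to 1
    failure_probability: float   # documented range: 0 to 1
    session_duration_minutes: int
    break_count: int
    social_media_minutes: int
    time_of_day_hour: float

    def __post_init__(self) -> None:
        # The bounds mirror the table; whether the real model enforces them is an assumption.
        bounds = {
            "stress_level": (0.0, 10.0),
            "motivation_level": (0.0, 10.0),
            "focus_score": (0.0, 1.0),
            "failure_probability": (0.0, 1.0),
        }
        for name, (lo, hi) in bounds.items():
            value = getattr(self, name)
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")


# Illustrative values only; they are not taken from the environment.
example = ObservationSketch(
    current_task="Write quarterly report",
    deadline_days_remaining=1.0,
    stress_level=8.5,
    motivation_level=3.0,
    distraction_events=10,
    focus_score=0.7,
    failure_probability=0.8,
    session_duration_minutes=90,
    break_count=0,
    social_media_minutes=25,
    time_of_day_hour=14.5,
)
```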

---

## Action Space

Each action is a `ProductivityAction` Pydantic model:

| `action_type` | Effect on Environment |
|---|---|
| `WAIT` | Time passes; stressed workers get worse |
| `SEND_NUDGE` | +2 motivation, -0.5 stress, -1 distraction |
| `FORCE_BREAK` | +1 break, session resets, -2 stress, +5 social media |
| `BLOCK_SOCIAL_MEDIA` | Social media set to 0, -3 distractions, +1 stress |
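
The effects in the table can be read as a state transition. The sketch below applies the documented deltas to a small hypothetical `WorkerState` record; the clamping to the 0 to 10 range, the floor of zero on counts, and the reading of "+5 social media" as minutes are assumptions. The table gives no magnitudes for `WAIT`, so the sketch leaves the state unchanged there rather than guess.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorkerState:
    """Hypothetical subset of the environment state touched by actions."""

    stress_level: float
    motivation_level: float
    distraction_events: int
    break_count: int
    session_duration_minutes: int
    social_media_minutes: int


def clamp(value: float, lo: float = 0.0, hi: float = 10.0) -> float:
    """Keep a value inside [lo, hi] (assumed behaviour, not confirmed by the source)."""
    return max(lo, min(hi, value))


def apply_action(state: WorkerState, action_type: str) -> WorkerState:
    """Return a new state with the documented effect of one action applied."""
    if action_type == "SEND_NUDGE":
        return replace(
            state,
            motivation_level=clamp(state.motivation_level + 2),
            stress_level=clamp(state.stress_level - 0.5),
            distraction_events=max(0, state.distraction_events - 1),
        )
    if action_type == "FORCE_BREAK":
        return replace(
            state,
            break_count=state.break_count + 1,
            session_duration_minutes=0,
            stress_level=clamp(state.stress_level - 2),
            social_media_minutes=state.social_media_minutes + 5,
        )
    if action_type == "BLOCK_SOCIAL_MEDIA":
        return replace(
            state,
            social_media_minutes=0,
            distraction_events=max(0, state.distraction_events - 3),
            stress_level=clamp(state.stress_level + 1),
        )
    if action_type == "WAIT":
        # Magnitudes for WAIT are not documented, so nothing is changed here.
        return state
    raise ValueError(f"unknown action_type: {action_type!r}")


start = WorkerState(9.0, 3.0, 10, 0, 120, 30)
after_nudge = apply_action(start, "SEND_NUDGE")
after_break = apply_action(start, "FORCE_BREAK")
```

The trade-offs are visible in the deltas: `FORCE_BREAK` gives the largest stress reduction but adds social media time, while `BLOCK_SOCIAL_MEDIA` removes distractions at the cost of extra stress.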

---

## Task Descriptions

### Task 1: Triage (Easy)
A high-stress worker with a looming 1-day deadline. They have accumulated 10 distraction events and low motivation. The agent must identify the right intervention to lower failure probability in a single episode.
- **Objective:** Finish with `failure_probability < 0.5`

### Task 2: Schedule Optimisation (Medium)
A "turtle" work-style employee (slow and steady) with only 0.5 days left on a complex task. The challenge is preventing failure without pushing stress above 8.
- **Objective:** Lower failure probability while keeping `stress_level < 8`

### Task 3: Distraction Mitigation (Hard)
A "hare" worker who binge-works but is caught in extreme distraction (20 events). The agent must maintain `focus_score < 0.5` over the full 10-step episode despite the environment constantly generating more distractions.
- **Objective:** Keep average `focus_score < 0.5` across all steps
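
The three objectives can be expressed as simple pass/fail checks. This is only a sketch: the function names are hypothetical, the actual graders report fractional scores rather than booleans, and two readings are assumptions (that "lower" in Task 2 means lower than at the start of the episode, and that the stress limit applies at every step).

```python
from statistics import mean
from typing import Sequence


def triage_passed(final_failure_probability: float) -> bool:
    """Task 1: finish with failure_probability < 0.5."""
    return final_failure_probability < 0.5


def schedule_passed(
    initial_failure_probability: float,
    final_failure_probability: float,
    stress_history: Sequence[float],
) -> bool:
    """Task 2: failure probability went down and stress stayed below 8 (assumed: at every step)."""
    lowered = final_failure_probability < initial_failure_probability
    return lowered and all(stress < 8 for stress in stress_history)


def distraction_passed(focus_scores: Sequence[float]) -> bool:
    """Task 3: average focus_score across all steps is below 0.5."""
    return mean(focus_scores) < 0.5
```

For example, `schedule_passed(0.7, 0.5, [6.0, 7.5, 7.9])` holds, while the same episode with a single stress reading of 8.0 does not.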

---

## Setup & Usage

### Local setup
```bash
# Create a virtual environment and install dependencies
pip install uv
uv sync

# Run openenv validate to confirm environment compliance
openenv validate
```

### Run the baseline agent
```bash
# Set your API credentials
export HF_TOKEN=your_api_key_here
export PRODUCTIVITY_TASK=triage  # or: schedule_optimization, distraction_mitigation

# Run the inference script
python inference.py
```
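
A script driven by these two variables might read and validate them as follows. This is a hedged sketch, not the code in `inference.py`: the function name `read_config`, the error handling, and the `triage` default are assumptions. Only the variable names and the three task identifiers come from the commands above.

```python
import os
from typing import Dict, Mapping, Optional

# Task identifiers listed in the shell example above.
VALID_TASKS = ("triage", "schedule_optimization", "distraction_mitigation")


def read_config(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read HF_TOKEN and PRODUCTIVITY_TASK, failing early on bad input."""
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    # Falling back to "triage" is an assumption made for this sketch.
    task = env.get("PRODUCTIVITY_TASK", "triage")
    if task not in VALID_TASKS:
        raise ValueError(f"PRODUCTIVITY_TASK must be one of {VALID_TASKS}, got {task!r}")
    return {"token": token, "task": task}
```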

### Docker
```bash
docker build -t productivity-copilot-env .
docker run -p 7860:7860 -e HF_TOKEN=your_key productivity-copilot-env
```

---

## Baseline Scores

The baseline agent uses `Qwen/Qwen2.5-72B-Instruct` via the HuggingFace Router API.

| Task | Score | Notes |
|---|---|---|
| Task 1: Triage | ~0.60 | Agent correctly prioritises SEND_NUDGE |
| Task 2: Schedule Optimisation | ~0.45 | Agent struggles with stress constraints |
| Task 3: Distraction Mitigation | ~0.35 | Hard task; distractions accumulate quickly |

---

## Environment Architecture

```
Productivity_Copilot/
├── productivity_env/       # Core OpenEnv environment package
│   ├── env.py              # ProductivityEnv class (step, reset)
│   ├── models.py           # Pydantic Observation & Action models
│   └── __init__.py
├── data_pipeline/          # ML model loading + inference helpers
│   └── inference.py        # CopilotModels singleton (loads .pkl files)
├── model_artifacts/        # Trained .pkl model files
│   ├── failure_predictor.pkl
│   ├── distraction_scorer.pkl
│   └── work_style_classifier.pkl
├── vectorstore/            # ChromaDB RAG coaching knowledge base
├── server/
│   └── app.py              # FastAPI server for HF Space
├── inference.py            # Baseline agent evaluation script
├── openenv.yaml            # OpenEnv metadata manifest
├── pyproject.toml          # Python project config
├── uv.lock                 # Locked dependencies
└── Dockerfile              # HuggingFace Space container
```
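
The tree describes a singleton that loads the `.pkl` artifacts. One way such a loader could look is sketched below, assuming lazy loading and per-model caching; the class name `ModelRegistrySketch` and its methods are hypothetical and do not describe the actual `CopilotModels` class. Only the three artifact file names and the `model_artifacts/` directory come from the tree.

```python
import pickle
from pathlib import Path
from typing import Any, Dict, Optional


class ModelRegistrySketch:
    """Hypothetical lazy loader for the three .pkl artifacts."""

    MODEL_NAMES = ("failure_predictor", "distraction_scorer", "work_style_classifier")
    _instance: Optional["ModelRegistrySketch"] = None

    def __init__(self, artifact_dir: Path) -> None:
        self.artifact_dir = Path(artifact_dir)
        self._cache: Dict[str, Any] = {}

    @classmethod
    def instance(cls, artifact_dir: str = "model_artifacts") -> "ModelRegistrySketch":
        # Create the shared instance on first use; later calls reuse it
        # and ignore the artifact_dir argument.
        if cls._instance is None:
            cls._instance = cls(Path(artifact_dir))
        return cls._instance

    def get(self, name: str) -> Any:
        """Load a model from disk on first request and cache it afterwards."""
        if name not in self.MODEL_NAMES:
            raise KeyError(name)
        if name not in self._cache:
            # pickle can execute arbitrary code on load: only open trusted files.
            with open(self.artifact_dir / f"{name}.pkl", "rb") as fh:
                self._cache[name] = pickle.load(fh)
        return self._cache[name]
```

Loading on demand keeps server start-up cheap, and the cache means each artifact is unpickled at most once per process.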