# Adaptive Project Manager

## Problem Statement

Software projects often fail for repeatable reasons:

- critical work is discovered late,
- teams become overloaded,
- priorities change,
- deadlines slip,
- short-term fixes create long-term issues.

Most project tools track tasks and timelines, but they do not help with decision-making under uncertainty.

This project introduces `AdaptiveProjectManagerEnv`, an OpenEnv-compatible reinforcement learning environment where the agent acts as a project manager.

The environment models:

- tasks with dependencies,
- employees with different skills,
- budget and deadline limits,
- workload and burnout,
- random project disruptions.
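A minimal sketch of how this state could be represented, assuming Python dataclasses. The class and field names (`Task`, `Employee`, `ProjectState`, `ready_tasks`) are hypothetical illustrations, not the environment's actual schema; disruptions are left out here because they act on the state rather than being part of it.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical unit of work; it can start only after its dependencies finish."""
    name: str
    effort_days: float                  # remaining effort in person-days
    priority: int                       # higher = more important
    depends_on: list[str] = field(default_factory=list)
    required_skill: str = "backend"
    done: bool = False


@dataclass
class Employee:
    """Hypothetical team member with per-skill productivity and fatigue."""
    name: str
    skills: dict[str, float]            # skill -> productivity multiplier
    workload: float = 0.0               # hours assigned today
    burnout: float = 0.0                # 0.0 (fresh) .. 1.0 (exhausted)


@dataclass
class ProjectState:
    """Hypothetical project snapshot: tasks, team, budget, and deadline."""
    tasks: dict[str, Task]
    employees: dict[str, Employee]
    budget_remaining: float
    day: int = 0
    deadline_day: int = 30

    def ready_tasks(self) -> list[str]:
        """Names of unfinished tasks whose dependencies are all complete."""
        return [
            name for name, t in self.tasks.items()
            if not t.done and all(self.tasks[d].done for d in t.depends_on)
        ]


# Usage: "ui" is blocked until "api" is done.
state = ProjectState(
    tasks={
        "api": Task("api", effort_days=5, priority=3),
        "ui": Task("ui", effort_days=3, priority=2, depends_on=["api"],
                   required_skill="frontend"),
    },
    employees={"ana": Employee("ana", {"backend": 1.2, "frontend": 0.8})},
    budget_remaining=50_000.0,
)
print(state.ready_tasks())  # ['api']
```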

The goal is for the agent to learn better day-to-day project decisions over time.

Because it must choose actions repeatedly as the project evolves, the core challenge is that decisions that look useful today can hurt delivery later.

## Why This Problem Matters

Project management requires constant tradeoffs.

| Goal | Competing Goal |
| --- | --- |
| Deliver quickly | Avoid burnout |
| Stay in budget | Add enough capacity |
| Finish full scope | Protect the critical path |
| Solve current issues | Reduce future risk |

These decisions are hard because:

- future events are uncertain,
- effects are delayed,
- resources are limited,
- improving one metric can hurt another.

Traditional project-management systems are mostly descriptive: they show status, but they do not decide what should happen next.

This problem asks whether an agent can learn stronger decision policies through repeated interaction with a realistic simulation.

## Why Reinforcement Learning

This is a sequential decision problem, not just a prediction problem.

The core question is not:

> “Will the project be late?”

It is:

> “Given the current state, what action should we take now?”

RL is a good fit because the environment includes:

- repeated decisions,
- delayed rewards,
- changing conditions,
- multiple conflicting objectives.

The agent is not asked to predict outcomes once. It must repeatedly choose the next best action under uncertainty.
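To make “delayed rewards” concrete: RL agents typically optimize a discounted return, $G_t = r_t + \gamma G_{t+1}$, rather than the immediate reward. Whether this environment uses discounting, and with what $\gamma$, is an assumption here. The reward traces below are made up for illustration, not results from the environment.

```python
def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """Sum of rewards, each weighted by gamma**t for its step index t."""
    total = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical five-day reward traces for two policies.
overtime_now = [2.0, 2.0, -1.0, -3.0, -3.0]  # quick gains, then burnout penalties
steady_pace = [1.0, 1.0, 1.0, 1.0, 1.0]

print(discounted_return(overtime_now, gamma=0.9))
print(discounted_return(steady_pace, gamma=0.9))
```

The first policy looks better on days 1 and 2, yet its discounted return is negative while the steady policy's is positive. A learner that sees only immediate reward would prefer the wrong one.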

### Why Fixed Rules Are Not Enough

A simple rule like “always assign the strongest engineer to the highest-priority task” can fail because it ignores interactions such as these:

- one reassignment may block another dependency,
- overtime may help now but increase burnout later,
- dropping minor scope may protect delivery,
- a short-term budget increase (e.g., hiring a contractor) may prevent larger delays.

No single rule is optimal in every state.

The best action depends on:

- what has already happened,
- what is likely to happen next,
- how current choices change future options.
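The fixed rule quoted above is easy to write down, which makes it a natural heuristic baseline. The sketch below implements it with hypothetical task names and skill scores. It shows how the rule can spend a scarce specialist on the wrong task.

```python
def greedy_assign(
    ready_tasks: dict[str, tuple[int, str]],
    skills: dict[str, dict[str, float]],
) -> dict[str, str]:
    """Fixed rule: the highest-priority task gets the strongest free engineer.

    ready_tasks: task name -> (priority, required skill)
    skills:      engineer name -> {skill: productivity}
    Returns a mapping task name -> engineer name.
    """
    free = sorted(skills)  # sorted so ties break deterministically by name
    plan: dict[str, str] = {}
    # Visit tasks from highest to lowest priority (stable for equal priorities).
    by_priority = sorted(ready_tasks.items(), key=lambda kv: kv[1][0], reverse=True)
    for task, (_priority, skill) in by_priority:
        if not free:
            break
        # Strongest free engineer for this skill; ignores who else needs them.
        best = max(free, key=lambda e: skills[e].get(skill, 0.0))
        plan[task] = best
        free.remove(best)
    return plan


ready = {"polish_dashboard": (3, "frontend"), "fix_db_migration": (2, "backend")}
skills = {
    "ana": {"frontend": 1.0, "backend": 1.5},
    "ben": {"frontend": 0.9, "backend": 0.2},
}
print(greedy_assign(ready, skills))
```

Here the rule sends `ana`, the only strong backend engineer, to the frontend task and leaves the backend task to someone with a 0.2 backend score. Swapping the two would double combined productivity (2.4 vs. 1.2). Avoiding this requires looking beyond the single highest-priority task.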

## Research Question

Can an RL agent manage a software project under uncertainty better than fixed rule-based policies?

More specifically: can it learn when to assign, delay, reprioritize, de-scope, or escalate work to improve final delivery outcomes?

The baseline for comparison is a set of fixed heuristic policies.

## Environment Objective

Deliver the project while balancing:

- schedule performance,
- budget control,
- burnout risk,
- stakeholder satisfaction,
- completed scope.

These goals conflict, so there is usually no action that improves all of them at once.

| Action | Benefit | Cost |
| --- | --- | --- |
| Request overtime | Faster progress | Higher burnout risk |
| Hire contractor | More capacity | Higher spend |
| Drop low-priority scope | Better schedule protection | Lower stakeholder satisfaction |
| Focus on easy tasks | Quick visible progress | Critical blockers remain |

The target behavior is to maximize project success while minimizing delay, overspend, and burnout.
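One common way to encode competing goals is a weighted sum of per-step signals. The sketch below assumes that design. The signal names and weights are hypothetical, not the environment's actual reward. Changing the weights changes which tradeoffs from the table above the agent will prefer.

```python
from dataclasses import dataclass


@dataclass
class DayOutcome:
    """Hypothetical per-day signals, each roughly normalized to [0, 1]."""
    schedule_slip: float      # fraction of planned progress missed today
    overspend: float          # spend above the daily budget, as a fraction
    burnout_increase: float   # average rise in team burnout today
    satisfaction: float       # current stakeholder satisfaction level
    scope_completed: float    # fraction of total scope finished today


# Hypothetical weights: negative terms are penalties, positive terms rewards.
WEIGHTS = {
    "schedule_slip": -1.0,
    "overspend": -0.5,
    "burnout_increase": -0.8,
    "satisfaction": 0.3,
    "scope_completed": 2.0,
}


def daily_reward(o: DayOutcome) -> float:
    """Weighted sum: reward progress and satisfaction; penalize delay, spend, burnout."""
    return sum(w * getattr(o, name) for name, w in WEIGHTS.items())


overtime_day = DayOutcome(schedule_slip=0.0, overspend=0.1, burnout_increase=0.1,
                          satisfaction=0.6, scope_completed=0.05)
normal_day = DayOutcome(schedule_slip=0.2, overspend=0.0, burnout_increase=0.0,
                        satisfaction=0.6, scope_completed=0.03)
print(daily_reward(overtime_day), daily_reward(normal_day))
```

With these particular weights, an overtime day scores higher than a slipping normal day. This per-day view is exactly what can mislead an agent when burnout costs arrive later.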

## What the Agent Decides

### 1) Resource Allocation

- Who works on which task
- When to reassign staff
- When to preserve specialist capacity

### 2) Prioritization

- Which tasks to do now
- Whether to clear blockers first
- How to manage critical-path work
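Critical-path work is the chain of dependent tasks that sets the earliest possible finish date: any slip on it delays the whole project. A standard way to find it is a longest-path pass over the dependency DAG in topological order. The sketch uses Python's `graphlib` (3.9+). The task names and durations are hypothetical, and staffing limits are ignored.

```python
from graphlib import TopologicalSorter
from typing import Optional


def critical_path(
    durations: dict[str, float],
    depends_on: dict[str, list[str]],
) -> tuple[float, list[str]]:
    """Longest duration-weighted path through the task dependency DAG.

    Returns (minimum project length in days, critical tasks in order).
    """
    finish: dict[str, float] = {}           # earliest finish time per task
    prev: dict[str, Optional[str]] = {}     # dependency that gates each task
    graph = {t: depends_on.get(t, []) for t in durations}
    # static_order() yields each task after all of its dependencies.
    for task in TopologicalSorter(graph).static_order():
        deps = depends_on.get(task, [])
        gate = max(deps, key=lambda d: finish[d], default=None)
        start = finish[gate] if gate is not None else 0.0
        finish[task] = start + durations[task]
        prev[task] = gate
    # Walk back from the task that finishes last.
    end = max(finish, key=finish.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = prev[node]
    return finish[end], path[::-1]


durations = {"design": 2, "api": 5, "ui": 3, "docs": 1, "release": 1}
depends_on = {
    "api": ["design"],
    "ui": ["design"],
    "docs": ["design"],
    "release": ["api", "ui", "docs"],
}
print(critical_path(durations, depends_on))  # (8.0, ['design', 'api', 'release'])
```

In this example, `ui` and `docs` have slack: they can slip by a couple of days without moving the release. A slip on `api` cannot be absorbed.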

### 3) Contingency Actions

- Overtime
- Temporary hiring
- Scope deferral
- Delay acceptance

These contingency choices can produce delayed effects across later steps.
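The three decision groups above could be encoded as a structured action space. The sketch assumes a small enum of hypothetical action types with optional targets, plus a validity check. The real environment's action schema may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(Enum):
    # 1) Resource allocation
    ASSIGN = "assign"
    REASSIGN = "reassign"
    # 2) Prioritization
    PRIORITIZE = "prioritize"
    # 3) Contingency actions
    OVERTIME = "overtime"
    HIRE_CONTRACTOR = "hire_contractor"
    DEFER_SCOPE = "defer_scope"
    ACCEPT_DELAY = "accept_delay"


@dataclass(frozen=True)
class Action:
    """Hypothetical action: a type plus the employee/task it targets, if any."""
    kind: ActionType
    employee: Optional[str] = None
    task: Optional[str] = None


# Hypothetical rules: which target fields each action type must specify.
REQUIRED = {
    ActionType.ASSIGN: ("employee", "task"),
    ActionType.REASSIGN: ("employee", "task"),
    ActionType.PRIORITIZE: ("task",),
    ActionType.OVERTIME: ("employee",),
    ActionType.HIRE_CONTRACTOR: (),
    ActionType.DEFER_SCOPE: ("task",),
    ActionType.ACCEPT_DELAY: (),
}


def is_valid(action: Action) -> bool:
    """True if every field required by the action's type is set."""
    return all(getattr(action, f) is not None for f in REQUIRED[action.kind])


print(is_valid(Action(ActionType.ASSIGN, employee="ana", task="api")))  # True
print(is_valid(Action(ActionType.DEFER_SCOPE)))                          # False
```

Rejecting malformed actions up front keeps the simulator simple, and the agent can be penalized for invalid choices instead of having them silently ignored.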

## OpenEnv Fit

This environment is suitable for OpenEnv because it provides:

- a realistic human decision task,
- clear observation/action/reward structure,
- deterministic evaluation through seeded randomness,
- support for multiple difficulty settings,
- efficient simulation with meaningful complexity.

Per simulated day, the loop is:

1. Observe project state
2. Choose actions
3. Simulate one day of progress
4. Apply possible disruptions
5. Return new state and reward

This directly matches the standard RL interaction cycle:

`Observation -> Action -> Environment Update -> Reward`
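The five daily steps map onto the familiar `reset`/`step` convention used by many RL libraries. The toy class below is a hypothetical simplification written only to show the loop: one effort pool, one team-wide burnout score, and made-up rates. It is not OpenEnv's actual interface or this environment's dynamics.

```python
import random
from dataclasses import dataclass


@dataclass
class Observation:
    day: int
    remaining_effort: float
    burnout: float


class ToyProjectEnv:
    """Hypothetical single-team environment illustrating the daily loop."""

    def __init__(self, total_effort: float = 20.0, deadline: int = 15):
        self.total_effort = total_effort
        self.deadline = deadline

    def reset(self, seed: int) -> Observation:
        self.rng = random.Random(seed)  # seeded source for disruptions
        self.day, self.remaining, self.burnout = 0, self.total_effort, 0.0
        return self._observe()

    def step(self, overtime: bool) -> tuple[Observation, float, bool]:
        # 3. Simulate one day: overtime boosts output but raises burnout.
        speed = (1.5 if overtime else 1.0) * (1.0 - self.burnout)
        delta = 0.1 if overtime else -0.02
        self.burnout = min(1.0, max(0.0, self.burnout + delta))
        self.remaining = max(0.0, self.remaining - speed)
        # 4. Disruption: a 10% daily chance of extra effort (e.g., a bug found).
        if self.rng.random() < 0.1:
            self.remaining += 2.0
        self.day += 1
        # 5. Reward progress; penalize each day past the deadline.
        reward = speed - (1.0 if self.day > self.deadline else 0.0)
        done = self.remaining == 0.0 or self.day >= 2 * self.deadline
        return self._observe(), reward, done

    def _observe(self) -> Observation:
        return Observation(self.day, self.remaining, self.burnout)


env = ToyProjectEnv()
obs = env.reset(seed=42)                    # 1. observe the initial state
done, total_reward = False, 0.0
while not done:
    overtime = obs.burnout < 0.3            # 2. choose: a simple threshold policy
    obs, reward, done = env.step(overtime)  # 3-5. simulate, disrupt, return
    total_reward += reward
print(obs.day, round(total_reward, 2))
```

The hard cap of twice the deadline guarantees every episode terminates, even if a policy never finishes the work.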

## Difficulty and Delayed Effects

Short-term gains can create long-term losses.

Example:

- Day 3: use overtime to recover schedule.
- Day 10: burnout reduces team speed; critical work slips.

The agent must therefore plan ahead, not optimize only for immediate reward.

This introduces long-horizon planning, risk management, and tradeoff control.
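The overtime example above can be reproduced with a tiny deterministic model. It assumes, purely for illustration, that overtime multiplies output by 1.5 but adds 0.15 burnout per day, a normal day recovers 0.02, and output scales with `1 - burnout`. None of these rates come from the environment itself.

```python
def simulate(overtime_days: set[int], days: int = 10) -> list[float]:
    """Cumulative progress at the end of each day under a hypothetical burnout model."""
    burnout, total, history = 0.0, 0.0, []
    for day in range(1, days + 1):
        overtime = day in overtime_days
        # Today's output depends on burnout accumulated so far.
        total += (1.5 if overtime else 1.0) * (1.0 - burnout)
        if overtime:
            burnout = min(1.0, burnout + 0.15)
        else:
            burnout = max(0.0, burnout - 0.02)
        history.append(total)
    return history


push = simulate(overtime_days={1, 2, 3, 4})
steady = simulate(overtime_days=set())
for day in (3, 10):
    print(f"day {day}: overtime plan {push[day - 1]:.2f}, steady plan {steady[day - 1]:.2f}")
```

Under these assumptions, the overtime plan is ahead on day 3 (about 3.8 vs. 3.0) but well behind by day 10 (about 7.4 vs. 10.0), because burnout recovers far more slowly than it builds.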

## Stochastic Events and Reproducibility

The environment includes realistic random events, such as:

- employee absence,
- urgent feature requests,
- effort increases,
- vendor delays,
- bug discovery,
- requirement changes.

Randomness is seed-controlled to ensure:

- reproducible runs,
- fair policy comparison,
- stable evaluation.

This keeps the environment realistic while still allowing fair and repeatable comparison between policies.
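Seed control can be as simple as giving the disruption generator its own RNG instance. The sketch below uses hypothetical event names that mirror the list above and an assumed 20% daily event probability.

```python
import random

# Hypothetical identifiers for the event types listed above.
EVENTS = [
    "employee_absence",
    "urgent_feature_request",
    "effort_increase",
    "vendor_delay",
    "bug_discovery",
    "requirement_change",
]


def disruption_schedule(seed: int, days: int, p: float = 0.2) -> list[tuple[int, str]]:
    """Draw (day, event) pairs from a private, seeded RNG.

    A dedicated random.Random(seed) keeps the schedule independent of any
    other randomness (e.g., exploration noise in the agent).
    """
    rng = random.Random(seed)
    schedule = []
    for day in range(1, days + 1):
        if rng.random() < p:
            schedule.append((day, rng.choice(EVENTS)))
    return schedule


# Every policy evaluated on seed 123 faces exactly the same disruptions.
print(disruption_schedule(seed=123, days=30))
```

Generating the schedule from its own RNG, rather than drawing events from a shared generator during the episode, matters for fair comparison. Otherwise, a policy that triggers extra random draws would shift every later event and face a different project.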

## Why This Problem Is Useful

`AdaptiveProjectManagerEnv` can serve as:

- a benchmark for long-horizon decision-making,
- a controlled setting for comparing project-management strategies,
- an early foundation for AI-assisted planning tools.

## Summary

`AdaptiveProjectManagerEnv` frames project management as sequential decision-making under uncertainty.

It combines limited resources, conflicting goals, delayed consequences, and controlled randomness in a reproducible RL environment.

Core question:

> Can an agent learn to manage a complex project under uncertainty better than fixed rules?