# Adaptive Project Manager

## Problem Statement

Software projects often fail for repeatable reasons:

- critical work is discovered late,
- teams become overloaded,
- priorities change,
- deadlines slip,
- short-term fixes create long-term issues.

Most project tools track tasks and timelines, but they do not help with decision-making under uncertainty.

This project introduces `AdaptiveProjectManagerEnv`, an OpenEnv-compatible reinforcement learning environment where the agent acts as a project manager.

The environment models:

- tasks with dependencies,
- employees with different skills,
- budget and deadline limits,
- workload and burnout,
- random project disruptions.

The goal is to learn better day-to-day project decisions over time. In this environment, the agent plays the role of project manager and must choose actions repeatedly as the project evolves. This is the core challenge: decisions that look useful today can hurt delivery later.

## Why This Problem Matters

Project management requires constant tradeoffs.

| Goal | Competing Goal |
| --- | --- |
| Deliver quickly | Avoid burnout |
| Stay in budget | Add enough capacity |
| Finish full scope | Protect the critical path |
| Solve current issues | Reduce future risk |

These decisions are hard because:

- future events are uncertain,
- effects are delayed,
- resources are limited,
- improving one metric can hurt another.

Traditional project-management systems are mostly descriptive: they show status, but they do not decide what should happen next. This problem asks whether an agent can learn stronger decision policies through repeated interaction with a realistic simulation.

## Why Reinforcement Learning

This is a sequential decision problem, not just a prediction problem.
The core question is not:

> “Will the project be late?”

It is:

> “Given the current state, what action should we take now?”

RL is a good fit because the environment includes:

- repeated decisions,
- delayed rewards,
- changing conditions,
- multiple conflicting objectives.

The agent is not asked to predict outcomes once. It must repeatedly choose the next best action under uncertainty.

### Why Fixed Rules Are Not Enough

A simple rule like “always assign the strongest engineer to the highest-priority task” can fail because:

- one reassignment may block another dependency,
- overtime may help now but increase burnout later,
- dropping minor scope may protect delivery,
- short-term budget increases (e.g., contractor) may prevent larger delays.

No single rule is optimal in every state. The best action depends on:

- what has already happened,
- what is likely to happen next,
- how current choices change future options.

## Research Question

Can an RL agent manage a software project under uncertainty better than fixed rule-based policies?

More specifically: can it learn when to assign, delay, reprioritize, de-scope, or escalate work to improve final delivery outcomes?

Baseline comparison is against fixed heuristic policies.

## Environment Objective

Deliver the project while balancing:

- schedule performance,
- budget control,
- burnout risk,
- stakeholder satisfaction,
- completed scope.

These goals conflict, so there is no perfect action at each step.

| Action | Benefit | Cost |
| --- | --- | --- |
| Request overtime | Faster progress | Higher burnout risk |
| Hire contractor | More capacity | Higher spend |
| Drop low-priority scope | Better schedule protection | Lower stakeholder satisfaction |
| Focus on easy tasks | Quick visible progress | Critical blockers remain |

Target behavior is to maximize project success while minimizing delay, overspend, and burnout.
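One way to make the competing objectives above concrete is to fold them into a single scalar reward. The following is a minimal sketch only: the `ProjectSnapshot` fields, the `WEIGHTS` values, and the `reward` function are illustrative assumptions, not the environment's actual reward definition.

```python
from dataclasses import dataclass


@dataclass
class ProjectSnapshot:
    """Hypothetical end-of-day summary; every field is an illustrative assumption."""
    scope_completed: float           # fraction of planned scope done, 0..1
    schedule_delay_days: float       # days behind plan (0 if on time)
    budget_overrun: float            # fraction over budget (0 if within budget)
    avg_burnout: float               # mean team burnout, 0..1
    stakeholder_satisfaction: float  # 0..1


# Assumed weights; a real environment would define and document its own.
WEIGHTS = {
    "scope": 1.0,
    "satisfaction": 0.5,
    "delay": 0.05,
    "overrun": 1.0,
    "burnout": 0.8,
}


def reward(s: ProjectSnapshot) -> float:
    """Weighted sum: reward delivery and satisfaction, penalize delay, overspend, burnout."""
    gains = (
        WEIGHTS["scope"] * s.scope_completed
        + WEIGHTS["satisfaction"] * s.stakeholder_satisfaction
    )
    penalties = (
        WEIGHTS["delay"] * s.schedule_delay_days
        + WEIGHTS["overrun"] * s.budget_overrun
        + WEIGHTS["burnout"] * s.avg_burnout
    )
    return gains - penalties


# Two hypothetical end states: a sustainable pace versus an overtime-heavy push.
steady = ProjectSnapshot(0.6, 0.0, 0.0, 0.2, 0.8)
overtime_heavy = ProjectSnapshot(0.7, 0.0, 0.1, 0.8, 0.8)
print(reward(steady), reward(overtime_heavy))
```

Under these assumed weights, the overtime-heavy state completes more scope but scores lower, because its burnout and overspend penalties outweigh the extra progress. Different weights would shift that balance, which is why the weighting is itself a design decision.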
## What the Agent Decides

### 1) Resource Allocation

- Who works on which task
- When to reassign staff
- When to preserve specialist capacity

### 2) Prioritization

- Which tasks to do now
- Whether to clear blockers first
- How to manage critical-path work

### 3) Contingency Actions

- Overtime
- Temporary hiring
- Scope deferral
- Delay acceptance

These contingency choices can produce delayed effects across later steps.

## OpenEnv Fit

This environment is suitable for OpenEnv because it provides:

- a realistic human decision task,
- clear observation/action/reward structure,
- deterministic evaluation through seeded randomness,
- support for multiple difficulty settings,
- efficient simulation with meaningful complexity.

Per simulated day, the loop is:

1. Observe project state
2. Choose actions
3. Simulate one day of progress
4. Apply possible disruptions
5. Return new state and reward

This directly matches the standard RL interaction cycle:

`Observation -> Action -> Environment Update -> Reward`

## Difficulty and Delayed Effects

Short-term gains can create long-term losses.

Example:

- Day 3: use overtime to recover schedule.
- Day 10: burnout reduces team speed; critical work slips.

The agent must therefore plan ahead, not optimize only for immediate reward. This introduces long-horizon planning, risk management, and tradeoff control.

## Stochastic Events and Reproducibility

The environment includes realistic random events, such as:

- employee absence,
- urgent feature requests,
- effort increases,
- vendor delays,
- bug discovery,
- requirement changes.

Randomness is seed-controlled to ensure:

- reproducible runs,
- fair policy comparison,
- stable evaluation.

This keeps the environment realistic while still allowing fair and repeatable comparison between policies.
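Seed-controlled randomness of this kind can be sketched with a private random number generator per episode. In the sketch below, the event names come from the list above, but the daily probabilities and the helper names `sample_disruptions` and `disruption_schedule` are assumptions made for illustration; they are not taken from the environment's code.

```python
import random
from typing import List

# Event names follow the list above; the probabilities are assumed values.
DAILY_EVENT_PROBABILITY = {
    "employee_absence": 0.10,
    "urgent_feature_request": 0.05,
    "effort_increase": 0.08,
    "vendor_delay": 0.04,
    "bug_discovery": 0.12,
    "requirement_change": 0.06,
}


def sample_disruptions(rng: random.Random) -> List[str]:
    """Draw the disruptions for one simulated day from a caller-supplied RNG."""
    return [
        name
        for name, probability in DAILY_EVENT_PROBABILITY.items()
        if rng.random() < probability
    ]


def disruption_schedule(seed: int, days: int) -> List[List[str]]:
    """Build the event schedule for a whole episode from a single seed."""
    rng = random.Random(seed)  # private RNG, independent of the global random state
    return [sample_disruptions(rng) for _ in range(days)]


# The same seed reproduces the same schedule, day by day.
assert disruption_schedule(seed=7, days=30) == disruption_schedule(seed=7, days=30)
```

Because this sketch draws events independently of the agent's actions, two policies evaluated on the same seed would face an identical sequence of disruptions. If event draws depended on the actions taken, the schedules could diverge even with a shared seed.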
## Why This Problem Is Useful

`AdaptiveProjectManagerEnv` can serve as:

- a benchmark for long-horizon decision-making,
- a controlled setting for comparing project-management strategies,
- an early foundation for AI-assisted planning tools.

## Summary

`AdaptiveProjectManagerEnv` frames project management as sequential decision-making under uncertainty. It combines limited resources, conflicting goals, delayed consequences, and controlled randomness in a reproducible RL environment.

Core question:

> Can an agent learn to manage a complex project under uncertainty better than fixed rules?