Buckets:
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| __pycache__ | 14 items | ||
| server | 9 items | ||
| Dockerfile | 2.62 kB xet | 07b36423 | |
| README.md | 9.53 kB xet | 7c3816ea | |
| __init__.py | 1.95 kB xet | 4431cd08 | |
| client.py | 3.7 kB xet | cb992623 | |
| config.py | 11.5 kB xet | 69d907df | |
| dynamics.py | 18.8 kB xet | 50b72a23 | |
| enums.py | 3.45 kB xet | df68211e | |
| farm_state.py | 19.1 kB xet | f620ca1d | |
| inference.py | 17.1 kB xet | fd95c700 | |
| market_engine.py | 16.2 kB xet | 5e490de6 | |
| models.py | 8.78 kB xet | 9b254e15 | |
| multi_agent_environment.py | 28.2 kB xet | 979b5a98 | |
| openenv.yaml | 249 Bytes xet | 39f820f3 | |
| public_ledger.py | 7.51 kB xet | 508fee6f | |
| pyproject.toml | 1.3 kB xet | 7601b7f5 | |
| tasks.py | 14.5 kB xet | 5af27fd3 | |
| time_controller.py | 4.8 kB xet | 6d26f120 | |
| uv.lock | 576 kB xet | 4ccbdfc6 | |
CropRL — Multi-Agent Agricultural Decision-Making Environment
CropRL simulates the core decision-making challenges of a modern farm ecosystem over a 5-year (60-month) horizon. In this multi-agent environment, multiple farmers (agents) operate simultaneously, each managing their own plot of land, finances, and inventory. They choose what to plant, when to harvest, how to manage soil health, and when to take financial risks — all under stochastic weather and fluctuating commodity prices.
Because agents operate in a shared economy, their actions impact one another. If every agent plants the same high-profit crop, market supply will flood at harvest time, crashing the price at the monthly clearing forum.
The goal for each agent is to maximize terminal profit (change in net worth: cash + land value + inventory + growing crops − debt) by the end of the episode, while navigating competition, cooperation, and market dynamics.
Motivation & Real-World Utility
Why CropRL is a Strong RL Benchmark
Farming is one of the oldest sequential decision-making problems. Every month, a farmer faces choices with delayed, uncertain outcomes:
- Do I plant an expensive "hype" crop that could yield massive profit, or a safe one that at least won't bankrupt me?
- Should I sell now at a low price, or store the harvest and gamble on prices rising — knowing it might rot?
- Rain has been poor this year. Do I take a loan to irrigate, adding debt with interest?
- What are my neighbors planting? If they all plant Quinoa, the market will crash. Should I pivot to Chickpea?
These dynamics make CropRL an excellent benchmark for modern Reinforcement Learning algorithms:
- Multi-Agent Market Dynamics: Agents must model one another's intentions (Theory of Mind). They can communicate via a public forum to form cartels, coordinate planting schedules, or bluff to mislead competitors.
- Single Unifying Objective: Despite the multi-domain complexity, the agent is trained on a single primary goal: maximizing terminal profit. It must balance short-term cash vs. preserving soil nitrogen, without hand-crafted, multi-objective reward shaping.
- Harsh Real-World Constraints: The environment imposes strict physical and financial bounds. Inflation and compounding interest erode idle cash or loan value, crops have strict physiological development times, and unmitigated weather shocks permanently limit yield potential.
- Delayed Consequences: Actions have long-delayed outcomes. Planting corn today sets up a large harvest in 4 months but depletes the soil nitrogen that the following year's yields depend on. Taking a loan today applies compounding financial pressure for seasons to come.
The Crops
Agents manage 6 crop types (3 standard, 3 hype crops) that form a strategic balance:
| Crop | Category | Seed Cost | Growth | Base Yield | Soil Impact | Base Price |
|---|---|---|---|---|---|---|
| Corn | Heavy Feeder | ₹800 | 4 months | 8 tons | −0.08 N/mo | ₹1,200/ton |
| Wheat | Medium Feeder | ₹500 | 3 months | 5 tons | −0.07 N/mo | ₹800/ton |
| Chickpea | Legume | ₹200 | 3 months | 3 tons | +0.05 N/mo | ₹500/ton |
| Matcha | Hype Crop | ₹1,500 | 5 months | 2 tons | −0.12 N/mo | ₹2,500/ton |
| Quinoa | Hype Crop | ₹900 | 3 months | 3 tons | −0.06 N/mo | ₹1,800/ton |
| Turmeric | Hype Crop | ₹600 | 4 months | 4 tons | −0.04 N/mo | ₹1,200/ton |
Note: Soil impacts are applied monthly during the growth period.
The core tension: Corn is profitable but depletes the soil. Chickpea restores it but earns less. Hype crops (Matcha, Quinoa, Turmeric) offer massive margins but have high seed costs, so a bad harvest can bankrupt an agent. An agent that learns Corn monoculture will see yields collapse within a few cycles as nitrogen depletes. The optimal policy involves crop rotation and market diversification.
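The nitrogen arithmetic behind rotation can be sketched directly from the crop table. This is a minimal illustration, not the environment's code: the function name `nitrogen_after` and the starting nitrogen of 0.6 are assumptions, and the clamp to [0, 1] follows the nitrogen range given in the observation description.

```python
# Monthly soil nitrogen change and growth time per crop, copied from the crop table.
SOIL_IMPACT_PER_MONTH = {
    "corn": -0.08, "wheat": -0.07, "chickpea": +0.05,
    "matcha": -0.12, "quinoa": -0.06, "turmeric": -0.04,
}
GROWTH_MONTHS = {
    "corn": 4, "wheat": 3, "chickpea": 3,
    "matcha": 5, "quinoa": 3, "turmeric": 4,
}


def nitrogen_after(sequence, start=0.6):
    """Soil nitrogen after growing each crop in `sequence` to maturity.

    Hypothetical helper: `start` is an illustrative value, and nitrogen
    is assumed to be clamped to [0, 1] after every monthly update.
    """
    nitrogen = start
    for crop in sequence:
        for _ in range(GROWTH_MONTHS[crop]):
            nitrogen += SOIL_IMPACT_PER_MONTH[crop]
            nitrogen = min(1.0, max(0.0, nitrogen))
    return nitrogen


monoculture = nitrogen_after(["corn", "corn"])     # 8 months of -0.08 exhausts the soil
rotation = nitrogen_after(["corn", "chickpea"])    # 0.6 - 0.32 + 0.15
```

Under these assumptions two back-to-back corn cycles drive nitrogen to the floor, while a corn then chickpea rotation recovers part of the loss.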
What the Agent Observes
Each month, the agent receives a full text dashboard as its observation:
- Time & Weather — current month, season, and expected rainfall (0.0–1.0)
- Farm Status — what's planted, crop age, soil nitrogen level (0.0–1.0), water level, expected yield potential
- Finances — cash balance, debt, current interest rate
- Market Prices — current spot prices for all 6 crops
- Storage — what's in the warehouse and how old it is
- Public Ledger — messages posted by other agents in the communication forum
What the Agent Can Do
Each month, the agent picks one of 15 actions:
| ID | Action | Effect |
|---|---|---|
| 0 | Wait / No-Op | Do nothing but consume 1 action slot |
| 1–3 | Plant Corn / Wheat / Chickpea | Spend seed cost, occupy land |
| 4 | Irrigate | Spend ₹300, dynamically boost soil water level depending on the crop |
| 5 | Fertilize | Spend ₹400, boost soil nitrogen by +0.15 |
| 6 | Harvest & Store | Clear land, put harvest in warehouse |
| 7 | Harvest & Sell | Clear land, queue sale for month-end clearing |
| 8 | Sell Inventory | Queue stored crops for month-end sale |
| 9 | Take Loan | Get a lump sum of cash, start accumulating interest |
| 10 | Repay Loan | Pay off full debt if you have enough cash |
| 11 | Post Forum Message | Post a public message to the ledger for all agents to see (Theory of Mind/Coordination) |
| 12–14 | Plant Matcha / Quinoa / Turmeric | Plant a high-risk, high-reward hype crop |
Note: Invalid actions incur a −50 reward penalty and are treated as a no-op.
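The action table maps naturally onto an integer enum. The sketch below mirrors the IDs listed above; the validity rules inside `resolve_action` (for example, that planting requires empty land) are assumptions for illustration, since the README does not spell out when each action is available.

```python
from enum import IntEnum


class Action(IntEnum):
    """Action IDs as listed in the action table."""
    WAIT = 0
    PLANT_CORN = 1
    PLANT_WHEAT = 2
    PLANT_CHICKPEA = 3
    IRRIGATE = 4
    FERTILIZE = 5
    HARVEST_STORE = 6
    HARVEST_SELL = 7
    SELL_INVENTORY = 8
    TAKE_LOAN = 9
    REPAY_LOAN = 10
    POST_MESSAGE = 11
    PLANT_MATCHA = 12
    PLANT_QUINOA = 13
    PLANT_TURMERIC = 14


INVALID_ACTION_PENALTY = -50.0
PLANT_ACTIONS = {
    Action.PLANT_CORN, Action.PLANT_WHEAT, Action.PLANT_CHICKPEA,
    Action.PLANT_MATCHA, Action.PLANT_QUINOA, Action.PLANT_TURMERIC,
}
HARVEST_ACTIONS = {Action.HARVEST_STORE, Action.HARVEST_SELL}


def resolve_action(action_id, land_occupied, has_inventory, debt, cash):
    """Return (effective_action, penalty).

    Hypothetical validity rules: invalid choices fall back to WAIT
    and carry the -50 penalty described in the note above.
    """
    try:
        action = Action(action_id)
    except ValueError:
        return Action.WAIT, INVALID_ACTION_PENALTY

    valid = True
    if action in PLANT_ACTIONS:
        valid = not land_occupied          # assumed: one crop per plot
    elif action in HARVEST_ACTIONS:
        valid = land_occupied              # assumed: needs a standing crop
    elif action == Action.SELL_INVENTORY:
        valid = has_inventory
    elif action == Action.REPAY_LOAN:
        valid = debt > 0 and cash >= debt  # table: full repayment only

    if not valid:
        return Action.WAIT, INVALID_ACTION_PENALTY
    return action, 0.0
```

For instance, `resolve_action(1, land_occupied=True, has_inventory=False, debt=0, cash=1000)` returns `(Action.WAIT, -50.0)` under these assumed rules.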
The Market Clearing Forum
Because multiple agents operate in the same economy, CropRL implements a Clearing Forum at the end of every month.
- Agents queue their crops for sale using actions 7 or 8.
- At the end of the month, the environment tallies the total supply of each crop entering the market.
- If the supply is unusually high, the clearing price for that crop is depressed (slippage). If supply is zero, the price remains at the generated market rate.
- Agents are paid out at the final clearing price, not the spot price they observed during their turn.
This mechanic heavily penalizes homogeneous agent behavior and rewards market coordination or contrarian strategies.
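The clearing step can be sketched as follows. The README does not give the slippage formula, so the linear discount, the `normal_supply` threshold, the `sensitivity` factor, and the price floor below are all illustrative assumptions; only the overall behavior (higher supply lowers the price, zero supply leaves it unchanged, everyone is paid the same clearing price) comes from the description above.

```python
def clearing_price(market_price, total_supply,
                   normal_supply=20.0, sensitivity=0.5, floor_fraction=0.5):
    """Hypothetical slippage rule: linear discount on supply above a threshold.

    All keyword parameters are assumed values, not taken from the environment.
    """
    if total_supply <= normal_supply:
        return market_price
    excess_ratio = (total_supply - normal_supply) / normal_supply
    discount = min(sensitivity * excess_ratio, 1.0 - floor_fraction)
    return market_price * (1.0 - discount)


def settle(queued_sales, market_price):
    """Pay every seller at the single month-end clearing price.

    `queued_sales` maps agent name -> tons queued for one crop.
    """
    total_supply = sum(queued_sales.values())
    price = clearing_price(market_price, total_supply)
    return {agent: tons * price for agent, tons in queued_sales.items()}


# Two corn sellers stay under the assumed threshold; four sellers flood the market.
two_sellers = settle({"a": 8, "b": 8}, market_price=1200)
four_sellers = settle({"a": 8, "b": 8, "c": 8, "d": 8}, market_price=1200)
```

With these assumed parameters, each of two sellers receives the full price for 8 tons, while each of four sellers receives a visibly lower payout for the same harvest.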
Sources of Uncertainty
Four stochastic processes drive the environment. All Gaussian samples are clamped to ±3σ, and all outputs have hard floor/ceiling clamps to prevent runaway values.
| Source | When | Formula | Clamp |
|---|---|---|---|
| Rainfall | Every month | clip(μ(month) + N(0, 0.15²), 0, 1) | [0, 1] |
| Market Prices | Every month | Mean-reverting random walk with seasonal targets | [base × 0.5, base × 2.5] |
| Yield Noise | At harvest | deterministic_yield × (1 + N(0, 0.10²)) | [0, ∞) with ±3σ noise clamp |
| Demand Shocks | ~8%/month | One random crop gets ±30–60% price shock | Respects price ceiling |
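The rainfall and yield rows of the table translate into a few lines of sampling code. This is a sketch using only the standard library; the helper names are hypothetical, and `mean_for_month` stands in for μ(month), whose values the README does not list.

```python
import random


def clamped_gauss(rng, sigma):
    """Zero-mean Gaussian sample clamped to +/- 3 sigma."""
    return max(-3.0 * sigma, min(3.0 * sigma, rng.gauss(0.0, sigma)))


def sample_rainfall(rng, mean_for_month):
    """clip(mu(month) + N(0, 0.15^2), 0, 1); the monthly mean is an input."""
    return min(1.0, max(0.0, mean_for_month + clamped_gauss(rng, 0.15)))


def sample_yield(rng, deterministic_yield):
    """deterministic_yield * (1 + N(0, 0.10^2)), floored at zero."""
    return max(0.0, deterministic_yield * (1.0 + clamped_gauss(rng, 0.10)))


rng = random.Random(0)                      # fixed seed for reproducibility
rain = sample_rainfall(rng, mean_for_month=0.7)
corn_yield = sample_yield(rng, deterministic_yield=8.0)
```

Because the noise is clamped at ±3σ with σ = 0.10, a harvest can deviate from its deterministic yield by at most 30% in either direction.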
How Prices Work
Prices follow a mean-reverting random walk — each month's price is anchored to the previous month's, pulled toward the seasonal target:
target = base_price × seasonal_multiplier
drift = 0.3 × (target − P_prev) / target
P_new = P_prev × (1 + drift + noise)
Because each price is anchored to the previous one, trends persist across months (a corn price below its seasonal target tends to keep rising toward it), making storage speculation a learnable strategy.
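The three formulas above fit in a single update function. In this sketch the `noise` term is passed in explicitly because the README does not state its distribution, and the final clamp to [base × 0.5, base × 2.5] is taken from the uncertainty table; the function name is hypothetical.

```python
def next_price(p_prev, base_price, seasonal_multiplier, noise=0.0):
    """One month of the mean-reverting price walk.

    `noise` is a caller-supplied relative shock (its distribution is
    not specified in the README, so it is left as an input here).
    """
    target = base_price * seasonal_multiplier
    drift = 0.3 * (target - p_prev) / target
    p_new = p_prev * (1.0 + drift + noise)
    # Hard clamp from the uncertainty table: [base * 0.5, base * 2.5].
    return min(base_price * 2.5, max(base_price * 0.5, p_new))


# Corn at 1000 with a target of 1200: drift = 0.3 * 200 / 1200 = 0.05.
corn_next = next_price(1000.0, base_price=1200.0, seasonal_multiplier=1.0)
```

With zero noise, a corn price of ₹1,000 against a ₹1,200 target moves to ₹1,050, closing part of the gap each month rather than jumping straight to the target.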
Seasonal Price Multipliers
| Period | Months | Price Multiplier |
|---|---|---|
| Pre-monsoon | Apr–May | 1.15× (scarce supply) |
| Monsoon | Jun–Sep | 1.00× |
| Winter | Jan–Mar | 0.95× |
| Post-harvest | Oct–Dec | 0.95× |
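The seasonal table can be expressed as a lookup keyed by calendar month. The sketch below assumes months are numbered 1 to 12 with January as 1; how the environment maps its 60 steps onto calendar months is not stated, so that mapping is left to the caller.

```python
# Price multiplier per calendar month (1 = January), from the seasonal table.
SEASONAL_MULTIPLIER = {
    1: 0.95, 2: 0.95, 3: 0.95,            # Winter
    4: 1.15, 5: 1.15,                     # Pre-monsoon
    6: 1.00, 7: 1.00, 8: 1.00, 9: 1.00,   # Monsoon
    10: 0.95, 11: 0.95, 12: 0.95,         # Post-harvest
}


def seasonal_target(base_price, month):
    """Seasonal target price: base_price * seasonal_multiplier."""
    if month not in SEASONAL_MULTIPLIER:
        raise ValueError(f"month must be in 1..12, got {month}")
    return base_price * SEASONAL_MULTIPLIER[month]


corn_april_target = seasonal_target(1200.0, 4)   # pre-monsoon premium
```

For Corn at a base price of ₹1,200, the April target works out to about ₹1,380, compared with ₹1,140 in the winter and post-harvest months.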
Reward Signal
Each step's reward is calculated as the Delta of Net Worth:
Reward = Δ(Net Worth) + Penalties
Where Net Worth is evaluated as:
Net Worth = Cash + Land Value + Stored Inventory Value + Growing Crop Expected Value − Debt
- Cash delta: Direct revenue minus costs.
- Asset value delta: Changes in the value of the land (driven by soil nitrogen: nitrogen × base_land_price) and growing crops. This implicitly rewards the agent for keeping soil healthy without needing a separate hardcoded "terminal soil bonus".
- Invalid action penalty: −50 for attempting unavailable actions.
- Terminal Profit: Computed at the end of the game as Final Net Worth − Initial Net Worth.
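A worked example makes the net-worth reward concrete. The sketch below is illustrative only: the `Snapshot` class, the function names, and the `BASE_LAND_PRICE` of ₹10,000 are assumptions, while the net worth formula, the land value rule (nitrogen × base_land_price), the fertilizer cost and effect, and the −50 penalty come from the text.

```python
from dataclasses import dataclass

BASE_LAND_PRICE = 10_000.0       # hypothetical value for illustration
INVALID_ACTION_PENALTY = -50.0


@dataclass
class Snapshot:
    """Hypothetical container for the terms of the net worth formula."""
    cash: float
    soil_nitrogen: float
    inventory_value: float = 0.0
    growing_crop_value: float = 0.0
    debt: float = 0.0


def net_worth(s):
    land_value = s.soil_nitrogen * BASE_LAND_PRICE
    return (s.cash + land_value + s.inventory_value
            + s.growing_crop_value - s.debt)


def step_reward(before, after, invalid_action=False):
    """Reward = delta(net worth) + penalties."""
    penalty = INVALID_ACTION_PENALTY if invalid_action else 0.0
    return net_worth(after) - net_worth(before) + penalty


# Fertilizing costs 400 cash and raises nitrogen by 0.15.
before = Snapshot(cash=10_000.0, soil_nitrogen=0.50)
after = Snapshot(cash=9_600.0, soil_nitrogen=0.65)
fertilize_reward = step_reward(before, after)
```

Under the assumed land price, fertilizing yields a positive reward (about +1,100) even though cash drops, which is how the net-worth formulation rewards soil care without a separate bonus term.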
What a Good Policy Learns
A well-trained agent should discover:
- Market Coordination — using the Public Ledger (Action 11) to coordinate planting schedules and avoid market gluts.
- Crop rotation — alternate nitrogen-depleting crops (corn) with legumes (chickpea) to sustain soil and land value.
- Seasonal timing — plant before the monsoon (June) to exploit free water, sell before the post-harvest dip.
- Storage speculation — store harvests during price troughs and sell during spikes.
- Financial prudence — avoid loans unless the expected return exceeds interest costs; repay before interest compounds.
Difficulty Tiers
The environment supports three difficulty presets:
| Parameter | Easy | Medium | Hard |
|---|---|---|---|
| Starting cash | ₹15,000 | ₹10,000 | ₹7,000 |
| Interest rate | 0% | 8% | 12% |
| Weather noise | Standard | Standard | Standard |
| Max steps | 60 | 60 | 60 |
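The presets above could be represented as a small immutable config. This is a sketch with hypothetical names (`DifficultyPreset`, `PRESETS`); the values are copied from the table, and the interest rate is stored as a fraction without assuming whether it is applied monthly or annually, since the README does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DifficultyPreset:
    """Hypothetical config record mirroring the difficulty table."""
    starting_cash: int           # in rupees
    interest_rate: float         # fraction; compounding period not specified
    weather_noise: str = "standard"
    max_steps: int = 60          # 5 years of monthly steps


PRESETS = {
    "easy": DifficultyPreset(starting_cash=15_000, interest_rate=0.00),
    "medium": DifficultyPreset(starting_cash=10_000, interest_rate=0.08),
    "hard": DifficultyPreset(starting_cash=7_000, interest_rate=0.12),
}


def get_preset(name):
    """Look up a preset by name, case-insensitively."""
    try:
        return PRESETS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown difficulty: {name!r}") from None
```

A call such as `get_preset("Hard")` returns the preset with ₹7,000 starting cash and a 12% interest rate.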
- Total size: 1.83 GB
- Files: 880
- Last updated: Apr 25