Margin Play — Trained Checkpoints

Website: marginplay.app · Code: github.com/aiacontext/marginplay

Pretrained MADDPG weights for the Margin Play multi-agent reinforcement learning simulation of oil exploration in the Brazilian Equatorial Margin.

This repository holds the final policy weights of the sweep_v6_6sc_10k training run (10,000 episodes per scenario, 6 scenarios, 6 agents).

Scenarios (6)

Key	Description
`referencia`	Reference baseline
`otimista`	Optimistic price/discovery assumptions
`pessimista`	Pessimistic price/discovery assumptions
`choque_brent`	Brent oil price shock
`ma_prospero`	Prosperous Maranhão variant
`sem_lei12858`	Without Law 12,858 (royalty redistribution removed)

Agents (6)

operadora, anp, ibama, gov_federal, gov_estadual, comunidade.

Files

For each scenario × agent pair there are two files:

{scenario}_actor_{agent}.npz — Actor (policy) MLP weights
{scenario}_critic_{agent}.npz — Critic (Q-function) MLP weights

Plus one episode log per scenario and a single sweep-level summary:

{scenario}_episodes.parquet — Step-level state/action/reward log
sweep_summary.parquet — Aggregated metrics across the sweep
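The naming patterns above can be expanded into a concrete file list, which is handy for checking that a download is complete. The following is a minimal sketch, not part of the Margin Play codebase: the helper names `expected_files` and `missing_files` are hypothetical, and it assumes all files sit flat in one directory.

```python
from itertools import product
from pathlib import Path

SCENARIOS = ["referencia", "otimista", "pessimista",
             "choque_brent", "ma_prospero", "sem_lei12858"]
AGENTS = ["operadora", "anp", "ibama",
          "gov_federal", "gov_estadual", "comunidade"]


def expected_files():
    """Return every file name implied by the naming patterns (hypothetical helper)."""
    files = []
    for scenario, agent in product(SCENARIOS, AGENTS):
        files.append(f"{scenario}_actor_{agent}.npz")
        files.append(f"{scenario}_critic_{agent}.npz")
    files.extend(f"{scenario}_episodes.parquet" for scenario in SCENARIOS)
    files.append("sweep_summary.parquet")
    return files


def missing_files(local_dir):
    """List expected files that are absent from local_dir (assumes a flat layout)."""
    root = Path(local_dir)
    return [name for name in expected_files() if not (root / name).exists()]
```

With 6 scenarios and 6 agents this yields 72 weight files, 6 episode logs, and 1 summary.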

Usage

pip install huggingface-hub
hf download aiacontext/marginplay --local-dir models/

Then load with the Margin Play codebase:

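import numpy as np
from agents.networks import Actor
from agents.definitions import SPECS, state_dim_global

# Read the stored parameter arrays for one scenario/agent pair
weights = np.load("models/referencia_actor_operadora.npz")
# Build an actor with the dimensions defined for the "operadora" agent
actor = Actor(state_dim=state_dim_global(), action_dim=SPECS["operadora"].action_dim)
# Pass the parameters as a list of (name, array) pairs
actor.load_weights(list(weights.items()))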
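Before loading a checkpoint into a network, it can help to look at what the archive contains. The sketch below uses only NumPy; `describe_npz` is a hypothetical helper, and the array names inside the real checkpoints are not documented here, so the output depends on the file.

```python
import os

import numpy as np


def describe_npz(path):
    """Map each array name in an .npz archive to its (shape, dtype)."""
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


# Example path taken from the loading snippet; skipped if not downloaded yet.
example = "models/referencia_actor_operadora.npz"
if os.path.exists(example):
    for name, (shape, dtype) in describe_npz(example).items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```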

With explore=False the policy is deterministic: the same scenario seed and intervention log reproduce the same trajectory.

Training

Algorithm: MADDPG (Multi-Agent DDPG) with target networks and OU exploration noise
Framework: MLX on Apple Silicon
Episodes per scenario: 10,000
See the training scripts in the code repository for the full configuration
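To illustrate how OU exploration noise relates to the explore flag, here is a minimal NumPy sketch of an Ornstein-Uhlenbeck process added to a policy output. It is not the Margin Play implementation: the class `OUNoise`, the function `select_action`, the parameter values (`theta`, `sigma`, `dt`), and the action range of [-1, 1] are all assumptions chosen for illustration.

```python
import numpy as np


class OUNoise:
    """Zero-mean Ornstein-Uhlenbeck process (illustrative parameters)."""

    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(dim)

    def reset(self):
        """Return the process to zero, e.g. at the start of an episode."""
        self.state = np.zeros_like(self.state)

    def sample(self):
        # Euler-Maruyama step of dx = -theta * x * dt + sigma * dW
        drift = -self.theta * self.state * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape)
        self.state = self.state + drift + diffusion
        return self.state.copy()


def select_action(policy, obs, noise, explore):
    """Add OU noise only when exploring; clip to an assumed [-1, 1] range."""
    action = np.asarray(policy(obs), dtype=float)
    if explore:
        action = action + noise.sample()
    return np.clip(action, -1.0, 1.0)
```

Because noise is only sampled when `explore` is true, evaluation with `explore=False` depends solely on the policy and its input, which is what makes repeated evaluation runs reproducible.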

Citation

If you use these weights or the Margin Play simulator in academic work, please cite:

@unpublished{leitaofilho2026marginplay,
  title  = {Margin Play: A Multi-Agent System for Public Policy Analysis
            in the Brazilian Equatorial Margin},
  author = {Leit{\~a}o Filho, Antonio de Sousa and
            Lima, Fabr{\'\i}cio Saul and
            Santos, Selby Mykael Lima dos and
            Sousa, Rejani Bandeira Vieira and
            Jesus, Lu{\'\i}s Jorge Mesquita de and
            Silva, Dennys Correia da and
            Barros Filho, Allan Kardec Duailibe},
  year   = {2026},
  note   = {Manuscript in preparation},
}

Plain text:

Leitão Filho, A. S., Lima, F. S., Santos, S. M. L. dos, Sousa, R. B. V., Jesus, L. J. M. de, Silva, D. C. da, & Barros Filho, A. K. D. (2026). Margin Play: A Multi-Agent System for Public Policy Analysis in the Brazilian Equatorial Margin. Manuscript in preparation.

Corresponding author: Antonio de Sousa Leitão Filho — antonio@aiacontext.com — ORCID 0009-0002-1705-3611.

License

MIT.
