---
license: mit
tags:
- reinforcement-learning
- multi-agent
- maddpg
- mlx
- simulation
library_name: mlx
---

# Margin Play: Trained Checkpoints

**Website:** [marginplay.app](https://www.marginplay.app/) · **Code:** [github.com/aiacontext/marginplay](https://github.com/aiacontext/marginplay)

Pretrained MADDPG weights for [Margin Play](https://github.com/aiacontext/marginplay), a multi-agent reinforcement learning simulation of oil exploration in the Brazilian Equatorial Margin.

## Contents

This repository holds the **final policy weights** of the `sweep_v6_6sc_10k` training run (10,000 episodes per scenario, 6 scenarios, 6 agents).

### Scenarios (6)

| Key | Description |
|---|---|
| `referencia` | Reference baseline |
| `otimista` | Optimistic price/discovery assumptions |
| `pessimista` | Pessimistic price/discovery assumptions |
| `choque_brent` | Brent oil price shock |
| `ma_prospero` | Prosperous Maranhão variant |
| `sem_lei12858` | Without Law 12,858 (royalty redistribution removed) |

### Agents (6)

`operadora`, `anp`, `ibama`, `gov_federal`, `gov_estadual`, `comunidade`.

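
The scenario and agent keys above form a 6 × 6 grid, and each cell has its own actor and critic checkpoint. As a minimal sketch, the snippet below enumerates the expected checkpoint file names using the naming pattern given under Files. The helper `checkpoint_names` and the constants `SCENARIOS` and `AGENTS` are hypothetical names introduced here for illustration; they are not part of the Margin Play codebase.

```python
from itertools import product

# Keys copied from the Scenarios and Agents lists in this README.
SCENARIOS = [
    "referencia", "otimista", "pessimista",
    "choque_brent", "ma_prospero", "sem_lei12858",
]
AGENTS = [
    "operadora", "anp", "ibama",
    "gov_federal", "gov_estadual", "comunidade",
]


def checkpoint_names(kind: str = "actor") -> list[str]:
    """Return the expected .npz file names for every scenario/agent pair.

    Hypothetical helper: it only builds strings following the
    `{scenario}_{kind}_{agent}.npz` pattern and does not touch the disk.
    """
    if kind not in ("actor", "critic"):
        raise ValueError(f"kind must be 'actor' or 'critic', got {kind!r}")
    return [f"{s}_{kind}_{a}.npz" for s, a in product(SCENARIOS, AGENTS)]


if __name__ == "__main__":
    names = checkpoint_names("actor")
    print(len(names))  # 36
    print(names[0])    # referencia_actor_operadora.npz
```

A list like this can be compared against the contents of a local download directory to check that no checkpoint is missing.
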
### Files

For each scenario × agent pair there are two files:

- `{scenario}_actor_{agent}.npz`: actor (policy) MLP weights
- `{scenario}_critic_{agent}.npz`: critic (Q-function) MLP weights

Plus the training logs:

- `{scenario}_episodes.parquet`: step-level state/action/reward log, one file per scenario
- `sweep_summary.parquet`: aggregated metrics across the sweep

## Usage

```bash
# The `hf` command ships with recent releases of huggingface-hub
pip install -U huggingface-hub
hf download aiacontext/marginplay --local-dir models/
```

Then load with the [Margin Play](https://github.com/aiacontext/marginplay) codebase:

```python
import numpy as np

# `agents` is a package in the Margin Play codebase and must be importable
from agents.networks import Actor
from agents.definitions import SPECS, state_dim_global

# Actor checkpoint for the `operadora` agent in the `referencia` scenario
weights = np.load("models/referencia_actor_operadora.npz")

# Build an actor with matching dimensions, then load the stored parameters
actor = Actor(state_dim=state_dim_global(), action_dim=SPECS["operadora"].action_dim)
actor.load_weights(list(weights.items()))
```

With `explore=False` the policy is deterministic: the same scenario seed and intervention log produce reproducible trajectories.

## Training

- Algorithm: MADDPG (Multi-Agent DDPG) with target networks and Ornstein-Uhlenbeck (OU) exploration noise
- Framework: [MLX](https://github.com/ml-explore/mlx) on Apple Silicon
- Episodes per scenario: 10,000
- See the [training scripts](https://github.com/aiacontext/marginplay/tree/main/scripts) for the full configuration

## Citation

If you use these weights or the Margin Play simulator in academic work, please cite:

```bibtex
@unpublished{leitaofilho2026marginplay,
  title  = {Margin Play: A Multi-Agent System for Public Policy Analysis in the Brazilian Equatorial Margin},
  author = {Leit{\~a}o Filho, Antonio de Sousa and Lima, Fabr{\'\i}cio Saul and Santos, Selby Mykael Lima dos and Sousa, Rejani Bandeira Vieira and Jesus, Lu{\'\i}s Jorge Mesquita de and Silva, Dennys Correia da and Barros Filho, Allan Kardec Duailibe},
  year   = {2026},
  note   = {Manuscript in preparation},
}
```

Plain text:

> Leitão Filho, A. S., Lima, F. S., Santos, S. M. L. dos, Sousa, R. B. V., Jesus, L. J. M. de, Silva, D. C. da, & Barros Filho, A. K. D. (2026).
*Margin Play: A Multi-Agent System for Public Policy Analysis in the Brazilian Equatorial Margin*. Manuscript in preparation.

**Corresponding author:** Antonio de Sousa Leitão Filho ([antonio@aiacontext.com](mailto:antonio@aiacontext.com)), ORCID [0009-0002-1705-3611](https://orcid.org/0009-0002-1705-3611).

## License

MIT.