---
license: mit
tags:
- reinforcement-learning
- multi-agent
- maddpg
- mlx
- simulation
library_name: mlx
---
# Margin Play — Trained Checkpoints
**Website:** [marginplay.app](https://www.marginplay.app/) · **Code:** [github.com/aiacontext/marginplay](https://github.com/aiacontext/marginplay)
Pretrained MADDPG weights for the [Margin Play](https://github.com/aiacontext/marginplay) multi-agent reinforcement learning simulation of oil exploration in the Brazilian Equatorial Margin.
## Contents
This repository holds the **final actor (policy) and critic weights** of the `sweep_v6_6sc_10k` training run (6 scenarios, 6 agents, 10,000 episodes per scenario).
### Scenarios (6)
| Key | Description |
|---|---|
| `referencia` | Reference baseline |
| `otimista` | Optimistic price/discovery assumptions |
| `pessimista` | Pessimistic price/discovery assumptions |
| `choque_brent` | Brent oil price shock |
| `ma_prospero` | Prosperous Maranhão variant |
| `sem_lei12858` | Without Law 12,858 (royalty redistribution removed) |
### Agents (6)
`operadora`, `anp`, `ibama`, `gov_federal`, `gov_estadual`, `comunidade`.
### Files
For each scenario × agent there is:
- `{scenario}_actor_{agent}.npz` — Actor (policy) MLP weights
- `{scenario}_critic_{agent}.npz` — Critic (Q-function) MLP weights
Plus per-scenario episode logs and one sweep-wide summary:
- `{scenario}_episodes.parquet` — Step-level state/action/reward log
- `sweep_summary.parquet` — Aggregated metrics across the sweep
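The naming scheme above can be expanded programmatically, for instance to check that a download is complete. This is a minimal sketch that uses only the scenario keys, agent names, and file patterns listed in this README; the helper names `expected_files` and `missing_files` are illustrative and not part of the Margin Play codebase.

```python
from itertools import product
from pathlib import Path

SCENARIOS = ["referencia", "otimista", "pessimista",
             "choque_brent", "ma_prospero", "sem_lei12858"]
AGENTS = ["operadora", "anp", "ibama",
          "gov_federal", "gov_estadual", "comunidade"]


def expected_files():
    """Return every file name implied by the naming scheme in this README."""
    files = []
    for scenario, agent in product(SCENARIOS, AGENTS):
        files.append(f"{scenario}_actor_{agent}.npz")
        files.append(f"{scenario}_critic_{agent}.npz")
    files.extend(f"{scenario}_episodes.parquet" for scenario in SCENARIOS)
    files.append("sweep_summary.parquet")
    return files


def missing_files(local_dir):
    """List expected files that are not present in local_dir."""
    root = Path(local_dir)
    return [name for name in expected_files() if not (root / name).exists()]
```

After downloading, `missing_files("models/")` should return an empty list if the repository contents match the scheme described here.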
## Usage
```bash
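# The `hf` CLI ships with recent huggingface_hub releases;
# older releases provide `huggingface-cli download` instead.
pip install -U huggingface_hub
hf download aiacontext/marginplay --local-dir models/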
```
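If only one scenario is needed, the same download can be driven from Python with `huggingface_hub.snapshot_download` and its `allow_patterns` filter. The sketch below assumes the files follow the naming scheme listed under Files; `scenario_patterns` and `download_scenario` are illustrative helpers, not part of the Margin Play codebase.

```python
def scenario_patterns(scenario):
    """Glob patterns for one scenario's weights and episode log (hypothetical helper)."""
    return [
        f"{scenario}_actor_*.npz",
        f"{scenario}_critic_*.npz",
        f"{scenario}_episodes.parquet",
    ]


def download_scenario(scenario, local_dir="models/"):
    """Download only the files of a single scenario and return the local folder path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="aiacontext/marginplay",
        local_dir=local_dir,
        allow_patterns=scenario_patterns(scenario),
    )
```

Calling `download_scenario("referencia")` would fetch the twelve weight files and the episode log for the reference scenario, leaving the other scenarios on the Hub.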
Then load with the [Margin Play](https://github.com/aiacontext/marginplay) codebase:
```python
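import numpy as np
# Both modules come from the Margin Play codebase, so the repository must be importable.
from agents.networks import Actor
from agents.definitions import SPECS, state_dim_global

# Actor parameters for the `operadora` agent in the `referencia` scenario.
weights = np.load("models/referencia_actor_operadora.npz")
# Rebuild the network with the dimensions defined in the codebase, then restore its parameters.
actor = Actor(state_dim=state_dim_global(), action_dim=SPECS["operadora"].action_dim)
actor.load_weights(list(weights.items()))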
```
With `explore=False` the policy is deterministic: the same scenario seed and intervention log reproduce the same trajectory.
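Before restoring a checkpoint, it can help to confirm that its arrays match the network you construct. The sketch below lists the array names and shapes stored in an `.npz` file using NumPy only; `summarize_npz` is an illustrative helper, and the names it prints depend on how the checkpoint was saved.

```python
import numpy as np


def summarize_npz(path):
    """Map each array name in an .npz archive to its shape."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


if __name__ == "__main__":
    # Path assumes the download command above was run with --local-dir models/.
    summary = summarize_npz("models/referencia_actor_operadora.npz")
    for name, shape in summary.items():
        print(name, shape)
```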
## Training
- Algorithm: MADDPG (Multi-Agent Deep Deterministic Policy Gradient) with target networks and Ornstein-Uhlenbeck (OU) exploration noise
- Framework: [MLX](https://github.com/ml-explore/mlx) on Apple Silicon
- Episodes per scenario: 10,000
- See [training scripts](https://github.com/aiacontext/marginplay/tree/main/scripts) for full configuration
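For readers unfamiliar with OU exploration noise, the following NumPy sketch shows the general idea: a temporally correlated noise process is added to the deterministic policy output during training and switched off for evaluation. It is not taken from the Margin Play training scripts; the class and function names, and the values of `theta`, `sigma`, and `dt`, are assumptions chosen for illustration.

```python
import numpy as np


class OUNoise:
    """Zero-mean Ornstein-Uhlenbeck process (illustrative parameters)."""

    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.x = np.zeros(dim)

    def reset(self):
        """Restart the process at zero, e.g. at the start of an episode."""
        self.x = np.zeros_like(self.x)

    def sample(self):
        # Euler-Maruyama step of dx = -theta * x * dt + sigma * dW.
        gauss = self.rng.standard_normal(self.x.shape)
        dx = -self.theta * self.x * self.dt + self.sigma * np.sqrt(self.dt) * gauss
        self.x = self.x + dx
        return self.x.copy()


def select_action(policy, state, noise, explore):
    """Return the policy output, plus OU noise only when explore is True."""
    action = np.asarray(policy(state), dtype=float)
    if explore:
        action = action + noise.sample()
    return action
```

With `explore=False` the function returns the policy output unchanged, which is why evaluation runs are repeatable; with `explore=True`, successive samples are correlated in time because each step starts from the previous noise value.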
## Citation
If you use these weights or the Margin Play simulator in academic work, please cite:
```bibtex
@unpublished{leitaofilho2026marginplay,
title = {Margin Play: A Multi-Agent System for Public Policy Analysis
in the Brazilian Equatorial Margin},
author = {Leit{\~a}o Filho, Antonio de Sousa and
Lima, Fabr{\'\i}cio Saul and
Santos, Selby Mykael Lima dos and
Sousa, Rejani Bandeira Vieira and
Jesus, Lu{\'\i}s Jorge Mesquita de and
Silva, Dennys Correia da and
Barros Filho, Allan Kardec Duailibe},
year = {2026},
note = {Manuscript in preparation},
}
```
Plain text:
> Leitão Filho, A. S., Lima, F. S., Santos, S. M. L. dos, Sousa, R. B. V., Jesus, L. J. M. de, Silva, D. C. da, & Barros Filho, A. K. D. (2026). *Margin Play: A Multi-Agent System for Public Policy Analysis in the Brazilian Equatorial Margin*. Manuscript in preparation.
**Corresponding author:** Antonio de Sousa Leitão Filho — [antonio@aiacontext.com](mailto:antonio@aiacontext.com) — ORCID [0009-0002-1705-3611](https://orcid.org/0009-0002-1705-3611).
## License
MIT.