---
license: mit
tags:
  - reinforcement-learning
  - multi-agent
  - maddpg
  - mlx
  - simulation
library_name: mlx
---

# Margin Play — Trained Checkpoints

**Website:** [marginplay.app](https://www.marginplay.app/) · **Code:** [github.com/aiacontext/marginplay](https://github.com/aiacontext/marginplay)

Pretrained MADDPG weights for the [Margin Play](https://github.com/aiacontext/marginplay) multi-agent reinforcement learning simulation of oil exploration in the Brazilian Equatorial Margin.

## Contents

This repository holds the **final policy weights** of the `sweep_v6_6sc_10k` training run (10,000 episodes per scenario, 6 scenarios, 6 agents).

### Scenarios (6)

| Key | Description |
|---|---|
| `referencia` | Reference baseline |
| `otimista` | Optimistic price/discovery assumptions |
| `pessimista` | Pessimistic price/discovery assumptions |
| `choque_brent` | Brent oil price shock |
| `ma_prospero` | Prosperous Maranhão variant |
| `sem_lei12858` | Without Law 12,858 (royalty redistribution removed) |

### Agents (6)

`operadora`, `anp`, `ibama`, `gov_federal`, `gov_estadual`, `comunidade`.

### Files

For each scenario × agent there is:

- `{scenario}_actor_{agent}.npz` — Actor (policy) MLP weights
- `{scenario}_critic_{agent}.npz` — Critic (Q-function) MLP weights

Plus the per-scenario episode logs and one sweep-level summary:

- `{scenario}_episodes.parquet` — Step-level state/action/reward log
- `sweep_summary.parquet` — Aggregated metrics across the sweep
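
As a sanity check after downloading, the naming scheme above can be expanded into a full manifest. This is a minimal sketch that assumes every scenario × agent pair has both an actor and a critic file and that all files sit in one flat directory. `expected_files` and `missing_files` are hypothetical helpers, not part of the Margin Play codebase.

```python
from itertools import product
from pathlib import Path

SCENARIOS = ["referencia", "otimista", "pessimista",
             "choque_brent", "ma_prospero", "sem_lei12858"]
AGENTS = ["operadora", "anp", "ibama",
          "gov_federal", "gov_estadual", "comunidade"]


def expected_files():
    """Hypothetical helper: expand the naming scheme into file names."""
    names = [
        f"{scenario}_{kind}_{agent}.npz"
        for scenario, agent in product(SCENARIOS, AGENTS)
        for kind in ("actor", "critic")
    ]
    names += [f"{scenario}_episodes.parquet" for scenario in SCENARIOS]
    names.append("sweep_summary.parquet")
    return names


def missing_files(root="models"):
    """Return the expected file names that are absent under `root`."""
    return [name for name in expected_files() if not (Path(root) / name).exists()]
```

Under these assumptions the manifest has 6 × 6 × 2 weight files, 6 episode logs, and 1 summary. `missing_files("models")` returns an empty list when the download is complete.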

## Usage

```bash
pip install -U huggingface_hub  # the `hf` CLI requires a recent release
hf download aiacontext/marginplay --local-dir models/
```
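
The same download can be done from Python with `snapshot_download` from `huggingface_hub`. The sketch below is one possible wrapper: `download_checkpoints` and `scenario_patterns` are hypothetical names, and filtering by a `{scenario}_*` glob assumes the flat file layout described under Files.

```python
def scenario_patterns(scenario=None):
    """Glob patterns for one scenario, or None to fetch the whole repository."""
    return None if scenario is None else [f"{scenario}_*"]


def download_checkpoints(local_dir="models/", scenario=None):
    """Download the checkpoints (optionally one scenario) into `local_dir`."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="aiacontext/marginplay",
        local_dir=local_dir,
        allow_patterns=scenario_patterns(scenario),
    )
```

For example, `download_checkpoints(scenario="referencia")` would fetch only the files whose names start with `referencia_`, which skips `sweep_summary.parquet`.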

Then load with the [Margin Play](https://github.com/aiacontext/marginplay) codebase:

```python
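import numpy as np
from agents.networks import Actor
from agents.definitions import SPECS, state_dim_global

# Actor weights for the `operadora` agent in the `referencia` scenario
weights = np.load("models/referencia_actor_operadora.npz")

# Input and output sizes come from the codebase's agent definitions
actor = Actor(state_dim=state_dim_global(), action_dim=SPECS["operadora"].action_dim)

# Pass the stored arrays as (name, array) pairs
actor.load_weights(list(weights.items()))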
```

With `explore=False`, no exploration noise is added and the policy is deterministic: the same scenario seed and intervention log reproduce the same trajectories.
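
To check what a checkpoint contains before loading it into a network, the `.npz` archive can be inspected with NumPy alone. This is a minimal sketch; `describe_checkpoint` is a hypothetical helper, and the array names and shapes it reports depend on how the Margin Play `Actor` and `Critic` classes name their parameters, which is not documented here.

```python
import numpy as np


def describe_checkpoint(path):
    """Return {array_name: shape} for every array stored in an .npz file."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Example call (requires the downloaded file):
# for name, shape in describe_checkpoint("models/referencia_actor_operadora.npz").items():
#     print(name, shape)
```

Comparing these shapes with `state_dim_global()` and the agent's `action_dim` is a quick way to confirm that a checkpoint matches the installed version of the codebase.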

## Training

- Algorithm: MADDPG (Multi-Agent Deep Deterministic Policy Gradient) with target networks and Ornstein-Uhlenbeck (OU) exploration noise
- Framework: [MLX](https://github.com/ml-explore/mlx) on Apple Silicon
- Episodes per scenario: 10,000
- See [training scripts](https://github.com/aiacontext/marginplay/tree/main/scripts) for full configuration
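
For readers unfamiliar with the two ingredients named above, the following NumPy sketch shows a generic OU noise process and a soft (Polyak) target-network update. It is illustrative only and is not taken from the Margin Play training scripts: the class and function names are hypothetical, and `theta`, `sigma`, `dt`, and `tau` are placeholder values, not the settings used for these checkpoints.

```python
import numpy as np


class OUNoise:
    """Zero-mean Ornstein-Uhlenbeck process, a common choice for temporally correlated exploration."""

    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros(dim)

    def reset(self):
        """Return the process to zero, typically at the start of an episode."""
        self.state = np.zeros_like(self.state)

    def sample(self):
        # Euler-Maruyama step of dx = -theta * x * dt + sigma * dW
        drift = -self.theta * self.state * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.state.shape)
        self.state = self.state + drift + diffusion
        return self.state


def soft_update(target, online, tau=0.01):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return {name: tau * online[name] + (1.0 - tau) * target[name] for name in target}
```

During training the noise sample would be added to the actor's output before clipping to the action bounds. At evaluation time (`explore=False`) it is omitted.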

## Citation

If you use these weights or the Margin Play simulator in academic work, please cite:

```bibtex
@unpublished{leitaofilho2026marginplay,
  title  = {Margin Play: A Multi-Agent System for Public Policy Analysis
            in the Brazilian Equatorial Margin},
  author = {Leit{\~a}o Filho, Antonio de Sousa and
            Lima, Fabr{\'\i}cio Saul and
            Santos, Selby Mykael Lima dos and
            Sousa, Rejani Bandeira Vieira and
            Jesus, Lu{\'\i}s Jorge Mesquita de and
            Silva, Dennys Correia da and
            Barros Filho, Allan Kardec Duailibe},
  year   = {2026},
  note   = {Manuscript in preparation},
}
```

Plain text:

> Leitão Filho, A. S., Lima, F. S., Santos, S. M. L. dos, Sousa, R. B. V., Jesus, L. J. M. de, Silva, D. C. da, & Barros Filho, A. K. D. (2026). *Margin Play: A Multi-Agent System for Public Policy Analysis in the Brazilian Equatorial Margin*. Manuscript in preparation.

**Corresponding author:** Antonio de Sousa Leitão Filho — [antonio@aiacontext.com](mailto:antonio@aiacontext.com) — ORCID [0009-0002-1705-3611](https://orcid.org/0009-0002-1705-3611).

## License

MIT.