Muqeeth committed on
Commit
7d09048
·
verified ·
1 Parent(s): 5f5c89f

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .hydra/hydra.yaml +154 -0
  2. .hydra/overrides.yaml +1 -0
  3. seed_0/Qwen/Qwen2.5-7B-Instruct/adapters/README.md +207 -0
  4. seed_0/Qwen/Qwen2.5-7B-Instruct/adapters/agent_adapter/adapter_config.json +42 -0
  5. seed_0/Qwen/Qwen2.5-7B-Instruct/adapters/critic_adapter/adapter_config.json +42 -0
  6. seed_0/Qwen/Qwen2.5-7B-Instruct/adapters/fixed_ad_align_adapter/adapter_config.json +42 -0
  7. src_code_for_reproducibility/__init__.py +0 -0
  8. src_code_for_reproducibility/docs/source/conf.py +48 -0
  9. src_code_for_reproducibility/docs/source/index.rst +22 -0
  10. src_code_for_reproducibility/docs/source/installation.rst +10 -0
  11. src_code_for_reproducibility/docs/source/marl_standard.rst +141 -0
  12. src_code_for_reproducibility/docs/source/modules.rst +7 -0
  13. src_code_for_reproducibility/docs/source/src.environments.dond.dond_agent.rst +7 -0
  14. src_code_for_reproducibility/docs/source/src.environments.dond.dond_game.rst +7 -0
  15. src_code_for_reproducibility/docs/source/src.environments.dond.dond_player.rst +7 -0
  16. src_code_for_reproducibility/docs/source/src.environments.dond.dond_statistics_funcs.rst +7 -0
  17. src_code_for_reproducibility/docs/source/src.environments.dond.dond_training_data_funcs.rst +7 -0
  18. src_code_for_reproducibility/docs/source/src.environments.env_imports.rst +7 -0
  19. src_code_for_reproducibility/docs/source/src.environments.ipd.ipd_game.rst +7 -0
  20. src_code_for_reproducibility/docs/source/src.environments.ipd.ipd_log_funcs.rst +7 -0
  21. src_code_for_reproducibility/docs/source/src.environments.ipd.ipd_statistics_funcs.rst +7 -0
  22. src_code_for_reproducibility/docs/source/src.environments.ipd.ipd_training_data_funcs.rst +7 -0
  23. src_code_for_reproducibility/docs/source/src.environments.rst +25 -0
  24. src_code_for_reproducibility/docs/source/src.experiments.arithmetic_test.rst +7 -0
  25. src_code_for_reproducibility/docs/source/src.rst +28 -0
  26. src_code_for_reproducibility/docs/source/src.utils.rst +24 -0
  27. src_code_for_reproducibility/markov_games/__pycache__/agent.cpython-312.pyc +0 -0
  28. src_code_for_reproducibility/markov_games/__pycache__/run_markov_games.cpython-312.pyc +0 -0
  29. src_code_for_reproducibility/markov_games/__pycache__/simulation.cpython-312.pyc +0 -0
  30. src_code_for_reproducibility/markov_games/diplomacy/diplomacy_env.py +230 -0
  31. src_code_for_reproducibility/markov_games/diplomacy/diplomacy_logging.py +360 -0
  32. src_code_for_reproducibility/markov_games/diplomacy/diplomacy_logging_for_training.py +0 -0
  33. src_code_for_reproducibility/markov_games/ipd/Ipd_hard_coded_agents.py +72 -0
  34. src_code_for_reproducibility/markov_games/ipd/__init__.py +7 -0
  35. src_code_for_reproducibility/markov_games/ipd/__pycache__/Ipd_hard_coded_agents.cpython-312.pyc +0 -0
  36. src_code_for_reproducibility/markov_games/ipd/__pycache__/ipd_agent.cpython-312.pyc +0 -0
  37. src_code_for_reproducibility/markov_games/ipd/__pycache__/ipd_simulation.cpython-312.pyc +0 -0
  38. src_code_for_reproducibility/markov_games/ipd/__pycache__/ipd_statistics.cpython-312.pyc +0 -0
  39. src_code_for_reproducibility/markov_games/ipd/ipd_agent.py +115 -0
  40. src_code_for_reproducibility/markov_games/ipd/ipd_simulation.py +162 -0
  41. src_code_for_reproducibility/markov_games/ipd/ipd_statistics.py +18 -0
  42. src_code_for_reproducibility/markov_games/negotiation/README.md +40 -0
  43. src_code_for_reproducibility/markov_games/negotiation/__pycache__/dond_agent.cpython-312.pyc +0 -0
  44. src_code_for_reproducibility/markov_games/negotiation/__pycache__/nego_agent.cpython-312.pyc +0 -0
  45. src_code_for_reproducibility/markov_games/negotiation/__pycache__/nego_hard_coded_policies.cpython-312.pyc +0 -0
  46. src_code_for_reproducibility/markov_games/negotiation/__pycache__/nego_simulation.cpython-312.pyc +0 -0
  47. src_code_for_reproducibility/markov_games/negotiation/__pycache__/no_press_nego_agent.cpython-312.pyc +0 -0
  48. src_code_for_reproducibility/markov_games/negotiation/__pycache__/no_press_nego_simulation.cpython-312.pyc +0 -0
  49. src_code_for_reproducibility/markov_games/negotiation/__pycache__/tas_agent.cpython-312.pyc +0 -0
  50. src_code_for_reproducibility/markov_games/negotiation/__pycache__/tas_rps_agent.cpython-312.pyc +0 -0
.hydra/hydra.yaml ADDED
@@ -0,0 +1,154 @@
+ hydra:
+   run:
+     dir: ${oc.env:SCRATCH}/llm_negotiation/${now:%Y_%m}/${experiment.name}
+   sweep:
+     dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
+     subdir: ${hydra.job.num}
+   launcher:
+     _target_: hydra._internal.core_plugins.basic_launcher.BasicLauncher
+   sweeper:
+     _target_: hydra._internal.core_plugins.basic_sweeper.BasicSweeper
+     max_batch_size: null
+     params: null
+   help:
+     app_name: ${hydra.job.name}
+     header: '${hydra.help.app_name} is powered by Hydra.
+
+       '
+     footer: 'Powered by Hydra (https://hydra.cc)
+
+       Use --hydra-help to view Hydra specific help
+
+       '
+     template: '${hydra.help.header}
+
+       == Configuration groups ==
+
+       Compose your configuration from those groups (group=option)
+
+
+       $APP_CONFIG_GROUPS
+
+
+       == Config ==
+
+       Override anything in the config (foo.bar=value)
+
+
+       $CONFIG
+
+
+       ${hydra.help.footer}
+
+       '
+   hydra_help:
+     template: 'Hydra (${hydra.runtime.version})
+
+       See https://hydra.cc for more info.
+
+
+       == Flags ==
+
+       $FLAGS_HELP
+
+
+       == Configuration groups ==
+
+       Compose your configuration from those groups (For example, append hydra/job_logging=disabled
+       to command line)
+
+
+       $HYDRA_CONFIG_GROUPS
+
+
+       Use ''--cfg hydra'' to Show the Hydra config.
+
+       '
+     hydra_help: ???
+   hydra_logging:
+     version: 1
+     formatters:
+       simple:
+         format: '[%(asctime)s][HYDRA] %(message)s'
+     handlers:
+       console:
+         class: logging.StreamHandler
+         formatter: simple
+         stream: ext://sys.stdout
+     root:
+       level: INFO
+       handlers:
+       - console
+     loggers:
+       logging_example:
+         level: DEBUG
+     disable_existing_loggers: false
+   job_logging:
+     version: 1
+     formatters:
+       simple:
+         format: '[%(asctime)s][%(name)s][%(levelname)s] - %(message)s'
+     handlers:
+       console:
+         class: logging.StreamHandler
+         formatter: simple
+         stream: ext://sys.stdout
+       file:
+         class: logging.FileHandler
+         formatter: simple
+         filename: ${hydra.runtime.output_dir}/${hydra.job.name}.log
+     root:
+       level: INFO
+       handlers:
+       - console
+       - file
+     disable_existing_loggers: false
+   env: {}
+   mode: RUN
+   searchpath: []
+   callbacks: {}
+   output_subdir: .hydra
+   overrides:
+     hydra:
+     - hydra.mode=RUN
+     task: []
+   job:
+     name: run
+     chdir: false
+     override_dirname: ''
+     id: ???
+     num: ???
+     config_name: naive_vs_fixed_ad_align_seed0.yaml
+     env_set: {}
+     env_copy: []
+     config:
+       override_dirname:
+         kv_sep: '='
+         item_sep: ','
+         exclude_keys: []
+   runtime:
+     version: 1.3.2
+     version_base: '1.1'
+     cwd: /scratch/muqeeth/llm_negotiation
+     config_sources:
+     - path: hydra.conf
+       schema: pkg
+       provider: hydra
+     - path: /scratch/muqeeth/llm_negotiation/configs
+       schema: file
+       provider: main
+     - path: ''
+       schema: structured
+       provider: schema
+     output_dir: /scratch/muqeeth/llm_negotiation/2025_11/naive_vs_fixed_ad_align_seed0
+     choices:
+       hydra/env: default
+       hydra/callbacks: null
+       hydra/job_logging: default
+       hydra/hydra_logging: default
+       hydra/hydra_help: default
+       hydra/help: default
+       hydra/sweeper: basic
+       hydra/launcher: basic
+       hydra/output: default
+   verbose: false
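Reviewer note: the `job_logging` node above follows the schema of Python's standard `logging.config.dictConfig` (`version`, `formatters`, `handlers`, `root`). The sketch below applies only the console part of that dictionary with the standard library, to show what a log line produced by this format string looks like. The file handler is omitted, and the helper `format_with_job_logging` and the logger name `demo` are hypothetical names introduced for illustration; this is not how Hydra itself wires the configuration.

```python
import logging
import logging.config

# Console-only subset of the job_logging node. The file handler is left out so
# the sketch does not depend on ${hydra.runtime.output_dir} being resolved.
JOB_LOGGING = {
    "version": 1,
    "formatters": {
        "simple": {"format": "[%(asctime)s][%(name)s][%(levelname)s] - %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "stream": "ext://sys.stdout",
        }
    },
    "root": {"level": "INFO", "handlers": ["console"]},
    "disable_existing_loggers": False,
}


def format_with_job_logging(name: str, message: str) -> str:
    """Apply the config, then format one INFO record with the root handler's formatter."""
    logging.config.dictConfig(JOB_LOGGING)
    handler = logging.getLogger().handlers[0]
    record = logging.LogRecord(name, logging.INFO, "demo.py", 0, message, None, None)
    return handler.format(record)


if __name__ == "__main__":
    # Prints a line shaped like: [<timestamp>][demo][INFO] - match finished
    print(format_with_job_logging("demo", "match finished"))
```

The timestamp portion depends on the wall clock, so only the `[name][level] - message` suffix is stable across runs.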
.hydra/overrides.yaml ADDED
@@ -0,0 +1 @@
+ []
seed_0/Qwen/Qwen2.5-7B-Instruct/adapters/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen2.5-7B-Instruct
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.17.1
seed_0/Qwen/Qwen2.5-7B-Instruct/adapters/agent_adapter/adapter_config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 64,
+   "lora_bias": false,
+   "lora_dropout": 0.0,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "down_proj",
+     "k_proj",
+     "up_proj",
+     "v_proj",
+     "gate_proj",
+     "q_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
seed_0/Qwen/Qwen2.5-7B-Instruct/adapters/critic_adapter/adapter_config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 64,
+   "lora_bias": false,
+   "lora_dropout": 0.0,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "down_proj",
+     "k_proj",
+     "up_proj",
+     "v_proj",
+     "gate_proj",
+     "q_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
seed_0/Qwen/Qwen2.5-7B-Instruct/adapters/fixed_ad_align_adapter/adapter_config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 64,
+   "lora_bias": false,
+   "lora_dropout": 0.0,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "down_proj",
+     "k_proj",
+     "up_proj",
+     "v_proj",
+     "gate_proj",
+     "q_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
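Reviewer note: the three adapter configs (`agent_adapter`, `critic_adapter`, `fixed_ad_align_adapter`) carry the same LoRA hyperparameters. As a quick sanity check on what `lora_alpha: 64` and `r: 32` imply, the sketch below computes the factor by which a LoRA update is scaled: `lora_alpha / r` in the standard formulation, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA (`use_rslora`) is enabled. The helper `lora_scaling` is a hypothetical name for illustration, and only the fields it needs are copied from the config.

```python
import json
import math

# Subset of the adapter_config.json fields shown above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 64,
  "r": 32,
  "use_rslora": false,
  "target_modules": ["down_proj", "k_proj", "up_proj", "v_proj",
                     "gate_proj", "q_proj", "o_proj"]
}
""")


def lora_scaling(config: dict) -> float:
    """Return the multiplier applied to the low-rank update B @ A."""
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return config["lora_alpha"] / math.sqrt(config["r"])
    return config["lora_alpha"] / config["r"]


if __name__ == "__main__":
    print(lora_scaling(ADAPTER_CONFIG))  # 64 / 32
    print(len(ADAPTER_CONFIG["target_modules"]), "projection types are adapted")
```

With these values the update is scaled by 2.0, and all seven attention and MLP projection types are targeted.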
src_code_for_reproducibility/__init__.py ADDED
File without changes
src_code_for_reproducibility/docs/source/conf.py ADDED
@@ -0,0 +1,48 @@
+ # Configuration file for the Sphinx documentation builder.
+ import os
+ import sys
+ sys.path.insert(0, os.path.abspath('../..'))
+
+ # -- Project information -----------------------------------------------------
+ project = 'llm_negotiation'
+ copyright = '2023, Your Name'
+ author = 'Your Name'
+
+ # -- General configuration ---------------------------------------------------
+ extensions = [
+     'sphinx.ext.autodoc',
+     'sphinx.ext.viewcode',
+     'sphinx.ext.napoleon',
+     'sphinx.ext.autosummary',
+     'sphinx.ext.intersphinx',
+     'sphinx.ext.mathjax',
+     'sphinxcontrib.mermaid',
+     'sphinx_rtd_theme',
+ ]
+
+ templates_path = ['_templates']
+ exclude_patterns = []
+
+ # -- Options for HTML output -------------------------------------------------
+ html_theme = 'sphinx_rtd_theme'
+ html_static_path = ['_static']
+
+ # -- Napoleon settings -------------------------------------------------------
+ napoleon_google_docstring = True
+ napoleon_numpy_docstring = False
+ napoleon_include_init_with_doc = True
+ napoleon_include_private_with_doc = False
+ napoleon_include_special_with_doc = True
+ napoleon_use_admonition_for_examples = False
+ napoleon_use_admonition_for_notes = False
+ napoleon_use_admonition_for_references = False
+ napoleon_use_ivar = False
+ napoleon_use_param = True
+ napoleon_use_rtype = True
+ napoleon_preprocess_types = False
+ napoleon_type_aliases = None
+ napoleon_attr_annotations = True
+
+ # -- Path setup --------------------------------------------------------------
+ # Make sure the project's modules can be found by Sphinx
+ sys.path.insert(0, os.path.abspath('../../src'))
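Reviewer note: with `napoleon_google_docstring = True` and `napoleon_numpy_docstring = False`, the docs build expects Google-style docstrings (`Args:`, `Returns:`, `Raises:` sections). The function below is a hypothetical example, not taken from the repository, showing the docstring shape these settings are meant to parse.

```python
def clip_reward(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Clamp a reward to a closed interval.

    Args:
        value: Raw reward.
        low: Lower bound of the interval.
        high: Upper bound of the interval.

    Returns:
        The reward limited to ``[low, high]``.

    Raises:
        ValueError: If ``low`` is greater than ``high``.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(high, value))


if __name__ == "__main__":
    print(clip_reward(1.5))
    print(clip_reward(-2.0))
```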
src_code_for_reproducibility/docs/source/index.rst ADDED
@@ -0,0 +1,22 @@
+ Welcome to LLM Negotiation's documentation!
+ ===========================================
+ This library is a collection of tools for training and evaluating LLM-based agents in multi-agent environments. It is designed to be easy to use and extend.
+
+ .. toctree::
+    :maxdepth: 3
+    :caption: Contents:
+
+    installation
+    marl_standard
+    environments
+    launch
+    usage
+    modules
+    contributing
+
+ Indices and tables
+ ==================
+
+ * :ref:`genindex`
+ * :ref:`modindex`
+ * :ref:`search`
src_code_for_reproducibility/docs/source/installation.rst ADDED
@@ -0,0 +1,10 @@
+ Installation
+ ============
+
+ To install the package, run:
+
+ .. code-block:: bash
+
+    git clone https://github.com/yourusername/llm_negotiation.git
+    cd llm_negotiation
+    pip install -e .
src_code_for_reproducibility/docs/source/marl_standard.rst ADDED
@@ -0,0 +1,141 @@
+ ==========================================================
+ Abstract Standard for Multi-Agent Negotiation Environments
+ ==========================================================
+
+ Multi-Agent Negotiation Environments need more features than gymnasium environments before they can serve as interfaces for general game-running code.
+ The four fundamental differences between gymnasium environments and Multi-Agent Negotiation Environments are:
+
+ 1. The response from the LLM is a text action, not a discrete action. Therefore, appropriate parsing of the text is required. The model may need to be run multiple times to get the full action.
+    This is why we introduce the `AgentHandler` class, which is responsible for parsing the LLM's response.
+ 2. The environment needs to be able to handle multi-agent interactions.
+    This is why we introduce the `NegotiationEnvironment` class, which is responsible for handling the multi-agent interactions.
+ 3. MARL environments are complex to describe. In different contexts, the same environment may be described differently. Therefore, both the environment and the agent handlers are
+    responsible for describing a particular trajectory. This information is given by the `get_log_info` method.
+ 4. There might be a lot of overlap between the neural networks used by each agent. For instance, the same model may be used for all agents. This motivates a requirement for a
+    policy identifier for each agent.
+
+ Taking inspiration from the `gymnasium <https://gymnasium.farama.org/>`_ library, we introduce a new standard for Multi-Agent Negotiation Environments.
+
+ Our standard is based on the following features:
+
+ Environments are of the form:
+
+ .. code-block:: python
+
+     class MarlEnvironment():
+
+         def __init__(self):
+             """Initialize the environment."""
+             pass
+
+         def reset(self):
+             """Reset the environment to an initial state and return the initial observation.
+             Returns:
+                 observation (dict): A dictionary where keys are agent identifiers and values are observations.
+             """
+             # (...)
+             return observation
+
+         def step(self, actions):
+             """Take a step in the environment using the provided actions.
+
+             Args:
+                 actions (dict): A dictionary where keys are agent identifiers and values are actions.
+
+             Returns:
+                 observations (dict): A dictionary where keys are agent identifiers and values are observations.
+                 reward (dict): A dictionary where keys are agent identifiers and values are rewards.
+                 done (bool): Whether the episode has ended.
+                 info (dict): Additional information about the environment.
+             """
+             # (...)
+             return observations, reward, done, info
+
+         def get_log_info(self):
+             """Get additional information about the environment. This information is used to log the game.
+             Returns:
+                 log_info (dict): Information about the environment required to log the game.
+             """
+             # (...)
+             return log_info
+
+         def render(self):
+             """Render the current state of the environment."""
+             pass
+
+         def close(self):
+             """Perform any necessary cleanup."""
+             pass
+
+
+     class AgentState():
+
+         def __init__(self):
+             """Initialize the agent state."""
+             pass
+
+         def step(self, observation_from_env, policy_output=None):
+             """Update the agent state based on the observation and action.
+
+             The action is the output of the LLM.
+
+             Args:
+                 observation_from_env (dict): The observation of the environment.
+                 policy_output : The output of the policy.
+
+             Returns:
+                 policy_id (str): The policy identifier.
+                 policy_input (dict): The input to the policy.
+                 action : The official action to be sent to the environment.
+                 done (bool): Whether the LLM action is ready to be sent to the environment.
+                 info (dict): Additional information about the agent.
+             """
+             # (...)
+             return policy_id, policy_input, action, done, info
+
+         def get_log_info(self):
+             """Get information about the agent required to log a trajectory.
+             Returns:
+                 log_info (dict): Information about the agent required to log a trajectory.
+             """
+             # (...)
+             return log_info
+
+         def render(self):
+             """Render the current state of the agent."""
+             pass
+
+         def close(self):
+             """Perform any necessary cleanup."""
+             pass
+
+
+ Implicitly, the keys of the `observations` dictionary returned by the `step` method of the `MarlEnvironment` interface represent the set of agents from which an action is expected at the current step. The next step should only expect actions from the agents in the `observations` dictionary.
+
+ As you can see, both classes have a `get_log_info` method, which is used to log the game. It returns a dictionary whose keys are the agent identifiers and whose values are the information to log. We need this because the environment and the agent handler may need to log different information, and it makes it easier to log from the perspective of each agent. The core environment class should not need to know about the details of the agent handler.
+
+
+
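Reviewer note: to make the interface above concrete, here is a minimal sketch of one environment and one agent state that follow it, plus a single-match driver loop. Everything in it is hypothetical and not taken from the repository: the `SplitEnv` game, the `IntAgent` parser, the `play_match` loop, and the scripted stand-in for an LLM. It also assumes that `step` returns the four values listed in its docstring (`observations, reward, done, info`).

```python
class SplitEnv:
    """Hypothetical one-step game: two agents each claim a share of 10 coins."""

    AGENTS = ("alice", "bob")

    def reset(self):
        # Keys of the observation dict are the agents expected to act next.
        return {agent: {"coins": 10} for agent in self.AGENTS}

    def step(self, actions):
        total = sum(actions.values())
        # Claims are honoured only if together they fit in the pot.
        reward = {a: (actions[a] if total <= 10 else 0) for a in self.AGENTS}
        observations = {}  # the episode is over, so no agent is expected to act
        return observations, reward, True, {"total_claimed": total}

    def get_log_info(self):
        return {"game": "split"}


class IntAgent:
    """Asks the policy again until its text output parses as an integer."""

    def __init__(self, policy_id):
        self.policy_id = policy_id
        self.policy_calls = 0

    def step(self, observation_from_env, policy_output=None):
        if policy_output is not None:
            self.policy_calls += 1
            text = policy_output.strip()
            if text.isdigit():
                return self.policy_id, None, int(text), True, {}
        prompt = {"prompt": f"Claim an integer share of {observation_from_env['coins']} coins."}
        return self.policy_id, prompt, None, False, {}

    def get_log_info(self):
        return {"policy_calls": self.policy_calls}


def play_match(env, agents, policies):
    """Drive one match; `policies` maps policy_id to a callable(policy_input) -> text."""
    observations, reward, done = env.reset(), {}, False
    while not done:
        actions = {}
        for agent_id, obs in observations.items():
            policy_id, policy_input, action, ready, _ = agents[agent_id].step(obs)
            while not ready:  # several policy calls may be needed per action
                output = policies[policy_id](policy_input)
                policy_id, policy_input, action, ready, _ = agents[agent_id].step(obs, output)
            actions[agent_id] = action
        observations, reward, done, _ = env.step(actions)
    return reward


def make_scripted_policy(replies):
    """Stand-in for an LLM: returns the given replies in order."""
    iterator = iter(replies)
    return lambda policy_input: next(iterator)


if __name__ == "__main__":
    agents = {"alice": IntAgent("shared"), "bob": IntAgent("shared")}
    policies = {"shared": make_scripted_policy(["hmm", "4", "6"])}
    print(play_match(SplitEnv(), agents, policies))
    print(agents["alice"].get_log_info())
```

Both agents share the policy identifier `shared`, which is the situation the fourth point above describes: one model serving several agents.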
+ Running Environments in Parallel
+ --------------------------------
+ This standard allows the use of the `run_batched_matches` function (TODO: link) to run environments efficiently. The core idea is to batch the policy calls for all agents in the environment.
+
+ .. note::
+    The ``run_batched_matches`` function allows you to run multiple negotiation games, or "matches," in parallel.
+    After each environment is initialized, the function continuously loops over all active matches and checks which agents
+    are still pending actions. Each agent's logic can require multiple calls to the policy (e.g., an LLM) before an action
+    becomes "ready" to be sent to the environment. (For instance, an agent might need multiple policy calls before having a string which can be parsed into a valid action.) The pending policy calls of all agents across all matches are grouped by unique policy identifier and processed as a batch for efficiency. This is the core functionality of the ``run_batched_matches`` function.
+
+    Only once all actions from the required agents at a given step for an environment are ready does the function make a single ``env.step(...)`` call; this ensures
+    every match moves forward in lockstep for all its active agents. As soon as an environment signals it is done, the function
+    retrieves logged information from both the environment and the agent states before removing this match from the active set.
+
+    If there are more matches waiting to be processed, they are then started one by one to maintain the specified degree of parallelism.
+    This batching approach provides an efficient mechanism to handle multi-agent or multi-policy environments, ensuring minimal
+    overhead and a clear, unified flow for stepping through matches.
+
+ Here is a diagram that shows how the `run_batched_matches` function works at a high level:
+
+ .. image:: media/runbatch.png
+    :alt: High-level diagram of the run_batched_matches function
+    :width: 1000px
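Reviewer note: the grouping step described in the note above can be sketched as follows. This is not the repository's `run_batched_matches`; the function `batch_policy_calls`, the tuple layout of `pending`, and the batched policy signature are assumptions made for illustration. It shows only the core idea: pending policy inputs from all matches are grouped by policy identifier, each policy is called once per round, and the outputs are routed back to the `(match, agent)` pair that asked for them.

```python
from collections import defaultdict


def batch_policy_calls(pending, batched_policies):
    """Group pending policy calls by policy identifier and run each group once.

    Args:
        pending: List of (match_id, agent_id, policy_id, policy_input) tuples.
        batched_policies: Dict mapping policy_id to a callable that takes a list
            of inputs and returns a list of outputs in the same order.

    Returns:
        Dict mapping (match_id, agent_id) to that agent's policy output.
    """
    groups = defaultdict(list)
    for match_id, agent_id, policy_id, policy_input in pending:
        groups[policy_id].append(((match_id, agent_id), policy_input))

    outputs = {}
    for policy_id, requests in groups.items():
        keys = [key for key, _ in requests]
        inputs = [policy_input for _, policy_input in requests]
        # One call per policy, regardless of how many matches need it.
        results = batched_policies[policy_id](inputs)
        outputs.update(zip(keys, results))
    return outputs


if __name__ == "__main__":
    def shout(inputs):
        # Stand-in for a batched LLM call.
        return [text.upper() for text in inputs]

    pending = [
        (0, "alice", "shared", "a"),
        (0, "bob", "shared", "b"),
        (1, "alice", "shared", "c"),
    ]
    print(batch_policy_calls(pending, {"shared": shout}))
```

Each returned output would then be fed back into the corresponding agent state's `step` call, and an environment is stepped only once all of its required agents report a ready action.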
src_code_for_reproducibility/docs/source/modules.rst ADDED
@@ -0,0 +1,7 @@
+ src
+ ===
+
+ .. toctree::
+    :maxdepth: 4
+
+    src
src_code_for_reproducibility/docs/source/src.environments.dond.dond_agent.rst ADDED
@@ -0,0 +1,7 @@
+ src.environments.dond.dond\_agent module
+ ========================================
+
+ .. automodule:: src.environments.dond.dond_agent
+    :members:
+    :undoc-members:
+    :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.dond.dond_game.rst ADDED
@@ -0,0 +1,7 @@
+ src.environments.dond.dond\_game module
+ =======================================
+
+ .. automodule:: src.environments.dond.dond_game
+    :members:
+    :undoc-members:
+    :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.dond.dond_player.rst ADDED
@@ -0,0 +1,7 @@
+ src.environments.dond.dond\_agent module
+ =========================================
+
+ .. automodule:: src.environments.dond.dond_agent
+    :members:
+    :undoc-members:
+    :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.dond.dond_statistics_funcs.rst ADDED
@@ -0,0 +1,7 @@
+ src.environments.dond.dond\_statistics\_funcs module
+ ====================================================
+
+ .. automodule:: src.environments.dond.dond_statistics_funcs
+    :members:
+    :undoc-members:
+    :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.dond.dond_training_data_funcs.rst ADDED
@@ -0,0 +1,7 @@
+ src.environments.dond.dond\_training\_data\_funcs module
+ ========================================================
+
+ .. automodule:: src.environments.dond.dond_training_data_funcs
+    :members:
+    :undoc-members:
+    :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.env_imports.rst ADDED
@@ -0,0 +1,7 @@
+ src.environments.env\_imports module
+ ====================================
+
+ .. automodule:: src.environments.env_imports
+    :members:
+    :undoc-members:
+    :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.ipd.ipd_game.rst ADDED
@@ -0,0 +1,7 @@
+ src.environments.ipd.ipd\_game module
+ =====================================
+
+ .. automodule:: src.environments.ipd.ipd_game
+    :members:
+    :undoc-members:
+    :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.ipd.ipd_log_funcs.rst ADDED
@@ -0,0 +1,7 @@
1
+ src.environments.ipd.ipd\_log\_funcs module
2
+ ===========================================
3
+
4
+ .. automodule:: src.environments.ipd.ipd_log_funcs
5
+ :members:
6
+ :undoc-members:
7
+ :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.ipd.ipd_statistics_funcs.rst ADDED
@@ -0,0 +1,7 @@
1
+ src.environments.ipd.ipd\_statistics\_funcs module
2
+ ==================================================
3
+
4
+ .. automodule:: src.environments.ipd.ipd_statistics_funcs
5
+ :members:
6
+ :undoc-members:
7
+ :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.ipd.ipd_training_data_funcs.rst ADDED
@@ -0,0 +1,7 @@
1
+ src.environments.ipd.ipd\_training\_data\_funcs module
2
+ ======================================================
3
+
4
+ .. automodule:: src.environments.ipd.ipd_training_data_funcs
5
+ :members:
6
+ :undoc-members:
7
+ :show-inheritance:
src_code_for_reproducibility/docs/source/src.environments.rst ADDED
@@ -0,0 +1,25 @@
1
+ src.environments package
2
+ ========================
3
+
4
+ .. automodule:: src.environments
5
+ :members:
6
+ :undoc-members:
7
+ :show-inheritance:
8
+
9
+ Subpackages
10
+ -----------
11
+
12
+ .. toctree::
13
+ :maxdepth: 4
14
+
15
+ src.environments.dond
16
+ src.environments.ipd
17
+
18
+ Submodules
19
+ ----------
20
+
21
+ .. toctree::
22
+ :maxdepth: 4
23
+
24
+ src.environments.env_imports
25
+ src.environments.environment_imports
src_code_for_reproducibility/docs/source/src.experiments.arithmetic_test.rst ADDED
@@ -0,0 +1,7 @@
1
+ src.experiments.arithmetic\_test module
2
+ =======================================
3
+
4
+ .. automodule:: src.experiments.arithmetic_test
5
+ :members:
6
+ :undoc-members:
7
+ :show-inheritance:
src_code_for_reproducibility/docs/source/src.rst ADDED
@@ -0,0 +1,28 @@
1
+ src package
2
+ ===========
3
+
4
+ .. automodule:: src
5
+ :members:
6
+ :undoc-members:
7
+ :show-inheritance:
8
+
9
+ Subpackages
10
+ -----------
11
+
12
+ .. toctree::
13
+ :maxdepth: 4
14
+
15
+ src.environments
16
+ src.experiments
17
+ src.generation
18
+ src.models
19
+ src.training
20
+ src.utils
21
+
22
+ Submodules
23
+ ----------
24
+
25
+ .. toctree::
26
+ :maxdepth: 4
27
+
28
+ src.run
src_code_for_reproducibility/docs/source/src.utils.rst ADDED
@@ -0,0 +1,24 @@
1
+ src.utils package
2
+ =================
3
+
4
+ .. automodule:: src.utils
5
+ :members:
6
+ :undoc-members:
7
+ :show-inheritance:
8
+
9
+ Submodules
10
+ ----------
11
+
12
+ .. toctree::
13
+ :maxdepth: 4
14
+
15
+ src.utils.common_imports
16
+ src.utils.export_ppo_training_set
17
+ src.utils.extra_stats
18
+ src.utils.inherit_args
19
+ src.utils.log_gpu_usage
20
+ src.utils.log_statistics
21
+ src.utils.model_to_cpu
22
+ src.utils.parallel_shuffle
23
+ src.utils.quick_stats
24
+ src.utils.update_start_epoch
src_code_for_reproducibility/markov_games/__pycache__/agent.cpython-312.pyc ADDED
Binary file (3.2 kB). View file
 
src_code_for_reproducibility/markov_games/__pycache__/run_markov_games.cpython-312.pyc ADDED
Binary file (1.14 kB). View file
 
src_code_for_reproducibility/markov_games/__pycache__/simulation.cpython-312.pyc ADDED
Binary file (3.9 kB). View file
 
src_code_for_reproducibility/markov_games/diplomacy/diplomacy_env.py ADDED
@@ -0,0 +1,230 @@
1
+ from typing import Dict, List, Tuple, Optional, Any
2
+ from diplomacy import Game
3
+ import random
4
+
5
+ class DiplomacyEnv:
6
+ """Multi-Agent Reinforcement Learning environment for Diplomacy.
7
+
8
+ This class wraps the Diplomacy game engine to provide an interface
9
+ compliant with the MARL standard.
10
+ """
11
+
12
+ def __init__(self, random_seed=None, map_name="standard", game_id=None, rules=None, max_steps=50):
13
+ """Initialize the Diplomacy environment.
14
+
15
+ Args:
16
+ map_name: The name of the map to use (default: "standard")
17
+ game_id: Optional game ID
18
+ rules: Optional rules to apply to the game
19
+ max_steps: Maximum number of steps before forcing game end (default: 50)
20
+ """
21
+ self.random_seed = random_seed
22
+ self.map_name = map_name
23
+ self.game_id = game_id
24
+ self.rules = rules or []
25
+ self.game = None
26
+ self.active_powers = []
27
+ self.render_mode = None
28
+ self.max_steps = max_steps
29
+ self.current_steps = 0
30
+
31
+ def reset(self):
32
+ """Reset the environment to an initial state and return the initial observation.
33
+
34
+ Returns:
35
+ observation: A dictionary where keys are agent identifiers and values are observations.
36
+ """
37
+ # Initialize a new game
38
+ self.game = Game(game_id=self.game_id, map_name=self.map_name)
39
+
40
+ # Apply rules
41
+ for rule in self.rules:
42
+ self.game.add_rule(rule)
43
+
44
+ # Determine active powers (not eliminated)
45
+ self.active_powers = [name for name, power in self.game.powers.items()
46
+ if not power.is_eliminated()]
47
+
48
+ # Reset step counter
49
+ self.current_steps = 0
50
+
51
+ # Create initial observations for all powers
52
+ observations = {}
53
+ for power_name in self.active_powers:
54
+ observations[power_name] = self._create_observation(power_name)
55
+
56
+ return observations
57
+
58
+ def step(self, actions):
59
+ """Take a step in the environment using the provided actions.
60
+
61
+ Args:
62
+ actions: A dictionary where keys are agent identifiers and values are actions.
63
+
64
+ Returns:
65
+ observations: A dictionary where keys are agent identifiers and values are observations.
66
+ done: Whether the episode has ended.
67
+ info: Additional information about the environment.
68
+ """
69
+ print(f"stepping {self.current_steps}")
70
+ self.current_steps += 1
71
+ # Apply actions (orders) for each power
72
+ for power_name, action in actions.items():
73
+ if power_name in self.active_powers:
74
+ orders = action.get("orders", [])
75
+ wait = action.get("wait", True)
76
+
77
+ # Set orders for the power
78
+ if orders:
79
+ self.game.set_orders(power_name, orders)
80
+
81
+ # Set wait flag
82
+ self.game.set_wait(power_name, wait)
83
+
84
+ # Check if all active powers are ready to proceed
85
+ if self.game.does_not_wait():
86
+ # Process the current phase
87
+ self.game.process()
88
+
89
+
90
+ # Update active powers list after processing
91
+ self.active_powers = [name for name, power in self.game.powers.items()
92
+ if not power.is_eliminated()]
93
+
94
+ # Create observations for all active powers
95
+ observations = {}
96
+ for power_name in self.active_powers:
97
+ observations[power_name] = self._create_observation(power_name)
98
+
99
+ # Check if the game is done (either naturally or due to max steps)
100
+ done = self.game.is_game_done or self.current_steps >= self.max_steps
101
+
102
+ # Create info dict
103
+ info = {
104
+ "phase": self.game.get_current_phase(),
105
+ "active_powers": self.active_powers,
106
+ "centers": self.game.get_centers(),
107
+ "units": self.game.get_units(),
108
+ "current_steps": self.current_steps,
109
+ "max_steps_reached": self.current_steps >= self.max_steps
110
+ }
111
+
112
+ return observations, done, info
113
+
114
+ def _create_observation(self, power_name):
115
+ """Create observation for a specific power.
116
+
117
+ Args:
118
+ power_name: The name of the power
119
+
120
+ Returns:
121
+ An observation dictionary
122
+ """
123
+ observation = {
124
+ "phase": self.game.get_current_phase(),
125
+ "units": self.game.get_units(),
126
+ "centers": self.game.get_centers(),
127
+ "orderable_locations": self.game.get_orderable_locations(power_name),
128
+ "order_status": self.game.get_order_status(power_name),
129
+ "possible_orders": self._get_possible_orders_for_power(power_name)
130
+ }
131
+ return observation
132
+
133
+ def _get_possible_orders_for_power(self, power_name):
134
+ """Get all possible orders for a power's units.
135
+
136
+ Args:
137
+ power_name: The name of the power
138
+
139
+ Returns:
140
+ A dictionary mapping units to their possible orders
141
+ """
142
+ all_possible_orders = self.game.get_all_possible_orders()
143
+
144
+ # Filter for only the locations where this power has units
145
+ power_units = self.game.get_units(power_name)
146
+ power_unit_locations = [unit[2:] for unit in power_units]
147
+
148
+ # For retreat phases, include retreating units
149
+ if self.game.phase_type == 'R':
150
+ power = self.game.get_power(power_name)
151
+ power_unit_locations.extend([unit[2:] for unit in power.retreats])
152
+
153
+ # For adjustment phases, include buildable locations
154
+ elif self.game.phase_type == 'A':
155
+ power = self.game.get_power(power_name)
156
+ # If we have more centers than units, we can build
157
+ if len(power.centers) > len(power.units):
158
+ buildable_sites = self.game._build_sites(power)
159
+ power_unit_locations.extend(buildable_sites)
160
+ # If we have more units than centers, we need to remove
161
+ elif len(power.units) > len(power.centers):
162
+ # All units are candidates for removal
163
+ pass
164
+
165
+ # Filter the possible orders to only those for this power's units/locations
166
+ power_possible_orders = {}
167
+ for loc, orders in all_possible_orders.items():
168
+ if loc[:3] in power_unit_locations:
169
+ power_possible_orders[loc] = orders
170
+
171
+ return power_possible_orders
172
+
173
+ def get_log_info(self):
174
+ """Get additional information about the environment for logging.
175
+
176
+ Returns:
177
+ log_info: Information about the environment required to log the game.
178
+ """
179
+ if not self.game:
180
+ return {}
181
+
182
+ return {
183
+ "game_id": self.game.game_id,
184
+ "phase": self.game.get_current_phase(),
185
+ "map_name": self.game.map_name,
186
+ "centers": self.game.get_centers(),
187
+ "units": self.game.get_units(),
188
+ "powers": {name: {
189
+ "units": power.units,
190
+ "centers": power.centers,
191
+ "is_eliminated": power.is_eliminated(),
192
+ "order_status": self.game.get_order_status(name)
193
+ } for name, power in self.game.powers.items()},
194
+ "orders": self.game.get_orders(),
195
+ "active_powers": self.active_powers,
196
+ "is_game_done": self.game.is_game_done,
197
+ "outcome": self.game.outcome if self.game.is_game_done else None
198
+ }
199
+
200
+ def render(self, mode='human'):
201
+ """Render the current state of the environment.
202
+
203
+ Args:
204
+ mode: The rendering mode ('human', 'svg', etc.)
205
+
206
+ Returns:
207
+ The rendered image if applicable
208
+ """
209
+ self.render_mode = mode
210
+ if self.game:
211
+ if mode == 'human':
212
+ # Just print basic game state
213
+ print(f"Game: {self.game.game_id}")
214
+ print(f"Phase: {self.game.get_current_phase()}")
215
+ print(f"Active Powers: {self.active_powers}")
216
+ print("Supply Centers:")
217
+ for power_name, centers in self.game.get_centers().items():
218
+ print(f" {power_name}: {centers}")
219
+ print("Units:")
220
+ for power_name, units in self.game.get_units().items():
221
+ print(f" {power_name}: {units}")
222
+ return None
223
+ elif mode == 'svg':
224
+ # Return SVG representation
225
+ return self.game.render(output_format='svg')
226
+ return None
227
+
228
+ def close(self):
229
+ """Perform any necessary cleanup."""
230
+ self.game = None
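The filtering step in `_get_possible_orders_for_power` can be illustrated without the game engine. The following is a minimal sketch, not the committed implementation: `filter_orders_for_units` and the sample data are hypothetical, and it assumes unit strings look like `"A PAR"` or `"F STP/SC"`. Unlike the source, which keeps the coast suffix via `unit[2:]`, this sketch truncates unit locations to the three-letter province so that coastal fleets also match the `loc[:3]` comparison.

```python
from typing import Dict, List


def filter_orders_for_units(
    all_possible_orders: Dict[str, List[str]],
    power_units: List[str],
) -> Dict[str, List[str]]:
    """Keep only the order lists whose location hosts one of the given units."""
    # "A PAR" -> "PAR"; "F STP/SC" -> "STP" (coast suffix dropped for matching).
    # Assumption: this normalization is ours, not taken from the source file.
    unit_provinces = {unit.split()[-1][:3] for unit in power_units}
    return {
        loc: orders
        for loc, orders in all_possible_orders.items()
        if loc[:3] in unit_provinces
    }


# Hypothetical data shaped like a location -> possible-orders mapping.
all_orders = {
    "PAR": ["A PAR H", "A PAR - BUR"],
    "STP/SC": ["F STP/SC H"],
    "BER": ["A BER H"],
}
mine = filter_orders_for_units(all_orders, ["A PAR", "F STP/SC"])
print(sorted(mine))
```

Whether the engine reports coastal locations with or without the suffix is not shown in the source, so the normalization above should be checked against real engine output before relying on it.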
src_code_for_reproducibility/markov_games/diplomacy/diplomacy_logging.py ADDED
@@ -0,0 +1,360 @@
1
+ import os
2
+ import json
3
+ from utils.common_imports import *
4
+
5
+
6
+
7
+ def diplomacy_log_match(
8
+ path,
9
+ agents_log_info,
10
+ env_log_info,
11
+ metrics_func=None,
12
+ metrics_func_args=None
13
+ ):
14
+ """
15
+ Logs the Diplomacy game data and generates HTML visualizations using the get_log_info methods.
16
+
17
+ Args:
18
+ path (str): Base path to save the data.
19
+ agents_log_info (list): List of agent information dictionaries containing the get_log_info results.
20
+ env_log_info (dict): Environment information from its get_log_info method.
21
+ metrics_func (str, optional): Name of the function to calculate metrics.
22
+ metrics_func_args (dict, optional): Arguments for the metrics function.
23
+ """
24
+ # Create directory structure
25
+ os.makedirs(path, exist_ok=True)
26
+
27
+ # Save the environment log info
28
+ env_log_path = os.path.join(path, "env_log.json")
29
+ with open(env_log_path, "w") as f:
30
+ json.dump(env_log_info, f, indent=4, default=_json_serialize)
31
+
32
+ # Process each agent's log info
33
+ for agent_log in agents_log_info:
34
+ power_name = agent_log["power_name"]
35
+
36
+ # Define paths for raw data and statistics subfolders
37
+ power_path = os.path.join(path, power_name)
38
+ raw_data_path = os.path.join(power_path, "raw_data")
39
+ statistics_path = os.path.join(power_path, "statistics")
40
+
41
+ # Ensure directories exist
42
+ os.makedirs(raw_data_path, exist_ok=True)
43
+ os.makedirs(statistics_path, exist_ok=True)
44
+
45
+ # Determine the next available file number for raw data
46
+ raw_files = os.listdir(raw_data_path)
47
+ raw_numbers = [int(f.split('_')[-1].split('.')[0]) for f in raw_files if f.startswith("log_")]
48
+ next_raw_number = max(raw_numbers, default=0) + 1
49
+ raw_file = os.path.join(raw_data_path, f"log_{next_raw_number}.json")
50
+
51
+ # Save agent log info
52
+ with open(raw_file, "w") as f:
53
+ json.dump(agent_log, f, indent=4, default=_json_serialize)
54
+
55
+ # Log metrics if a metrics function is provided
56
+ if metrics_func:
57
+ metrics_files = os.listdir(statistics_path)
58
+ metrics_numbers = [int(f.split('_')[-1].split('.')[0]) for f in metrics_files if f.startswith("metrics_")]
59
+ next_metrics_number = max(metrics_numbers, default=0) + 1
60
+ metrics_file = os.path.join(statistics_path, f"metrics_{next_metrics_number}.json")
61
+
62
+ metrics = globals()[metrics_func](agent_log, env_log_info, **(metrics_func_args or {}))
63
+ with open(metrics_file, "w") as f:
64
+ json.dump(metrics, f, indent=4)
65
+
66
+ # Generate the HTML visualization
67
+ html_content = generate_diplomacy_html(agents_log_info, env_log_info)
68
+
69
+ # Ensure the html directory exists
70
+ html_path = os.path.join(path, "html")
71
+ os.makedirs(html_path, exist_ok=True)
72
+
73
+ # Determine the next available file number for HTML
74
+ html_files = os.listdir(html_path)
75
+ html_numbers = [int(f.split('_')[-1].split('.')[0]) for f in html_files if f.startswith("game_summary_")]
76
+ next_html_number = max(html_numbers, default=0) + 1
77
+ html_file = os.path.join(html_path, f"game_summary_{next_html_number}.html")
78
+
79
+ # Save the HTML content to a file
80
+ with open(html_file, "w") as f:
81
+ f.write(html_content)
82
+
83
+ def generate_diplomacy_html(agent_infos, env_info):
84
+ """
85
+ Generate HTML visualization for a Diplomacy game.
86
+
87
+ Args:
88
+ agent_infos (list): List of agent information dictionaries from get_log_info.
89
+ env_info (dict): Environment information from get_log_info.
90
+
91
+ Returns:
92
+ str: HTML content for the game visualization.
93
+ """
94
+ # Extract game information
95
+ game_id = env_info.get("game_id", "Unknown")
96
+ phase = env_info.get("phase", "Unknown")
97
+ map_name = env_info.get("map_name", "standard")
98
+ is_game_done = env_info.get("is_game_done", False)
99
+ outcome = env_info.get("outcome", [])
100
+
101
+ centers = env_info.get("centers", {})
102
+ units = env_info.get("units", {})
103
+
104
+ # HTML head and style
105
+ html_content = """
106
+ <!DOCTYPE html>
107
+ <html lang="en">
108
+ <head>
109
+ <meta charset="UTF-8">
110
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
111
+ <title>Diplomacy Game {game_id}</title>
112
+ <style>
113
+ body {{
114
+ font-family: 'Arial', sans-serif;
115
+ background-color: #f5f5f5;
116
+ color: #333333;
117
+ margin: 0;
118
+ padding: 20px;
119
+ }}
120
+ .container {{
121
+ display: grid;
122
+ grid-template-columns: repeat(3, 1fr);
123
+ grid-gap: 20px;
124
+ margin-bottom: 30px;
125
+ }}
126
+ .central-info {{
127
+ grid-column: span 3;
128
+ background: #fff;
129
+ padding: 20px;
130
+ border-radius: 10px;
131
+ box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
132
+ margin-bottom: 20px;
133
+ }}
134
+ .power-column {{
135
+ background: #fff;
136
+ padding: 15px;
137
+ border-radius: 10px;
138
+ box-shadow: 0 4px 12px rgba(0, 0, 0, 0.1);
139
+ }}
140
+ .message {{
141
+ margin-bottom: 15px;
142
+ padding: 12px;
143
+ border-radius: 8px;
144
+ box-shadow: 0 1px 4px rgba(0, 0, 0, 0.1);
145
+ }}
146
+ .user {{
147
+ background: rgba(235, 245, 255, 0.8);
148
+ border-left: 4px solid #007bff;
149
+ }}
150
+ .assistant {{
151
+ background: rgba(240, 255, 240, 0.8);
152
+ border-right: 4px solid #28a745;
153
+ }}
154
+ .orders {{
155
+ background: rgba(255, 248, 225, 0.8);
156
+ border-left: 4px solid #ffc107;
157
+ }}
158
+ .role {{
159
+ font-weight: bold;
160
+ margin-bottom: 5px;
161
+ color: #333333;
162
+ }}
163
+ .power-name {{
164
+ text-align: center;
165
+ font-size: 1.4em;
166
+ margin-bottom: 15px;
167
+ color: #000;
168
+ font-weight: 600;
169
+ text-transform: uppercase;
170
+ letter-spacing: 1px;
171
+ }}
172
+ .game-info {{
173
+ display: grid;
174
+ grid-template-columns: repeat(2, 1fr);
175
+ grid-gap: 15px;
176
+ }}
177
+ .info-card {{
178
+ background: #f9f9f9;
179
+ padding: 15px;
180
+ border-radius: 8px;
181
+ box-shadow: 0 1px 3px rgba(0, 0, 0, 0.1);
182
+ }}
183
+ .supply-centers, .units-list {{
184
+ display: flex;
185
+ flex-wrap: wrap;
186
+ justify-content: space-between;
187
+ }}
188
+ .supply-center, .unit {{
189
+ flex: 0 0 30%;
190
+ margin-bottom: 10px;
191
+ padding: 8px;
192
+ background: #f0f0f0;
193
+ border-radius: 5px;
194
+ text-align: center;
195
+ }}
196
+ h2 {{
197
+ border-bottom: 2px solid #eee;
198
+ padding-bottom: 10px;
199
+ margin-top: 0;
200
+ }}
201
+ .outcome {{
202
+ background: #e8f5e9;
203
+ padding: 15px;
204
+ border-radius: 8px;
205
+ margin-top: 15px;
206
+ font-weight: bold;
207
+ text-align: center;
208
+ }}
209
+ .austria {{ border-top: 5px solid #ff5050; }}
210
+ .england {{ border-top: 5px solid #5050ff; }}
211
+ .france {{ border-top: 5px solid #50c0ff; }}
212
+ .germany {{ border-top: 5px solid #808080; }}
213
+ .italy {{ border-top: 5px solid #50ff50; }}
214
+ .russia {{ border-top: 5px solid #ffffff; border: 1px solid #ccc; }}
215
+ .turkey {{ border-top: 5px solid #c0c000; }}
216
+ </style>
217
+ </head>
218
+ <body>
219
+ <div class="central-info">
220
+ <h2>Game Information</h2>
221
+ <div class="game-info">
222
+ <div class="info-card">
223
+ <h3>Game Details</h3>
224
+ <p><strong>Game ID:</strong> {game_id}</p>
225
+ <p><strong>Phase:</strong> {phase}</p>
226
+ <p><strong>Map:</strong> {map_name}</p>
227
+ <p><strong>Status:</strong> {status}</p>
228
+ </div>
229
+ <div class="info-card">
230
+ <h3>Supply Centers</h3>
231
+ <div class="supply-centers">
232
+ """.format(
233
+ game_id=game_id,
234
+ phase=phase,
235
+ map_name=map_name,
236
+ status="Completed" if is_game_done else "Active"
237
+ )
238
+
239
+ # Add supply center information
240
+ for power, power_centers in centers.items():
241
+ html_content += f"""
242
+ <div class="supply-center">
243
+ <strong>{power}:</strong> {len(power_centers)}
244
+ </div>
245
+ """
246
+
247
+ html_content += """
248
+ </div>
249
+ </div>
250
+ </div>
251
+ """
252
+
253
+ # Add outcome if game is done
254
+ if is_game_done and outcome:
255
+ winners = outcome[1:] if len(outcome) > 1 else ["Draw"]
256
+ html_content += f"""
257
+ <div class="outcome">
258
+ <h3>Game Outcome</h3>
259
+ <p>Winners: {', '.join(winners)}</p>
260
+ </div>
261
+ """
262
+
263
+ html_content += """
264
+ </div>
265
+ <div class="container">
266
+ """
267
+
268
+ # Add each power's information
269
+ for agent_log in agent_infos:
270
+ power_name = agent_log["power_name"]
271
+ power_class = power_name.lower()
272
+ orders = agent_log.get("orders", [])
273
+ message_history = agent_log.get("message_history", [])
274
+
275
+ html_content += f"""
276
+ <div class="power-column {power_class}">
277
+ <div class="power-name">{power_name}</div>
278
+
279
+ <div class="info-card">
280
+ <h3>Units</h3>
281
+ <ul>
282
+ """
283
+
284
+ # Add units information
285
+ power_units = units.get(power_name, [])
286
+ for unit in power_units:
287
+ html_content += f"<li>{unit}</li>"
288
+
289
+ html_content += """
290
+ </ul>
291
+ </div>
292
+
293
+ <div class="message orders">
294
+ <div class="role">Final Orders</div>
295
+ <ul>
296
+ """
297
+
298
+ # Add orders
299
+ for order in orders:
300
+ html_content += f"<li>{order}</li>"
301
+
302
+ html_content += """
303
+ </ul>
304
+ </div>
305
+ """
306
+
307
+ # Add message history
308
+ for message in message_history:
309
+ if isinstance(message, dict):
310
+ # Skip system messages or handle differently
311
+ if message.get("role") == "system":
312
+ continue
313
+
314
+ role = message.get("role", "unknown")
315
+ content = message.get("content", "")
316
+
317
+ role_class = "user" if role == "user" else "assistant"
318
+ role_display = "Environment" if role == "user" else f"LLM ({power_name})"
319
+
320
+ # Escape HTML characters in content
321
+ content = content.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\n", "<br>")
322
+
323
+ html_content += f"""
324
+ <div class="message {role_class}">
325
+ <div class="role">{role_display}</div>
326
+ <p>{content}</p>
327
+ </div>
328
+ """
329
+ elif isinstance(message, str):
330
+ # Simple string messages (may be used in some implementations)
331
+ html_content += f"""
332
+ <div class="message">
333
+ <p>{message}</p>
334
+ </div>
335
+ """
336
+
337
+ html_content += """
338
+ </div>
339
+ """
340
+
341
+ html_content += """
342
+ </div>
343
+ </body>
344
+ </html>
345
+ """
346
+
347
+ return html_content
348
+
349
+ def _json_serialize(obj):
350
+ """
351
+ A helper function to convert non-JSON-serializable objects
352
+ (like OrderResult) into strings or dicts.
353
+ """
354
+ # Check for the specific object types you know are problematic
355
+ if obj.__class__.__name__ == "OrderResult":
356
+ # Return a string representation or a dict
357
+ return str(obj)
358
+
359
+ # Fallback: attempt to convert anything else to string
360
+ return str(obj)
src_code_for_reproducibility/markov_games/diplomacy/diplomacy_logging_for_training.py ADDED
File without changes
src_code_for_reproducibility/markov_games/ipd/Ipd_hard_coded_agents.py ADDED
@@ -0,0 +1,72 @@
1
+ from dataclasses import dataclass
2
+ from typing import Any, Tuple
3
+
4
+ from mllm.markov_games.ipd.ipd_agent import IPDAgent
5
+ from mllm.markov_games.rollout_tree import AgentActLog, ChatTurn
6
+
7
+
8
+ @dataclass
9
+ class AlwaysCooperateIPDAgent(IPDAgent):
10
+ async def act(self, observation) -> Tuple[Any, AgentActLog]:
11
+ """
12
+ Always plays the cooperate action, ignoring observation.
13
+ Returns the configured cooperate_string so the simulation parses it as "C".
14
+ """
15
+
16
+ action = self.cooperate_string
17
+
18
+ # Log a minimal, structured chat turn for consistency with other agents
19
+ turn_text = f"Playing cooperate: {action}"
20
+ self.state.chat_history.append(
21
+ ChatTurn(
22
+ agent_id=self.agent_id,
23
+ role="assistant",
24
+ content=turn_text,
25
+ is_state_end=True,
26
+ )
27
+ )
28
+
29
+ act_log = AgentActLog(
30
+ chat_turns=[self.state.chat_history[-1]],
31
+ info=None,
32
+ )
33
+
34
+ # Advance internal counters similar to IPDAgent semantics
35
+ self.state.chat_counter = len(self.state.chat_history)
36
+ self.state.round_nb = observation.round_nb
37
+
38
+ return action, act_log
39
+
40
+
41
+ @dataclass
42
+ class AlwaysDefectIPDAgent(IPDAgent):
43
+ async def act(self, observation) -> Tuple[Any, AgentActLog]:
44
+ """
45
+ Always plays the defect action, ignoring observation.
46
+ Returns the configured defect_string so the simulation parses it as "D".
47
+ """
48
+
49
+ action = self.defect_string
50
+
51
+ # Log a minimal, structured chat turn for consistency with other agents
52
+ turn_text = f"Playing defect: {action}"
53
+ self.state.chat_history.append(
54
+ ChatTurn(
55
+ agent_id=self.agent_id,
56
+ role="assistant",
57
+ content=turn_text,
58
+ is_state_end=True,
59
+ )
60
+ )
61
+
62
+ act_log = AgentActLog(
63
+ chat_turns=[self.state.chat_history[-1]],
64
+ info=None,
65
+ )
66
+
67
+ # Advance internal counters similar to IPDAgent semantics
68
+ self.state.chat_counter = len(self.state.chat_history)
69
+ self.state.round_nb = observation.round_nb
70
+
71
+ return action, act_log
72
+
src_code_for_reproducibility/markov_games/ipd/__init__.py ADDED
@@ -0,0 +1,7 @@
1
+ from .Ipd_hard_coded_agents import AlwaysCooperateIPDAgent, AlwaysDefectIPDAgent
2
+
3
+ __all__ = [
4
+ "AlwaysCooperateIPDAgent",
5
+ "AlwaysDefectIPDAgent",
6
+ ]
7
+
src_code_for_reproducibility/markov_games/ipd/__pycache__/Ipd_hard_coded_agents.cpython-312.pyc ADDED
Binary file (2.86 kB). View file
 
src_code_for_reproducibility/markov_games/ipd/__pycache__/ipd_agent.cpython-312.pyc ADDED
Binary file (4.7 kB). View file
 
src_code_for_reproducibility/markov_games/ipd/__pycache__/ipd_simulation.cpython-312.pyc ADDED
Binary file (6.72 kB). View file
 
src_code_for_reproducibility/markov_games/ipd/__pycache__/ipd_statistics.cpython-312.pyc ADDED
Binary file (1.28 kB). View file
 
src_code_for_reproducibility/markov_games/ipd/ipd_agent.py ADDED
@@ -0,0 +1,115 @@
1
+ import copy
2
+ import json
3
+ import random
4
+ import re
5
+ from collections.abc import Callable
6
+ from copy import deepcopy
7
+ from dataclasses import dataclass, field
8
+ from typing import Any, Dict, List, Optional, Tuple, Union
9
+
10
+ from mllm.markov_games.agent import Agent
11
+ from mllm.markov_games.rollout_tree import AgentActLog, ChatTurn
12
+
13
+
14
+ @dataclass
15
+ class IPDAgentState:
16
+ """
17
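+ Mutable state of an IPD agent: retry counter, last round seen, number of chat turns already logged, and the chat history.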
18
+ """
19
+
20
+ nb_retries: int
21
+ round_nb: int
22
+ chat_counter: int
23
+ chat_history: List[ChatTurn]
24
+
25
+
26
+ @dataclass
27
+ class IPDAgent(Agent):
28
+ seed: int
29
+ agent_id: str
30
+ agent_name: str
31
+ policy: Callable[[List[Dict]], str]
32
+ intro_prompt: str # Introduction prompt explaining the game rules
33
+ goal_prompt: str # Prompt explaining the agent's goal
34
+ strategy_prompt: str # Prompt suggesting a strategy to the agent
35
+ max_errors: int # Maximum number of errors allowed before default action
36
+ allow_reasoning: bool # Whether to allow reasoning in the response
37
+ max_reasoning_chars: int # Maximum number of characters for reasoning
38
+ cooperate_string: str # string parsed as playing cooperate by simulation
39
+ defect_string: str # string parsed as playing defect by simulation
40
+
41
+ def __post_init__(self):
42
+ self.state = IPDAgentState(
43
+ nb_retries=0, round_nb=0, chat_counter=0, chat_history=[]
44
+ )
45
+
46
+ async def act(self, observation) -> Tuple[Any, AgentActLog]:
47
+ """
48
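+ Append the prompt(s) for the current round to the chat history, query the policy for a move, and return the move together with a log of the chat turns added during this call.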
49
+ """
50
+
51
+ action = None
52
+ action_is_ready = False
53
+ round_nb = observation.round_nb
54
+
55
+ # If it's the first round, we need to send the intro prompt
56
+ if round_nb == 0 and self.state.chat_counter == 0:
57
+ self.state.chat_history.append(
58
+ ChatTurn(
59
+ agent_id=self.agent_id,
60
+ role="user",
61
+ content=self.intro_prompt,
62
+ is_state_end=True,
63
+ )
64
+ )
65
+
66
+ # If new round
67
+ if round_nb > self.state.round_nb:
68
+ coagent_action = observation.last_coagent_move
69
+ user_message = f"Last round, the other agent played {coagent_action}."
70
+ self.state.chat_history.append(
71
+ ChatTurn(
72
+ agent_id=self.agent_id,
73
+ role="user",
74
+ content=user_message,
75
+ is_state_end=True,
76
+ )
77
+ )
78
+
79
+ # Query the policy for an action constrained to the cooperate/defect strings
80
+ output_chat_turn: ChatTurn = await self.policy(
81
+ state=self.state.chat_history,
82
+ agent_id=self.agent_id,
83
+ regex=f"({self.cooperate_string}|{self.defect_string})",
84
+ )
85
+ self.state.chat_history.append(output_chat_turn)
86
+ action = output_chat_turn.content
87
+
88
+ agent_step_log = AgentActLog(
89
+ chat_turns=self.state.chat_history[self.state.chat_counter :], info=None
90
+ )
91
+ self.state.chat_counter = len(self.state.chat_history)
92
+ self.state.round_nb = round_nb
93
+
94
+ return action, agent_step_log
95
+
96
+ def get_safe_copy(self):
97
+ """
98
+ Return a safe copy of the agent.
99
+ """
100
+ agent_copy = copy.copy(self)
101
+ agent_copy.state = copy.deepcopy(self.state)
102
+ return agent_copy
103
+
104
+ def reset(self):
105
+ self.state = IPDAgentState(nb_retries=0, round_nb=0, chat_counter=0, chat_history=[])
106
+ raise NotImplementedError
107
+
108
+ def render(self):
109
+ pass
110
+
111
+ def close(self):
112
+ pass
113
+
114
+ def get_agent_info(self):
115
+ pass
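`IPDAgent.act` passes `regex=f"({self.cooperate_string}|{self.defect_string})"` to the policy. How the policy enforces that pattern is not shown in this commit, so the following is only a sketch of one plausible check, using the standard `re` module. `build_action_regex`, `parse_action`, and the strings `"<A>"` and `"<B>"` are hypothetical. The sketch also escapes the two strings, which the source does not do; that matters only if they contain regex metacharacters.

```python
import re
from typing import Optional


def build_action_regex(cooperate_string: str, defect_string: str) -> "re.Pattern[str]":
    """Build a pattern that matches exactly one of the two action strings."""
    return re.compile(f"({re.escape(cooperate_string)}|{re.escape(defect_string)})")


def parse_action(text: str, pattern: "re.Pattern[str]") -> Optional[str]:
    """Return the text if it is exactly one allowed action, else None."""
    return text if pattern.fullmatch(text) else None


pattern = build_action_regex("<A>", "<B>")
valid = parse_action("<A>", pattern)
invalid = parse_action("I choose <A>", pattern)
print(valid, invalid)
```

In the simulation shown in this commit, any action outside the cooperate and defect lists is normalized to defect, so a check like this would mainly help detect malformed outputs before they reach that fallback.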
src_code_for_reproducibility/markov_games/ipd/ipd_simulation.py ADDED
@@ -0,0 +1,162 @@
1
+ import copy
2
+ import random
3
+ from dataclasses import dataclass
4
+ from typing import Any, Dict, List, Optional, Tuple
5
+
6
+ import numpy as np
7
+
8
+ from mllm.markov_games.markov_game import Simulation
9
+ from mllm.markov_games.rollout_tree import SimulationStepLog
10
+ from mllm.utils.get_coagent_id import get_coagent_id
11
+
12
+
13
+ @dataclass
14
+ class IPDState:
15
+ """
16
+ State of the Iterated Prisoner's Dilemma game.
17
+ """
18
+
19
+ round_nb: int = 0
20
+ done: bool = False
21
+ last_moves: Dict[str, str] | None = None
22
+
23
+
24
+ @dataclass
25
+ class IPDObs:
26
+ """
27
+ Observation in Iterated Prisoner's Dilemma game.
28
+ """
29
+
30
+ round_nb: int
31
+ last_coagent_move: str | None
32
+
33
+
34
+ class IPD(Simulation):
35
+ """
36
+ Iterated Prisoner's Dilemma simulation following the standard.
37
+
38
+ In each round of the game, two agents simultaneously choose to either cooperate (C) or defect (D).
39
+ The payoffs are as follows:
40
+ - If both cooperate: Both receive the "reward" (usually 3 points)
41
+ - If both defect: Both receive the "punishment" (usually 1 point)
42
+ - If one cooperates and one defects: The defector receives the "temptation" (usually 5 points)
43
+ and the cooperator receives the "sucker" payoff (usually 0 points)
44
+
45
+ The game is played for a specified number of rounds.
46
+ """
47
+
48
+ def __init__(
49
+ self,
50
+ agent_ids: List[str],
51
+ agent_names: List[str],
52
+ seed: int,
53
+ rounds_per_game: int,
54
+ reward: float, # Both cooperate
55
+ punishment: float, # Both defect
56
+ temptation: float, # Defector's reward when other cooperates
57
+ sucker: float, # Cooperator's reward when other defects
58
+ cooperate_actions: List[str],
59
+ defect_actions: List[str],
60
+ ):
61
+ self.agent_ids = agent_ids
62
+ self.agent_names = agent_names
63
+ self.seed = seed
64
+ self.rounds_per_game = rounds_per_game
65
+ self.reward = reward
66
+ self.punishment = punishment
67
+ self.temptation = temptation
68
+ self.sucker = sucker
69
+ self.cooperate_actions = cooperate_actions
70
+ self.defect_actions = defect_actions
71
+ self.state = IPDState()
72
+
73
+ def step(self, actions: Dict[str, str]) -> Tuple[bool, SimulationStepLog]:
74
+ """
75
+ Take a step in the environment using the provided actions.
76
+ Here, the observations are just the states of the game.
77
+
78
+ Args:
79
+ actions (dict): A dictionary where keys are agent identifiers and values are action strings; any action not listed in cooperate_actions is treated as a defection.
80
+
81
+ Returns:
82
+ done (bool): Whether the episode has ended.
83
+ step_log (SimulationStepLog): Per-agent rewards for this round, with the
84
+ normalized actions ('C' or 'D') of both agents under info['actions'].
85
+ """
86
+
87
+ # Calculate rewards using payoff matrix
88
+ agent0_action = actions[self.agent_ids[0]]
89
+ agent1_action = actions[self.agent_ids[1]]
90
+
91
+ # Normalize actions to "C" or "D"; unrecognized (gibberish) actions count as defection
92
+ def normalize_action(action):
93
+ if action in self.cooperate_actions:
94
+ return "C"
95
+ elif action in self.defect_actions:
96
+ return "D"
97
+ else:
98
+ return "D"
99
+
100
+ norm_action0 = normalize_action(agent0_action)
101
+ norm_action1 = normalize_action(agent1_action)
102
+
103
+ payoffs = {
104
+ ("C", "C"): [self.reward, self.reward],
105
+ ("C", "D"): [self.sucker, self.temptation],
106
+ ("D", "C"): [self.temptation, self.sucker],
107
+ ("D", "D"): [self.punishment, self.punishment],
108
+ }
109
+
110
+ round_rewards = {
111
+ self.agent_ids[0]: payoffs[(norm_action0, norm_action1)][0],
112
+ self.agent_ids[1]: payoffs[(norm_action0, norm_action1)][1],
113
+ }
114
+
115
+ # Update game state
116
+ self.state.round_nb += 1
117
+ self.state.last_moves = copy.deepcopy(actions)
118
+ done = self.state.round_nb >= self.rounds_per_game
119
+ step_log = SimulationStepLog(
120
+ rewards=round_rewards,
121
+ info={
122
+ "actions": {
123
+ self.agent_ids[0]: norm_action0,
124
+ self.agent_ids[1]: norm_action1,
125
+ }
126
+ },
127
+ )
128
+
129
+ return done, step_log
130
+
131
+ def get_obs(self):
132
+ """Return the observations of all agents.
133
+ Returns:
134
+ dict mapping each agent id to its IPDObs
135
+ """
136
+ observations = {}
137
+ for agent_id in self.agent_ids:
138
+ observations[agent_id] = self.get_obs_agent(agent_id)
139
+ return observations
140
+
141
+ def get_obs_agent(self, agent_id):
142
+ """Returns observation for agent_id"""
143
+ if self.state.last_moves is not None:
144
+ other_id = get_coagent_id(self.agent_ids, agent_id)
145
+ last_coagent_move = self.state.last_moves[other_id]
146
+ else:
147
+ last_coagent_move = None
148
+ obs = IPDObs(round_nb=self.state.round_nb, last_coagent_move=last_coagent_move)
149
+ return obs
150
+
151
+ def reset(self):
152
+ """Resets the game state and returns the initial observations"""
153
+ self.state = IPDState()
154
+ return self.get_obs()
155
+
156
+ def get_safe_copy(self):
157
+ """
158
+ Return a safe copy of the simulation.
159
+ """
160
+ simulation_copy = copy.copy(self)
161
+ simulation_copy.state = copy.deepcopy(self.state)
162
+ return simulation_copy
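The payoff logic in `IPD.step` can be tried out in isolation. The following is a minimal sketch, not part of the commit: the function name `ipd_round_rewards` and the default payoff values (3, 1, 5, 0, taken from the "usually" figures in the class docstring) are assumptions made for illustration. As in the file above, any action that is not a recognized cooperate action is scored as a defection.

```python
from typing import Dict, List


def ipd_round_rewards(
    actions: Dict[str, str],
    agent_ids: List[str],
    cooperate_actions: List[str],
    defect_actions: List[str],
    reward: float = 3.0,      # both cooperate (assumed default)
    punishment: float = 1.0,  # both defect (assumed default)
    temptation: float = 5.0,  # defector vs. cooperator (assumed default)
    sucker: float = 0.0,      # cooperator vs. defector (assumed default)
) -> Dict[str, float]:
    """Hypothetical standalone version of the payoff lookup in IPD.step."""

    def normalize(action: str) -> str:
        if action in cooperate_actions:
            return "C"
        if action in defect_actions:
            return "D"
        # Unrecognized output is treated as a defection, as in the source.
        return "D"

    a0 = normalize(actions[agent_ids[0]])
    a1 = normalize(actions[agent_ids[1]])

    payoffs = {
        ("C", "C"): (reward, reward),
        ("C", "D"): (sucker, temptation),
        ("D", "C"): (temptation, sucker),
        ("D", "D"): (punishment, punishment),
    }
    r0, r1 = payoffs[(a0, a1)]
    return {agent_ids[0]: r0, agent_ids[1]: r1}


if __name__ == "__main__":
    ids = ["alice", "bob"]
    print(ipd_round_rewards({"alice": "C", "bob": "D"}, ids, ["C"], ["D"]))
    print(ipd_round_rewards({"alice": "???", "bob": "C"}, ids, ["C"], ["D"]))
```

Scoring gibberish as a defection keeps the reward table total over all inputs, at the cost of not distinguishing a malformed reply from a deliberate defection in the logged actions.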
src_code_for_reproducibility/markov_games/ipd/ipd_statistics.py ADDED
@@ -0,0 +1,18 @@
1
+ from __future__ import annotations
2
+
3
+ from typing import Dict, Callable, List, Tuple
4
+
5
+ from mllm.markov_games.rollout_tree import SimulationStepLog
6
+
7
+
8
+ def avg_reward(sl: SimulationStepLog) -> List[Tuple[str, float]] | None:
9
+ for aid in (sl.rewards or {}).keys():
10
+ if "buffer" in str(aid) and "live" not in str(aid):
11
+ return None
12
+ # One value per agent at each step
13
+ rewards_dict = {f"reward-{aid}": float(v) for aid, v in (sl.rewards or {}).items()}
14
+ return [(key, value) for key, value in rewards_dict.items() if value is not None]
15
+
16
+ stat_functs: list[Callable[[SimulationStepLog], List[Tuple[str, float]]]] = [
17
+ avg_reward,
18
+ ]
src_code_for_reproducibility/markov_games/negotiation/README.md ADDED
@@ -0,0 +1,40 @@
1
+ ## Negotiation Games: core mechanics and variants
2
+
3
+ This family of games features two agents who, in each round, may briefly communicate and then simultaneously propose how to split a fixed resource (most commonly 10 coins). Rewards are the amount kept multiplied by an agent’s per-unit value. The starting speaker alternates deterministically across rounds.
4
+
5
+ Communication is optional and variant-dependent: some settings encourage rich messaging to share private information, while others remove messaging entirely to focus on allocation behavior.
6
+
7
+ Proportional splitting is used when the two proposals exceed the available total: allocations are scaled proportionally rather than discarded. This preserves a useful learning signal even when agents over-claim.
8
+
9
+ ### Variants (in increasing difficulty)
10
+
11
+ - No‑Press Split
12
+ - Single item type (coins)
13
+ - No communication; agents go straight to making split proposals, with the starting player alternating deterministically.
14
+ - Motivation: mirrors no‑communication setups (e.g., Advantage Alignment) while keeping the split decision nontrivial.
15
+ - Deterministic Mode: values are fixed and public; one agent values coins at 10 and the other at 1, with the roles swapping each round.
16
+ - Stochastic Mode: values are random and uncorrelated.
17
+
18
+ - Trust-and-Split RPS (TAS-RPS)
19
+ - Single item type (coins)
20
+ - Each round, a rock–paper–scissors hand draw creates a strong asymmetry: the winner’s per-coin value is 10, the loser’s is 1.
21
+ - Each agent initially sees only their own hand and must communicate to coordinate an optimal split.
22
+ - Motivation: enforce large value disparity so one’s own value reveals little about the other’s (avoiding ceiling effects) and incentivize meaningful communication.
23
+
24
+ - Trust-and-Split (TAS)
25
+ - Single item type (coins); each round, each agent’s per-coin value is sampled independently from a broad range (e.g., 1–20).
26
+ - Each agent observes only their own value; they may use short messages to share and negotiate.
27
+ - Motivation: a simple blend that tests whether agents learn to exchange private information and coordinate proportional, value-aware splits.
28
+
29
+ - Deal-or-No-Deal (DOND)
30
+ - Introduced in [Deal or No Deal? End-to-End Learning for Negotiation Dialogues](https://arxiv.org/pdf/1706.05125)
31
+ - Multiple item types (typically "books", "hats" and "balls") with limited stocks; each agent has its own per-type values.
32
+ - A deal pays out only if both proposals exactly agree and respect the stock; otherwise no deal (zero reward) that round.
33
+ - Motivation: a known benchmark closer to real-world bargaining, where both parties must explicitly agree.
34
+
35
+
36
+
37
+
38
+
39
+
40
+
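The proportional-splitting rule and the reward formula described in the README above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the function names `proportional_split` and `split_rewards` are hypothetical, each proposal is assumed to be the number of coins an agent asks to keep for itself, and proposals that fit within the total are assumed to be granted unchanged.

```python
from typing import Dict


def proportional_split(proposals: Dict[str, float], total: float = 10.0) -> Dict[str, float]:
    """Scale down over-claims so that allocations sum to `total` (hypothetical helper)."""
    claimed = sum(proposals.values())
    if claimed <= total:
        # Assumption: compatible proposals are granted as-is.
        return dict(proposals)
    scale = total / claimed
    return {agent_id: amount * scale for agent_id, amount in proposals.items()}


def split_rewards(
    proposals: Dict[str, float],
    values: Dict[str, float],
    total: float = 10.0,
) -> Dict[str, float]:
    """Reward = coins kept multiplied by the agent's per-coin value."""
    kept = proportional_split(proposals, total)
    return {agent_id: kept[agent_id] * values[agent_id] for agent_id in kept}


if __name__ == "__main__":
    # Both agents claim 8 of 10 coins; each ends up with 5 after scaling.
    print(split_rewards({"a": 8, "b": 8}, values={"a": 10, "b": 1}))
```

With per-coin values of 10 and 1, as in the deterministic and TAS-RPS variants, the over-claiming pair above still receives a graded reward rather than zero, which is the learning signal the README refers to.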
src_code_for_reproducibility/markov_games/negotiation/__pycache__/dond_agent.cpython-312.pyc ADDED
Binary file (4.19 kB). View file
 
src_code_for_reproducibility/markov_games/negotiation/__pycache__/nego_agent.cpython-312.pyc ADDED
Binary file (10.9 kB). View file
 
src_code_for_reproducibility/markov_games/negotiation/__pycache__/nego_hard_coded_policies.cpython-312.pyc ADDED
Binary file (3.23 kB). View file
 
src_code_for_reproducibility/markov_games/negotiation/__pycache__/nego_simulation.cpython-312.pyc ADDED
Binary file (12.2 kB). View file
 
src_code_for_reproducibility/markov_games/negotiation/__pycache__/no_press_nego_agent.cpython-312.pyc ADDED
Binary file (5.5 kB). View file
 
src_code_for_reproducibility/markov_games/negotiation/__pycache__/no_press_nego_simulation.cpython-312.pyc ADDED
Binary file (9.06 kB). View file
 
src_code_for_reproducibility/markov_games/negotiation/__pycache__/tas_agent.cpython-312.pyc ADDED
Binary file (6.14 kB). View file
 
src_code_for_reproducibility/markov_games/negotiation/__pycache__/tas_rps_agent.cpython-312.pyc ADDED
Binary file (5.59 kB). View file