danielrosehill Claude committed on
Commit f209cc2 · 1 Parent(s): 617208b

Major refocus: System architecture over vote results

Complete redesign emphasizing the AI system framework:

App structure (no emojis):
1. System Architecture - multi-agent design, structured outputs, model config
2. System Prompt Design - shows generic templates, country explorer
3. Structured Output Schema - JSON constraints, validation rules, user prompt template
4. Task Execution - execution flow, CLI usage, output format
5. Case Study Gaza Ceasefire - consolidated all resolution content here
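
As a rough illustration of the structured-output contract named in item 3, the following minimal Python sketch parses one agent reply and enforces the `vote` and `statement` fields. It is not taken from `scripts/run_motion.py`: the names `parse_agent_response` and `_strip_fence`, the `error` key, and the exact error messages are assumptions made for this example.

```python
import json

FENCE = "`" * 3  # markdown code fence marker, built indirectly to keep this example fence-safe
ALLOWED_VOTES = {"yes", "no", "abstain"}


def _strip_fence(text: str) -> str:
    """Remove a surrounding markdown code fence, if the model added one."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag such as "json".
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return text.strip()


def _abstain(reason: str) -> dict:
    """Fallback record: count the agent as abstaining and keep the reason."""
    return {"vote": "abstain", "statement": "", "error": reason}


def parse_agent_response(raw: str) -> dict:
    """Parse one agent reply into a dict with vote, statement, and error keys."""
    try:
        payload = json.loads(_strip_fence(raw))
    except json.JSONDecodeError as exc:
        return _abstain(f"parse error: {exc}")
    if not isinstance(payload, dict):
        return _abstain("response is not a JSON object")

    vote = payload.get("vote")
    statement = payload.get("statement")
    # Votes are case-insensitive on input and normalized to lowercase.
    if not isinstance(vote, str) or vote.strip().lower() not in ALLOWED_VOTES:
        return _abstain(f"invalid vote: {vote!r}")
    if not isinstance(statement, str) or not statement.strip():
        return _abstain("missing statement")
    return {"vote": vote.strip().lower(), "statement": statement.strip(), "error": None}
```

Under these assumptions, a reply wrapped in a markdown fence with `"vote": "YES"` is accepted and normalized to `yes`, while unparseable or out-of-schema replies are recorded as `abstain` with a non-empty `error`, so the run can continue.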

README updates:
- Focus on multi-agent simulation framework
- Emphasize structured outputs and JSON constraints
- Highlight task execution model
- Case study as example, not primary focus
- Technical implementation details front and center

Key improvements:
- All emojis removed from app
- Resolution content consolidated into single case study tab
- Primary focus on system design, not voting results
- Detailed execution flow and CLI documentation
- JSON schema and validation prominent
- Clear technical architecture exposition
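
For orientation, the aggregation step behind the "execution flow" and the results metadata could look roughly like the sketch below. The key names (`motion_id`, `timestamp`, `provider`, `model`, `total_votes`, `vote_summary`, `votes`) follow the metadata list in the app.py diff. The function names `build_results` and `vote_percentages` are hypothetical and are not claimed to match the repository's script.

```python
from collections import Counter
from datetime import datetime, timezone

VOTE_TYPES = ("yes", "no", "abstain")


def build_results(motion_id: str, votes: list, model: str, provider: str = "cloud") -> dict:
    """Aggregate per-country vote records into one results document."""
    counts = Counter(v["vote"] for v in votes)
    return {
        "motion_id": motion_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        "provider": provider,
        "model": model,
        "total_votes": len(votes),
        "vote_summary": {k: counts.get(k, 0) for k in VOTE_TYPES},
        "votes": votes,
    }


def vote_percentages(results: dict) -> dict:
    """Share of each vote type, in percent, rounded to one decimal place."""
    total = results["total_votes"]
    if total == 0:
        return {k: 0.0 for k in VOTE_TYPES}
    return {k: round(100 * n / total, 1) for k, n in results["vote_summary"].items()}
```

With the case-study counts (190 yes, 3 no, 2 abstain out of 195), `vote_percentages` gives 97.4, 1.5, and 1.0, matching the figures quoted in the earlier README.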

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (2)
  1. README.md +82 -64
  2. app.py +244 -119
README.md CHANGED
@@ -1,8 +1,8 @@
1
  ---
2
- title: AI Agent UN - Gaza Ceasefire Resolution
3
- emoji: 🌍
4
  colorFrom: blue
5
- colorTo: green
6
  sdk: gradio
7
  sdk_version: 4.44.0
8
  app_file: app.py
@@ -10,90 +10,108 @@ pinned: false
10
  license: mit
11
  ---
12
 
13
- # 🇺🇳 AI Agent UN: Gaza Ceasefire Resolution Simulation
14
 
15
- An experimental Model United Nations simulation where AI agents representing all 195 UN member states vote on a ceasefire resolution for Gaza.
16
 
17
- ## 🎯 The Concept
18
 
19
- This project explores how large language models can simulate international diplomatic interactions by creating AI agents that embody:
20
- - **Foreign policy positions** based on historical voting records
21
- - **Diplomatic style** reflecting each country's approach to multilateral diplomacy
22
- - **National interests** and regional alliances
23
- - **Cultural and ideological perspectives**
24
 
25
- ### How It Works
26
 
27
- 1. **Agent System Prompts**: Each of the 195 countries has a detailed system prompt that defines their:
28
- - Historical positions on Middle East conflicts
29
- - Key alliances and regional groupings
30
- - Economic and security interests
31
- - Past voting patterns on similar resolutions
32
 
33
- 2. **Structured Voting**: Each AI agent receives the ceasefire resolution text and responds with:
34
- - A vote: YES, NO, or ABSTAIN
35
- - A diplomatic statement explaining their position
 
 
36
 
37
- 3. **Analysis**: Votes are aggregated and analyzed by region, showing how different parts of the world approach the issue
 
 
 
 
 
 
38
 
39
- ## 📊 The Resolution
 
 
 
 
40
 
41
- **Motion**: Support for Ceasefire Agreement in Gaza and Commitment to Lasting Peace
 
 
 
 
42
 
43
- The resolution calls for:
44
- - Immediate and comprehensive ceasefire
45
- - Unhindered humanitarian access
46
- - Release of hostages and prisoners
47
- - Lifting of restrictions on Gaza
48
- - Two-state solution based on pre-1967 borders
49
- - International monitoring and accountability
50
 
51
- ## 🤖 Technical Details
 
 
 
 
52
 
53
- - **Model**: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
54
- - **Countries**: 195 UN member states
55
- - **Simulation Date**: October 9, 2025
56
- - **Vote Distribution**:
57
- ✅ YES: 190 countries (97.4%)
58
- ❌ NO: 3 countries (1.5%)
59
- ⚪ ABSTAIN: 2 countries (1.0%)
60
 
61
- ## 🔍 Explore the Results
 
 
 
 
 
62
 
63
- Use the tabs above to:
64
- - **Vote Summary**: See the overall voting distribution
65
- - **Regional Analysis**: Compare how different regions voted
66
- - **Country Details**: Read individual countries' votes and diplomatic statements
67
- - **All Votes**: Browse the complete voting record
68
 
69
- ## 🎓 Educational Value
 
70
 
71
- This simulation demonstrates:
72
- - How AI can model complex geopolitical decision-making
73
- - The diversity of international perspectives on contentious issues
74
- - The role of historical context in diplomatic positions
75
- - Multi-agent AI systems in action
76
 
77
- ## ⚠️ Important Disclaimer
78
 
79
- This is an AI simulation for research and educational purposes only. The positions expressed by the AI agents:
80
- - Do NOT represent actual government policies
81
- - Are NOT official diplomatic stances
82
- - Should NOT be considered authoritative or predictive
83
- - Are based on historical patterns, not current intentions
84
 
85
- The simulation is designed to explore how AI models understand and represent different national perspectives based on publicly available information about countries' historical positions.
86
 
87
- ## 🔗 Links
88
 
89
- - [GitHub Repository](https://github.com/yourusername/AI-Agent-UN)
90
- - [Full Source Code](https://github.com/yourusername/AI-Agent-UN/blob/main/scripts/run_motion.py)
91
- - [Agent System Prompts](https://github.com/yourusername/AI-Agent-UN/tree/main/agents/representatives)
92
 
93
- ## 🤝 Contributing
 
 
 
94
 
95
- This is an experimental research project. Contributions, suggestions, and discussions are welcome!
96
 
97
  ---
98
 
99
- Built with ❤️ using [Gradio](https://gradio.app) and [Claude](https://anthropic.com/claude)
 
1
  ---
2
+ title: AI Agent UN - Multi-Agent Simulation Framework
3
+ emoji: 🏛️
4
  colorFrom: blue
5
+ colorTo: indigo
6
  sdk: gradio
7
  sdk_version: 4.44.0
8
  app_file: app.py
 
10
  license: mit
11
  ---
12
 
13
+ # AI Agent United Nations: Multi-Agent Simulation Framework
14
 
15
+ A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs.
16
 
17
+ ## System Overview
18
 
19
+ This is an experimental framework demonstrating:
20
+ - **Multi-agent coordination** across 195 independent AI agents
21
+ - **Structured output constraints** with strict JSON schema validation
22
+ - **Generic prompt templates** producing country-specific behaviors
23
+ - **Task execution model** for running resolutions through all agents
24
 
25
+ ## Architecture
26
 
27
+ ### Core Components
 
 
 
 
28
 
29
+ **Agent System Prompts**
30
+ - 195 country-specific agents (one per UN member state)
31
+ - Generic template structure (identical for all countries)
32
+ - Only country name and P5 status differ between prompts
33
+ - AI infers policy positions from training data
34
 
35
+ **Structured Output Schema**
36
+ ```json
37
+ {
38
+ "vote": "yes" | "no" | "abstain",
39
+ "statement": "Brief explanation (2-4 sentences)"
40
+ }
41
+ ```
42
 
43
+ **Task Execution**
44
+ - Python CLI for running simulations
45
+ - Sequential processing of all 195 agents
46
+ - JSON validation and error handling
47
+ - Aggregated results with metadata
48
 
49
+ **Model Configuration**
50
+ - Primary: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
51
+ - Temperature: 0.7
52
+ - Max tokens: 800 per response
53
+ - Provider: Anthropic API
54
 
55
+ ## What This Tests
 
 
 
 
 
 
56
 
57
+ - **LLM Geopolitical Knowledge**: How well models understand different countries' foreign policies
58
+ - **Structured Outputs**: Consistency in producing valid JSON under constraints
59
+ - **Multi-Agent Systems**: Coordinating hundreds of independent AI agents
60
+ - **Prompt Engineering**: Generic templates yielding specific behaviors
61
+ - **Error Handling**: Graceful degradation when agents produce invalid outputs
62
 
63
+ ## Technical Implementation
 
 
 
 
 
 
64
 
65
+ **Execution Flow:**
66
+ 1. Load motion text from `tasks/motions/`
67
+ 2. Load 195 country agents
68
+ 3. For each agent: system prompt + user prompt → JSON response
69
+ 4. Validate and aggregate responses
70
+ 5. Save results with metadata
71
 
72
+ **Command Line Interface:**
73
+ ```bash
74
+ # Run simulation
75
+ python scripts/run_motion.py 01_gaza_ceasefire_resolution
 
76
 
77
+ # With specific model
78
+ python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022
79
 
80
+ # Test with sample
81
+ python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
82
+ ```
 
 
83
 
84
+ ## Case Study
85
 
86
+ The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.
 
 
 
 
87
 
88
+ **Results:** 190 Yes, 3 No, 2 Abstain
89
 
90
+ This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.
91
 
92
+ ## Research Applications
 
 
93
 
94
+ - Testing LLM knowledge of international relations
95
+ - Evaluating structured output consistency
96
+ - Studying emergent behavior in multi-agent systems
97
+ - Educational demonstrations of diplomatic complexity
98
 
99
+ ## Limitations
100
+
101
+ This is a simulation for research and education:
102
+ - AI positions based on training data, not actual policies
103
+ - Does NOT predict real government decisions
104
+ - Should NOT be considered authoritative
105
+ - Real diplomacy involves classified information and human judgment
106
+
107
+ ## Open Source
108
+
109
+ All code, prompts, and data available on GitHub:
110
+
111
+ - Repository: https://github.com/danielrosehill/AI-Agent-UN
112
+ - System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives
113
+ - Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py
114
 
115
  ---
116
 
117
+ Built with Gradio | Powered by Anthropic Claude
app.py CHANGED
@@ -27,7 +27,6 @@ def load_motion():
27
  except:
28
  return "Motion text not found."
29
 
30
- # Visualization functions
31
  def create_vote_summary_chart(data):
32
  vote_summary = data['vote_summary']
33
  fig = go.Figure(data=[go.Pie(
@@ -51,11 +50,11 @@ def get_country_response(country_name, data):
51
 
52
  for vote in data['votes']:
53
  if vote['country'].lower() == country_name.lower():
54
- vote_emoji = "✅" if vote['vote'] == 'yes' else "❌" if vote['vote'] == 'no' else "⚪"
55
  response = f"""
56
- ## {vote_emoji} Vote: {vote['vote'].upper()}
 
 
57
 
58
- ### Diplomatic Statement:
59
  {vote['statement']}
60
  """
61
  return response, vote['country_slug']
@@ -66,80 +65,107 @@ data = load_data()
66
  country_names = sorted([v['country'] for v in data['votes']])
67
  motion_text = load_motion()
68
 
69
- # Create Gradio interface
70
- with gr.Blocks(title="AI Agent UN Experiment", theme=gr.themes.Soft()) as demo:
 
 
 
71
 
72
- gr.Markdown("""
73
- # 🤖 AI Agent United Nations Experiment
74
 
75
- ## Simulating International Diplomacy with Large Language Models
76
 
77
- This is an experimental research project that explores how AI can model international diplomatic behavior.
78
- Each of the 195 UN member states is represented by an AI agent with a unique system prompt defining their
79
- foreign policy positions, national interests, and diplomatic style.
80
- """)
81
 
82
- with gr.Tab("🔬 The Experiment"):
83
- gr.Markdown("""
84
- ## How It Works
 
 
85
 
86
- ### 1. Agent Architecture
87
- Each country is represented by an AI agent powered by **Claude 3.5 Sonnet** (claude-3-5-sonnet-20241022).
88
- Every agent receives a unique system prompt that defines:
89
 
90
- - **National Identity**: The country they represent and their role
91
- - **Core Responsibilities**: How to advocate for their country's interests
92
- - **Behavioral Guidelines**: Diplomatic style and historical context
93
- - **Key Considerations**: Security, economic, and strategic factors
94
- - **Decision Framework**: How to analyze and respond to resolutions
95
 
96
- ### 2. The System Prompts
97
-
98
- The system prompts are **generic templates** - they do NOT contain country-specific foreign policy positions.
99
- Instead, they instruct the AI to:
100
- - Draw upon the country's historical positions (from the model's training data)
101
- - Consider national security and economic interests
102
- - Maintain appropriate diplomatic tone
103
- - Think strategically about alliances and precedents
104
-
105
- This means the AI agent must infer each country's likely position based on what it has learned
106
- during training about that country's foreign policy, voting patterns, and geopolitical context.
107
-
108
- ### 3. The Process
109
-
110
- 1. **Input**: Each agent receives the same UN resolution text
111
- 2. **Processing**: The agent analyzes how the resolution affects their country's interests
112
- 3. **Output**: The agent produces a structured JSON response containing:
113
- - A vote: YES, NO, or ABSTAIN
114
- - A diplomatic statement explaining their position
115
 
116
- ### 4. What This Tests
 
117
 
118
- This experiment explores:
119
- - How well LLMs understand different countries' foreign policy positions
120
- - Whether AI can model complex geopolitical decision-making
121
- - The diversity of perspectives in international relations
122
- - Multi-agent AI systems in realistic scenarios
123
 
124
- ### 5. Important Limitations
 
 
 
125
 
126
- ⚠️ **This is a simulation, not prediction:**
127
- - The AI agents' positions are based on historical patterns in training data
128
- - They do NOT represent actual government policies or intentions
129
- - They should NOT be considered authoritative or predictive
130
- - Real diplomacy involves classified information, domestic politics, and human judgment
131
  """)
132
 
133
- with gr.Tab("📋 System Prompt Explorer"):
134
  gr.Markdown("""
135
- ## Explore the Agent System Prompts
 
 
 
136
 
137
- Select any country to view the exact system prompt their AI agent received.
138
- Notice how the prompts are **identical in structure** - the only differences are:
139
- - The country name
140
- - Whether they're a P5 member (for veto power context)
141
 
142
- The AI must infer everything else from its training data about each country.
 
 
 
 
 
 
143
  """)
144
 
145
  with gr.Row():
@@ -150,11 +176,11 @@ with gr.Blocks(title="AI Agent UN Experiment", theme=gr.themes.Soft()) as demo:
150
  value="United States"
151
  )
152
  gr.Markdown("""
153
- ### Try comparing:
154
- - **P5 members**: United States, China, Russia, United Kingdom, France
155
- - **Regional powers**: Brazil, India, South Africa, Nigeria
156
- - **Small states**: Palau, Tuvalu, Monaco
157
- - **Key stakeholders**: Israel, Palestine, Egypt, Iran
158
  """)
159
 
160
  with gr.Column(scale=2):
@@ -169,63 +195,155 @@ with gr.Blocks(title="AI Agent UN Experiment", theme=gr.themes.Soft()) as demo:
169
  outputs=system_prompt_display
170
  )
171
 
172
- with gr.Tab("📜 The Resolution"):
173
  gr.Markdown("""
174
- ## The Motion Presented to All Agents
175
 
176
- Every AI agent received this exact same resolution text and was asked to vote on it.
177
 
178
- **Resolution**: Support for Ceasefire Agreement in Gaza and Commitment to Lasting Peace
179
  """)
180
 
181
- gr.Markdown(motion_text)
182
 
183
- with gr.Tab("🗳️ Case Study: Gaza Ceasefire"):
184
  gr.Markdown("""
185
- ## Simulation Results
186
 
187
- This tab shows the results when all 195 AI country agents voted on the ceasefire resolution.
188
- This is ONE example of the experiment in action.
189
  """)
190
 
 
 
 
 
 
191
  with gr.Row():
192
  with gr.Column():
193
  vote_chart = gr.Plot(value=create_vote_summary_chart(data))
194
 
 
195
  gr.Markdown(f"""
196
- ### Results Summary
197
- - **Yes votes:** {data['vote_summary']['yes']} ({data['vote_summary']['yes']/data['total_votes']*100:.1f}%)
198
- - **No votes:** {data['vote_summary']['no']} ({data['vote_summary']['no']/data['total_votes']*100:.1f}%)
199
- - **Abstentions:** {data['vote_summary']['abstain']} ({data['vote_summary']['abstain']/data['total_votes']*100:.1f}%)
200
-
201
- **Model**: {data['model']}
202
- **Date**: {data['timestamp'][:10]}
 
 
 
203
  """)
204
 
205
- with gr.Tab("🔍 Agent Response Inspector"):
206
- gr.Markdown("""
207
- ## Compare System Prompt → Agent Response
208
-
209
- Select a country to see:
210
- 1. The system prompt they received
211
- 2. The vote and statement they produced
212
-
213
- This shows how the generic prompt + the model's knowledge → specific diplomatic position
214
- """)
215
 
216
  country_inspector = gr.Dropdown(
217
  choices=country_names,
218
- label="Select Country to Inspect",
219
  value="United States"
220
  )
221
 
222
  with gr.Row():
223
  with gr.Column():
224
- gr.Markdown("### System Prompt Received")
225
  inspector_prompt = gr.Markdown(value=load_system_prompt("united-states"))
226
 
227
  with gr.Column():
228
- gr.Markdown("### Agent's Response")
229
  inspector_response = gr.Markdown(value=get_country_response("United States", data)[0])
230
 
231
  def update_inspector(country):
@@ -239,8 +357,7 @@ with gr.Blocks(title="AI Agent UN Experiment", theme=gr.themes.Soft()) as demo:
239
  outputs=[inspector_prompt, inspector_response]
240
  )
241
 
242
- with gr.Tab("📊 All Responses"):
243
- gr.Markdown("### Complete voting record with all diplomatic statements")
244
 
245
  votes_data = pd.DataFrame([
246
  {
@@ -262,38 +379,46 @@ with gr.Blocks(title="AI Agent UN Experiment", theme=gr.themes.Soft()) as demo:
262
  ---
263
  ## About This Project
264
 
265
- **AI Agent UN** is an experimental research project exploring multi-agent AI systems in international relations contexts.
 
266
 
267
- ### Key Points
268
 
269
- ✅ **What this is:**
270
- - An AI experiment in modeling diplomatic behavior
271
- - A research tool for understanding LLM capabilities
272
- - An educational demonstration of international relations complexity
273
 
274
- ⚠️ **What this is NOT:**
275
- - A prediction of actual government positions
276
- - An authoritative source on foreign policy
277
- - A replacement for real diplomatic analysis
278
 
279
- ### Open Source
 
 
 
 
 
280
 
281
- This project is open source. All system prompts, code, and simulation results are available on GitHub.
282
 
283
- 📂 [GitHub Repository](https://github.com/danielrosehill/AI-Agent-UN)
284
- 📖 [Documentation](https://github.com/danielrosehill/AI-Agent-UN/blob/main/README.md)
285
- 🤖 [Agent Prompts](https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives)
 
 
 
 
 
286
 
287
- ### Technical Details
288
 
289
- - **Model**: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
290
- - **Countries**: 195 UN member states
291
- - **Output Format**: Structured JSON (vote + statement)
292
- - **System Prompts**: Generic templates (no country-specific policies hardcoded)
293
 
294
  ---
295
 
296
- *Built with [Gradio](https://gradio.app) | Powered by [Anthropic Claude](https://anthropic.com/claude)*
297
  """)
298
 
299
  if __name__ == "__main__":
 
27
  except:
28
  return "Motion text not found."
29
 
 
30
  def create_vote_summary_chart(data):
31
  vote_summary = data['vote_summary']
32
  fig = go.Figure(data=[go.Pie(
 
50
 
51
  for vote in data['votes']:
52
  if vote['country'].lower() == country_name.lower():
 
53
  response = f"""
54
+ **Vote:** {vote['vote'].upper()}
55
+
56
+ **Diplomatic Statement:**
57
 
 
58
  {vote['statement']}
59
  """
60
  return response, vote['country_slug']
 
65
  country_names = sorted([v['country'] for v in data['votes']])
66
  motion_text = load_motion()
67
 
68
+ # JSON schema for structured output
69
+ json_schema = """{
70
+ "vote": "yes" | "no" | "abstain",
71
+ "statement": "Brief explanation (2-4 sentences)"
72
+ }"""
73
 
74
+ # User prompt template
75
+ user_prompt_template = """You are voting on the following UN General Assembly resolution:
76
 
77
+ {RESOLUTION_TEXT}
78
 
79
+ You must respond with a JSON object containing:
80
+ 1. "vote": Your vote - must be exactly one of: "yes", "no", or "abstain"
81
+ 2. "statement": A brief statement (2-4 sentences) explaining your country's position
 
82
 
83
+ IMPORTANT: Your statement must articulate {COUNTRY_NAME}'s UNIQUE perspective, national interests, and specific reasons for this vote. Reference your country's:
84
+ - Historical positions on this issue
85
+ - Regional concerns and alliances
86
+ - Domestic political considerations
87
+ - Specific clauses in the resolution that align with or contradict your interests
88
 
89
+ Avoid generic diplomatic language. Be specific to {COUNTRY_NAME}'s situation and worldview.
 
 
90
 
91
+ Your response must be valid JSON in this exact format:
92
+ {
93
+ "vote": "yes",
94
+ "statement": "Your explanation here."
95
+ }"""
96
 
97
+ # Create Gradio interface
98
+ with gr.Blocks(title="AI Agent UN Experiment", theme=gr.themes.Soft()) as demo:
99
 
100
+ gr.Markdown("""
101
+ # AI Agent United Nations: Multi-Agent Simulation System
102
 
103
+ ## Modeling International Diplomacy with Structured AI Agents
 
 
 
 
104
 
105
+ An experimental framework for simulating UN voting behavior using large language models.
106
+ Each of 195 UN member states is represented by an AI agent with structured system prompts
107
+ that must produce constrained JSON outputs for resolutions.
108
+ """)
109
 
110
+ with gr.Tab("System Architecture"):
111
+ gr.Markdown("""
112
+ ## System Design
113
+
114
+ This is a multi-agent AI system designed to simulate diplomatic decision-making in international forums.
115
+
116
+ ### Core Components
117
+
118
+ **1. Agent System Prompts**
119
+ - Each country has a unique system prompt (195 total)
120
+ - Prompts are generic templates - identical structure for all countries
121
+ - Only country name and P5 status differ between prompts
122
+ - No country-specific policy positions are hardcoded
123
+ - AI must infer positions from training data about each country
124
+
125
+ **2. Structured Output Constraints**
126
+ - All agents must return valid JSON
127
+ - Strict schema enforcement
128
+ - Two required fields: `vote` and `statement`
129
+ - Vote must be one of: `yes`, `no`, `abstain`
130
+ - Statement must be 2-4 sentences
131
+
132
+ **3. Task Running Model**
133
+ - Python script iterates through all 195 country agents
134
+ - Each agent receives: system prompt + resolution text + output schema
135
+ - Agent processes and returns structured JSON response
136
+ - Results aggregated into single JSON file with metadata
137
+
138
+ **4. Model Configuration**
139
+ - Primary model: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
140
+ - Temperature: 0.7 (balance between consistency and variation)
141
+ - Max tokens: 800 per response
142
+ - Provider: Anthropic API (cloud)
143
+
144
+ ### What This Tests
145
+
146
+ - **LLM Knowledge**: How well models understand different countries' foreign policies
147
+ - **Structured Outputs**: Ability to consistently produce valid JSON under constraints
148
+ - **Multi-Agent Systems**: Coordinating 195 independent AI agents
149
+ - **Prompt Engineering**: Generic templates producing specific behaviors
150
+ - **Consistency**: Whether similar countries produce similar responses
151
  """)
152
 
153
+ with gr.Tab("System Prompt Design"):
154
  gr.Markdown("""
155
+ ## Agent System Prompt Template
156
+
157
+ All country agents use the same prompt structure. The AI must infer country-specific positions
158
+ from its training data about each nation's history, alliances, and interests.
159
 
160
+ **Template Components:**
 
 
 
161
 
162
+ 1. **Role and Identity** - Defines the country and UN membership status
163
+ 2. **Core Responsibilities** - Instructions to represent national interests
164
+ 3. **Behavioral Guidelines** - How to stay in character diplomatically
165
+ 4. **Key Considerations** - What factors to analyze (security, economics, alliances)
166
+ 5. **Instructions** - Process for evaluating and voting on resolutions
167
+
168
+ **View any country's system prompt below:**
169
  """)
170
 
171
  with gr.Row():
 
176
  value="United States"
177
  )
178
  gr.Markdown("""
179
+ **Compare examples:**
180
+ - P5 members: United States, China, Russia, United Kingdom, France
181
+ - Regional powers: Brazil, India, South Africa, Nigeria
182
+ - Small states: Palau, Tuvalu, Monaco
183
+ - Key stakeholders: Israel, Palestine, Egypt, Iran
184
  """)
185
 
186
  with gr.Column(scale=2):
 
195
  outputs=system_prompt_display
196
  )
197
 
198
+ with gr.Tab("Structured Output Schema"):
199
+ gr.Markdown("""
200
+ ## JSON Output Constraints
201
+
202
+ Every agent must produce a valid JSON response conforming to this schema:
203
+ """)
204
+
205
+ gr.Code(json_schema, language="json", label="Required Output Schema")
206
+
207
  gr.Markdown("""
208
+ ### Validation Rules
209
+
210
+ **Vote Field:**
211
+ - Type: String (enum)
212
+ - Allowed values: `"yes"`, `"no"`, `"abstain"`
213
+ - Case-insensitive on input, normalized to lowercase
214
+ - Required field - missing value causes error
215
+
216
+ **Statement Field:**
217
+ - Type: String
218
+ - Length: 2-4 sentences recommended
219
+ - Must be country-specific (not generic)
220
+ - Must reference national interests and historical positions
221
+ - Required field - missing value causes error
222
 
223
+ ### Error Handling
224
 
225
+ If an agent produces invalid output:
226
+ 1. JSON parsing attempted with markdown stripping
227
+ 2. If parsing fails: agent recorded as `abstain` with error flag
228
+ 3. If validation fails: agent recorded as `abstain` with error flag
229
+ 4. Error logged for debugging but simulation continues
230
+
231
+ ### User Prompt Template
232
+
233
+ Below is the exact prompt template sent to each agent (with variables filled in):
234
  """)
235
 
236
+ gr.Code(user_prompt_template, language="markdown", label="User Prompt Template")
237
 
238
+ with gr.Tab("Task Execution"):
239
  gr.Markdown("""
240
+ ## How Simulations Run
241
+
242
+ ### Execution Flow
243
+
244
+ ```
245
+ 1. Load motion text from tasks/motions/{motion_id}.md
246
+ 2. Load country list from data/bodies/full-member-states.json
247
+ 3. For each country (195 total):
248
+ a. Load country's system prompt
249
+ b. Construct user prompt with motion text
250
+ c. Send to AI model (system + user prompt)
251
+ d. Parse and validate JSON response
252
+ e. Store result with metadata
253
+ 4. Aggregate all responses into single JSON file
254
+ 5. Calculate vote summary statistics
255
+ 6. Save timestamped and "latest" versions
256
+ ```
257
+
258
+ ### Command Line Interface
259
+
260
+ **Basic usage:**
261
+ ```bash
262
+ python scripts/run_motion.py 01_gaza_ceasefire_resolution
263
+ ```
264
+
265
+ **With options:**
266
+ ```bash
267
+ # Use specific model
268
+ python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022
269
+
270
+ # Test with sample (5 countries only)
271
+ python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
272
+
273
+ # Use local model (Ollama)
274
+ python scripts/run_motion.py 01_gaza_ceasefire_resolution --provider local --model llama3
275
+ ```
276
+
277
+ ### Output Format
278
+
279
+ Results saved to `tasks/reactions/` as JSON:
280
+ - `{motion_id}_{timestamp}.json` - Timestamped archive
281
+ - `{motion_id}_latest.json` - Latest simulation (overwritten)
282
+
283
+ **Metadata included:**
284
+ - `motion_id`: Identifier for the resolution
285
+ - `timestamp`: ISO 8601 timestamp
286
+ - `provider`: cloud or local
287
+ - `model`: Model identifier used
288
+ - `total_votes`: Number of countries
289
+ - `vote_summary`: Counts by vote type
290
+ - `votes`: Array of all country responses
291
+
292
+ ### Configuration
293
+
294
+ Environment variables (`.env` file):
295
+ ```
296
+ ANTHROPIC_API_KEY=your_key_here
297
+ MODEL_NAME=claude-3-5-sonnet-20241022
298
+ ```
299
+ """)
300
+
301
+ with gr.Tab("Case Study: Gaza Ceasefire Resolution"):
302
+ gr.Markdown("""
303
+ ## Example Simulation Run
304
 
305
+ This demonstrates the system with a real UN resolution about a Gaza ceasefire.
306
+ All 195 country agents voted on this resolution using the system described above.
307
  """)
308
 
309
+ gr.Markdown("### The Resolution")
310
+ gr.Markdown(motion_text)
311
+
312
+ gr.Markdown("### Aggregated Results")
313
+
314
  with gr.Row():
315
  with gr.Column():
316
  vote_chart = gr.Plot(value=create_vote_summary_chart(data))
317
 
318
+ with gr.Column():
319
  gr.Markdown(f"""
320
+ ### Vote Summary
321
+ - **Yes:** {data['vote_summary']['yes']} ({data['vote_summary']['yes']/data['total_votes']*100:.1f}%)
322
+ - **No:** {data['vote_summary']['no']} ({data['vote_summary']['no']/data['total_votes']*100:.1f}%)
323
+ - **Abstain:** {data['vote_summary']['abstain']} ({data['vote_summary']['abstain']/data['total_votes']*100:.1f}%)
324
+
325
+ ### Simulation Metadata
326
+ - **Model:** {data['model']}
327
+ - **Date:** {data['timestamp'][:10]}
328
+ - **Countries:** {data['total_votes']}
329
+ - **Provider:** {data['provider']}
330
  """)
331
 
332
+ gr.Markdown("### Individual Country Responses")
333
 
334
  country_inspector = gr.Dropdown(
335
  choices=country_names,
336
+ label="Select Country to View Response",
337
  value="United States"
338
  )
339
 
340
  with gr.Row():
341
  with gr.Column():
342
+ gr.Markdown("**System Prompt Received:**")
343
  inspector_prompt = gr.Markdown(value=load_system_prompt("united-states"))
344
 
345
  with gr.Column():
346
+ gr.Markdown("**JSON Output Produced:**")
347
  inspector_response = gr.Markdown(value=get_country_response("United States", data)[0])
348
 
349
  def update_inspector(country):
 
357
  outputs=[inspector_prompt, inspector_response]
358
  )
359
 
360
+ gr.Markdown("### Complete Response Data")
 
361
 
362
  votes_data = pd.DataFrame([
363
  {
 
379
  ---
380
  ## About This Project
381
 
382
+ **AI Agent UN** is an experimental framework for simulating international diplomatic decision-making
383
+ using multi-agent AI systems with structured outputs.
384
 
385
+ ### Research Applications
386
 
387
+ - Testing LLM knowledge of geopolitics and international relations
388
+ - Evaluating structured output consistency across hundreds of agents
389
+ - Studying emergent behavior in multi-agent systems
390
+ - Educational demonstrations of diplomatic diversity
391
 
392
+ ### Technical Implementation
 
 
 
393
 
394
+ - **Model:** Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
395
+ - **Agents:** 195 (one per UN member state)
396
+ - **System Prompts:** Generic templates (country-agnostic)
397
+ - **Output Format:** Structured JSON with validation
398
+ - **Execution:** Python CLI with parallel processing support
399
+ - **Storage:** JSON files with metadata
400
 
401
+ ### Limitations and Disclaimers
402
 
403
+ This is a simulation for research and educational purposes:
404
+ - AI positions are based on training data, not actual policies
405
+ - Does NOT predict real government decisions
406
+ - Should NOT be considered authoritative
407
+ - Real diplomacy involves classified intel and human judgment
408
+ - Training data may be outdated or incomplete
409
+
410
+ ### Open Source
411
 
412
+ All code, prompts, and data are open source:
413
 
414
+ - GitHub Repository: https://github.com/danielrosehill/AI-Agent-UN
415
+ - System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives
416
+ - Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py
417
+ - Documentation: https://github.com/danielrosehill/AI-Agent-UN/blob/main/README.md
418
 
419
  ---
420
 
421
+ Built with Gradio | Powered by Anthropic Claude
422
  """)
423
 
424
  if __name__ == "__main__":