GOVINDFROM committed on
Commit 2136269 · verified · 1 Parent(s): 00011f2

Upload model card

Files changed (1): README.md +179 −0
---
tags:
- reinforcement-learning
- game-theory
- colonel-blotto
- neurips-2025
- graph-neural-networks
- meta-learning
license: mit
---

# Colonel Blotto: Advanced RL + LLM System for NeurIPS 2025

![Status](https://img.shields.io/badge/status-trained-success)
![Framework](https://img.shields.io/badge/framework-PyTorch-orange)
![License](https://img.shields.io/badge/license-MIT-blue)

This repository contains trained models for the **Colonel Blotto game**, targeting the **NeurIPS 2025 MindGames workshop**. The system combines reinforcement learning with large language model fine-tuning.

## 🎯 Model Overview

The system achieves strong play in Colonel Blotto through:

- **Graph Neural Networks** for game state representation
- **FiLM layers** for fast opponent adaptation
- **Meta-learning** over a portfolio of strategies
- **LLM fine-tuning** (SFT + DPO) for strategic reasoning
- **Distillation** from LLMs back to efficient RL policies

### Game Configuration

- **Fields**: 3
- **Units per round**: 20
- **Rounds per game**: 5
- **Training episodes**: 1000

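With 3 fields and 20 units per round, an action is one way to split the 20 units across the 3 fields, which by stars and bars gives C(22, 2) = 231 possible allocations. A small sketch (the enumeration order here is illustrative; the trained checkpoint's action indexing may differ):

```python
from math import comb

F, U = 3, 20  # fields, units per round

# Stars and bars: ways to split U identical units across F fields
n_actions = comb(U + F - 1, F - 1)

# Explicit enumeration (ordering is an assumption, not necessarily
# the indexing used by the trained policy)
allocations = [
    (a, b, U - a - b)
    for a in range(U + 1)
    for b in range(U - a + 1)
]
assert len(allocations) == n_actions  # 231
```

This 231 matches the `n_actions=231` passed to the policy network in the Usage section.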

## 📊 Performance Results

Full evaluation results are provided in `battleground_eval.json` and `eval_scripted_after_ppo.json` (see Repository Contents below).

## 🏗️ Architecture

### Policy Network

The core policy network consists of four components:

1. **Graph Encoder**: multi-layer Graph Attention Networks (GAT)
   - Heterogeneous nodes: field nodes, round nodes, summary node
   - Multi-head attention with 6 heads
   - 3 layers of message passing

2. **Opponent Encoder**: MLP-based encoder for opponent modeling
   - Processes opponent history
   - Learns behavioral patterns

3. **FiLM Layers**: Feature-wise Linear Modulation
   - Fast adaptation to opponent behavior
   - Conditioned on the opponent encoding

4. **Portfolio Head**: multi-strategy selection
   - 6 specialist strategy heads
   - Soft attention-based mixing

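The FiLM conditioning (component 3) amounts to a per-feature affine transform of the state features, with the scale and shift predicted from the opponent encoding. A minimal NumPy sketch of the operation (layer sizes and variable names are illustrative, not the checkpoint's actual shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

hidden, opp_dim = 8, 4                  # illustrative sizes
x = rng.normal(size=(1, hidden))        # state features from the graph encoder
opp = rng.normal(size=(1, opp_dim))     # opponent encoding

# A linear layer maps the opponent encoding to per-feature (gamma, beta)
W = rng.normal(size=(opp_dim, 2 * hidden))
gamma, beta = np.split(opp @ W, 2, axis=-1)

# FiLM: feature-wise scale and shift conditioned on the opponent
modulated = gamma * x + beta
assert modulated.shape == x.shape
```

Because gamma and beta are recomputed from the opponent encoding every step, the same trunk can behave differently against different opponents without any weight updates.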
### Training Pipeline

The models were trained through a 7-phase pipeline:

1. **Phase A**: Environment setup and action space generation
2. **Phase B**: PPO training against diverse scripted opponents
3. **Phase C**: Preference dataset generation (LLM vs. LLM rollouts)
4. **Phase D**: Supervised Fine-Tuning (SFT) of the base LLM
5. **Phase E**: Direct Preference Optimization (DPO)
6. **Phase F**: Knowledge distillation from the LLM to the policy
7. **Phase G**: PPO refinement after distillation

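Phase E (DPO) trains directly on the Phase C preference pairs: it widens the log-probability margin of the chosen response over the rejected one, measured relative to the frozen SFT reference model. A minimal sketch of the standard per-pair DPO loss (the log-probability values and β here are illustrative):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, from sequence log-probs
    under the trained policy (pi_*) and the frozen SFT reference (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy still matches the reference, the margin is 0
# and the loss is log 2; it falls as the chosen margin grows.
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```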
## 📦 Repository Contents

### Policy Models

- `policy_models/policy_after_ppo.pt`: checkpoint after Phase B PPO training
- `policy_models/policy_after_distill.pt`: checkpoint after Phase F distillation
- `policy_models/policy_final.pt`: final checkpoint, after Phase G PPO refinement

### Fine-tuned LLM Models

- `sft_model/`: SFT model (HuggingFace Transformers compatible)
- `dpo_model/`: DPO model (HuggingFace Transformers compatible)

### Configuration & Results

- `master_config.json`: Complete training configuration
- `battleground_eval.json`: Comprehensive evaluation results
- `eval_scripted_after_ppo.json`: Post-PPO evaluation

## 🚀 Usage

### Loading Policy Model

```python
import json

import torch
from your_policy_module import PolicyNet

# Load configuration
with open("master_config.json", "r") as f:
    config = json.load(f)

# Initialize policy
policy = PolicyNet(
    F=config["F"],
    n_actions=231,  # C(22, 2) ways to split 20 units over 3 fields
    hidden=config["hidden"],
    gnn_layers=config["gnn_layers"],
    gnn_heads=config["gnn_heads"],
    n_strat=config["n_strat"],
)

# Load trained weights
policy.load_state_dict(torch.load("policy_models/policy_final.pt", map_location="cpu"))
policy.eval()
```

### Loading Fine-tuned LLM

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the SFT model (use "./dpo_model" for the DPO model)
tokenizer = AutoTokenizer.from_pretrained("./sft_model")
model = AutoModelForCausalLM.from_pretrained("./sft_model")

# Inference (the prompt format here is illustrative)
prompt = "You have 20 units and 3 fields. Allocate your units."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🎓 Research Context

This work targets the **NeurIPS 2025 MindGames Workshop**, with a focus on:

- **Strategic game AI** beyond traditional game-theoretic approaches
- **Hybrid systems** combining neural RL and LLM reasoning
- **Fast adaptation** to diverse opponents through meta-learning
- **Efficient deployment** via distillation

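The distillation step (Phase F) can be read as standard policy distillation: the student policy is trained to match the teacher's action distribution, typically by minimizing the cross-entropy (equivalently, KL divergence up to the teacher's entropy) between them. The exact loss used in this pipeline is not specified in the card; a minimal NumPy sketch of the usual formulation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_probs):
    """Cross-entropy H(teacher, student) between the teacher's action
    distribution and the student policy; minimized when they match."""
    log_p = np.log(softmax(student_logits))
    return -(teacher_probs * log_p).sum()

teacher = np.array([0.7, 0.2, 0.1])     # teacher's (LLM-derived) action probs
matched = np.log(teacher)               # student logits that reproduce the teacher
mismatched = np.zeros(3)                # uniform student

assert distill_loss(matched, teacher) < distill_loss(mismatched, teacher)
```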
### Key Innovations

1. **Heterogeneous Graph Representation**: novel graph structure for Blotto game states
2. **Ground-truth Counterfactual Learning**: exploiting the game's determinism
3. **Multi-scale Representation**: field-level, round-level, and game-level embeddings
4. **LLM-to-RL Distillation**: transferring strategic reasoning to efficient policies

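The counterfactual point (innovation 2) rests on Blotto's deterministic payoff: once the opponent's allocation is revealed, the outcome of *any* alternative action can be computed exactly, with no extra environment rollouts. A sketch of the round payoff under the standard majority rule (the exact scoring used in training is an assumption):

```python
def round_payoff(mine, theirs):
    """Fields won minus fields lost for one Blotto round (ties count zero)."""
    score = 0
    for m, t in zip(mine, theirs):
        if m > t:
            score += 1
        elif m < t:
            score -= 1
    return score

# Deterministic payoffs make counterfactuals exact: evaluate any
# alternative allocation against the opponent's revealed move.
opponent = (10, 5, 5)
assert round_payoff((11, 6, 3), opponent) == 1   # win two fields, lose one
assert round_payoff((0, 10, 10), opponent) == 1  # counterfactual: also +1
```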

## 📝 Citation

If you use this work, please cite:

```bibtex
@misc{colonelblotto2025neurips,
  title={Advanced Reinforcement Learning System for Colonel Blotto Games},
  author={NeurIPS 2025 MindGames Submission},
  year={2025},
  publisher={HuggingFace Hub},
  howpublished={\url{https://huggingface.co/{repo_id}}},
}
```

## 📄 License

MIT License. See the LICENSE file for details.

## 🙏 Acknowledgments

- Built for the **NeurIPS 2025 MindGames Workshop**
- Uses PyTorch, HuggingFace Transformers, and PEFT
- Training infrastructure: NVIDIA H200 GPU

---

**Generated**: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
**Uploaded from**: Notebook Environment