mnoorchenar committed
Commit edf56a5 · 1 Parent(s): 4458bd1

Update 2026-03-20 14:40:36

Files changed (5)
  1. .claude/settings.local.json +8 -0
  2. Dockerfile +2 -2
  3. README.md +250 -2
  4. app.py +1089 -10
  5. requirements.txt +5 -1
.claude/settings.local.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "permissions": {
+     "allow": [
+       "Bash(python -c \"import ast; ast.parse\\(open\\(''app.py''\\).read\\(\\)\\); print\\(''Syntax OK''\\)\")",
+       "Bash(python -c \":*)"
+     ]
+   }
+ }
Dockerfile CHANGED
@@ -1,7 +1,7 @@
- <U+FEFF>FROM python:3.11-slim
+ <U+FEFF>FROM python:3.10-slim
  WORKDIR /app
  COPY requirements.txt .
  RUN pip install --no-cache-dir -r requirements.txt
- COPY . .
+ COPY app.py .
  EXPOSE 7860
  CMD ["python", "app.py"]
README.md CHANGED
@@ -1,8 +1,256 @@
- <U+FEFF>---
- title: AdRL-Studio
+ ---
+ title: AdRL Studio
  colorFrom: purple
  colorTo: blue
  sdk: docker
  app_port: 7860
  pinned: false
  ---
9
+
10
+ <div align="center">
11
+
12
+ <h1>🎯 AdRL Studio</h1>
13
+ <img src="https://readme-typing-svg.demolab.com?font=Fira+Code&size=22&duration=3000&pause=1000&color=7C3AED&center=true&vCenter=true&width=700&lines=Contextual+Bandit+Ad+Recommendation+Engine;Benchmark+%CE%B5-Greedy%2C+UCB1%2C+Thompson%2C+LinUCB;Real-Time+Ad+Serving+%2B+Regret+Analysis" alt="Typing SVG"/>
14
+
15
+ <br/>
16
+
17
+ [![Python](https://img.shields.io/badge/Python-3.10+-3b82f6?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/)
18
+ [![Flask](https://img.shields.io/badge/Flask-2.x-4f46e5?style=for-the-badge&logo=flask&logoColor=white)](https://flask.palletsprojects.com/)
19
+ [![Docker](https://img.shields.io/badge/Docker-Ready-3b82f6?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/)
20
+ [![HuggingFace](https://img.shields.io/badge/HuggingFace-Spaces-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces)
21
+ [![Status](https://img.shields.io/badge/Status-Active-22c55e?style=for-the-badge)](#)
22
+
23
+ <br/>
24
+
25
+ **🎯 AdRL Studio**: a contextual multi-armed bandit platform that simulates a real-world ad recommendation and serving system using reinforcement learning. It benchmarks four bandit algorithms side by side, visualizes online learning and regret curves, runs A/B test simulations with statistical significance testing, and serves real-time ad recommendations from user context input.
26
+
27
+ <br/>
28
+
29
+ ---
30
+
31
+ </div>
32
+
33
+ ## Table of Contents
34
+
35
+ - [Features](#-features)
36
+ - [Architecture](#️-architecture)
37
+ - [Getting Started](#-getting-started)
38
+ - [Docker Deployment](#-docker-deployment)
39
+ - [Dashboard Modules](#-dashboard-modules)
40
+ - [ML Models](#-ml-models)
41
+ - [Project Structure](#-project-structure)
42
+ - [Author](#-author)
43
+ - [Contributing](#-contributing)
44
+ - [Disclaimer](#disclaimer)
45
+ - [License](#-license)
46
+
47
+ ---
48
+
49
+ ## ✨ Features
50
+
+ <table>
+   <tr>
+     <td>🎯 <b>Live Ad Serving</b></td>
+     <td>Enter user context (age, device, time, category, region) and get real-time ad recommendations from all 4 algorithms simultaneously</td>
+   </tr>
+   <tr>
+     <td>▶ <b>Online Learning Simulation</b></td>
+     <td>Run 1K–10K impression simulations with SSE-streamed progress, rolling CTR charts, and per-algorithm summaries</td>
+   </tr>
+   <tr>
+     <td>📉 <b>Regret Analysis</b></td>
+     <td>Visualize cumulative regret curves (the canonical RL evaluation metric) comparing all four policies</td>
+   </tr>
+   <tr>
+     <td>⚖ <b>A/B Test Simulator</b></td>
+     <td>Run 50/50 traffic splits with two-proportion z-test, p-value, confidence intervals, and statistical significance verdict</td>
+   </tr>
+   <tr>
+     <td>🔒 <b>Secure by Design</b></td>
+     <td>Role-based access, audit logs, encrypted data pipelines</td>
+   </tr>
+   <tr>
+     <td>🐳 <b>Containerized Deployment</b></td>
+     <td>Docker-first architecture, cloud-ready and scalable</td>
+   </tr>
+ </table>
77
+
78
+ ---
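The two-proportion z-test named in the A/B Test Simulator row can be sketched as follows. This is a minimal standard-library illustration, not the code in `app.py` (which imports `scipy.stats`); the helper name `two_proportion_ztest` and the click counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in click-through rates (hypothetical helper)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    # Pooled rate under the null hypothesis that both arms share one CTR.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval on the difference.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_b - p_a
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return {"z": z, "p_value": p_value, "ci": ci, "significant": p_value < alpha}


# Made-up 50/50 split: 10.0% CTR for arm A against 11.5% for arm B.
result = two_proportion_ztest(clicks_a=500, n_a=5000, clicks_b=575, n_b=5000)
print(result)
```

The pooled standard error is used for the test statistic because the null hypothesis assumes equal rates, while the interval uses the unpooled form because it describes the observed difference.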
79
+
80
+ ## 🏗️ Architecture
81
+
82
+ ```
+ ┌─────────────────────────────────────────────────────────┐
+ │                       AdRL Studio                       │
+ │                                                         │
+ │  ┌───────────┐    ┌───────────┐    ┌───────────────┐    │
+ │  │ Simulated │───▶│  Bandit   │───▶│   Flask API   │    │
+ │  │ Ad Environ│    │ Algorithms│    │    Backend    │    │
+ │  └───────────┘    └───────────┘    └───────┬───────┘    │
+ │                                            │            │
+ │                                   ┌────────▼────────┐   │
+ │                                   │  Plotly Charts  │   │
+ │                                   │    Dashboard    │   │
+ │                                   └─────────────────┘   │
+ └─────────────────────────────────────────────────────────┘
96
+ ```
97
+
98
+ ---
99
+
100
+ ## 🚀 Getting Started
101
+
102
+ ### Prerequisites
103
+
104
+ - Python 3.10+
105
+ - Docker & Docker Compose
106
+ - Git
107
+
108
+ ### Local Installation
109
+
110
+ ```bash
111
+ # 1. Clone the repository
112
+ git clone https://github.com/mnoorchenar/AdRL-Studio.git
113
+ cd AdRL-Studio
114
+
115
+ # 2. Create a virtual environment
116
+ python -m venv venv
117
+ source venv/bin/activate # Windows: venv\Scripts\activate
118
+
119
+ # 3. Install dependencies
120
+ pip install -r requirements.txt
121
+
122
+ # 4. Configure environment variables
123
+ cp .env.example .env
124
+ # Edit .env with your settings
125
+
126
+ # 5. Run the application
127
+ python app.py
128
+ ```
129
+
130
+ Open your browser at `http://localhost:7860` 🎉
131
+
132
+ ---
133
+
134
+ ## 🐳 Docker Deployment
135
+
136
+ ```bash
137
+ # Build and run with Docker Compose
138
+ docker compose up --build
139
+
140
+ # Or pull and run the pre-built image
141
+ # Docker repository names must be lowercase
+ docker pull mnoorchenar/adrl-studio
+ docker run -p 7860:7860 mnoorchenar/adrl-studio
143
+ ```
144
+
145
+ ---
146
+
147
+ ## 📊 Dashboard Modules
148
+
+ | Module | Description | Status |
+ |--------|-------------|--------|
+ | 🎯 Live Ad Serving | Real-time 4-algorithm recommendation from user context | ✅ Live |
+ | ▶ Online Learning | Simulation with SSE streaming and rolling CTR charts | ✅ Live |
+ | 📉 Regret Analysis | Cumulative regret curves for all four algorithms | ✅ Live |
+ | ⚖ A/B Test Simulator | Statistical significance testing with z-test & CI | ✅ Live |
+ | 🌡 Reward Landscape | 5×5 CTR heatmap: user content category × ad category | ✅ Live |
+ | 🔬 Policy Inspector | Per-ad learned weights and posterior distributions | 🗓️ Planned |
157
+
158
+ ---
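Cumulative regret, as plotted in the Regret Analysis module, adds up at each impression the gap between the best achievable expected CTR and the expected CTR of the ad actually served. A minimal sketch with plain lists follows; the helper name `cumulative_regret` and the toy CTR table are assumptions for illustration, and `app.py` may compute the curve differently.

```python
from itertools import accumulate


def cumulative_regret(expected_ctrs, chosen):
    """expected_ctrs[t][a] is the true CTR of ad a at step t; chosen[t] is the ad served."""
    per_step = [max(row) - row[a] for row, a in zip(expected_ctrs, chosen)]
    return list(accumulate(per_step))


# Three impressions, three ads (made-up rates inside the simulator's 0.02 to 0.25 range).
ctrs = [
    [0.05, 0.20, 0.10],
    [0.25, 0.02, 0.10],
    [0.05, 0.20, 0.10],
]
curve = cumulative_regret(ctrs, chosen=[2, 0, 1])
print(curve)
```

Because every per-step term is non-negative, the curve never decreases. A policy that has learned the best ad for each context shows up as a curve that flattens out.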
159
+
160
+ ## 🧠 ML Models
161
+
+ ```python
+ # Core Models Used in AdRL Studio
+ models = {
+     "epsilon_greedy": "ε-Greedy Neural Bandit: shared PyTorch MLP (39→32→16→1) with decaying ε",
+     "ucb1": "UCB1: Upper Confidence Bound non-contextual baseline",
+     "thompson": "Thompson Sampling: Bayesian Beta(α,β) per arm",
+     "linucb": "LinUCB Disjoint: ridge regression contextual bandit (production-grade)",
+     "environment": "Simulated 20-ad inventory, 19-dim one-hot context, Bernoulli reward sampling"
+ }
+ ```
172
+
173
+ ---
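The 39-unit input listed for the ε-greedy MLP follows from concatenating the 19-dimensional context with a 20-dimensional one-hot ad indicator. The sketch below reproduces that layout with plain lists. The category lists mirror the constants in `app.py`, while the helper names `one_hot` and `mlp_input` are hypothetical.

```python
AGE_GROUPS = ["young_adult", "adult", "senior"]
DEVICES = ["mobile", "desktop", "tablet"]
TIMES_OF_DAY = ["morning", "afternoon", "evening", "night"]
CONTENT_CATS = ["tech", "sports", "lifestyle", "news", "entertainment"]
REGIONS = ["north_america", "europe", "asia", "other"]
N_ADS = 20


def one_hot(value, choices):
    """Return a list with 1.0 at the position of value and 0.0 elsewhere."""
    return [1.0 if c == value else 0.0 for c in choices]


def mlp_input(age, device, tod, content, region, ad_idx):
    # Five one-hot blocks: 3 + 3 + 4 + 5 + 4 = 19 context dimensions.
    context = (
        one_hot(age, AGE_GROUPS)
        + one_hot(device, DEVICES)
        + one_hot(tod, TIMES_OF_DAY)
        + one_hot(content, CONTENT_CATS)
        + one_hot(region, REGIONS)
    )
    # One-hot ad indicator: 20 more dimensions, so one shared network scores every ad.
    ad = [1.0 if i == ad_idx else 0.0 for i in range(N_ADS)]
    return context + ad


x = mlp_input("adult", "desktop", "afternoon", "tech", "north_america", ad_idx=0)
print(len(x), sum(x))
```

Exactly six entries are active in any input: one per context feature plus the ad indicator.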
174
+
175
+ ## 📁 Project Structure
176
+
+ ```
+ AdRL-Studio/
+ │
+ ├── 📄 app.py            # Complete Flask application: all logic, templates, and API
+ ├── 📄 Dockerfile        # Container definition (python:3.10-slim, port 7860)
+ ├── 📄 requirements.txt  # Python dependencies
+ └── 📄 README.md         # This file
+ ```
185
+
186
+ > All application logic, HTML templates, CSS, and JavaScript live inside `app.py`
187
+ > using Flask's `render_template_string`. There are no external static files.
188
+
189
+ ---
190
+
191
+ ## 👨‍💻 Author
192
+
193
+ <div align="center">
194
+
195
+ <table>
196
+ <tr>
197
+ <td align="center" width="100%">
198
+
199
+ <img src="https://avatars.githubusercontent.com/mnoorchenar" width="120" style="border-radius:50%; border: 3px solid #4f46e5;" alt="Mohammad Noorchenarboo"/>
200
+
201
+ <h3>Mohammad Noorchenarboo</h3>
202
+
203
+ <code>Data Scientist</code> &nbsp;|&nbsp; <code>AI Researcher</code> &nbsp;|&nbsp; <code>Biostatistician</code>
204
+
205
+ 📍 &nbsp;Ontario, Canada &nbsp;&nbsp; 📧 &nbsp;[mohammadnoorchenarboo@gmail.com](mailto:mohammadnoorchenarboo@gmail.com)
206
+
207
+ ──────────────────────────────────────
208
+
209
+ [![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mnoorchenar)&nbsp;
210
+ [![Personal Site](https://img.shields.io/badge/Website-mnoorchenar.github.io-4f46e5?style=for-the-badge&logo=githubpages&logoColor=white)](https://mnoorchenar.github.io/)&nbsp;
211
+ [![HuggingFace](https://img.shields.io/badge/HuggingFace-ffcc00?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/mnoorchenar/spaces)&nbsp;
212
+ [![Google Scholar](https://img.shields.io/badge/Scholar-4285F4?style=for-the-badge&logo=googlescholar&logoColor=white)](https://scholar.google.ca/citations?user=nn_Toq0AAAAJ&hl=en)&nbsp;
213
+ [![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mnoorchenar)
214
+
215
+ </td>
216
+ </tr>
217
+ </table>
218
+
219
+ </div>
220
+
221
+ ---
222
+
223
+ ## 🤝 Contributing
224
+
225
+ Contributions are welcome! Please follow these steps:
226
+
227
+ 1. **Fork** the repository
228
+ 2. **Create** a feature branch: `git checkout -b feature/amazing-feature`
229
+ 3. **Commit** your changes: `git commit -m 'Add amazing feature'`
230
+ 4. **Push** to the branch: `git push origin feature/amazing-feature`
231
+ 5. **Open** a Pull Request
232
+
233
+ ---
234
+
235
+ ## Disclaimer
236
+
237
+ <span style="color:red">This project is developed strictly for educational and research purposes and does not constitute professional advice of any kind. All datasets used are either synthetically generated or publicly available; no real user data is stored. This software is provided "as is" without warranty of any kind; use at your own risk.</span>
238
+
239
+ ---
240
+
241
+ ## 📜 License
242
+
243
+ Distributed under the **MIT License**. See [`LICENSE`](LICENSE) for more information.
244
+
245
+ ---
246
+
247
+ <div align="center">
248
+
249
+ <img src="https://capsule-render.vercel.app/api?type=waving&color=0:3b82f6,100:4f46e5&height=120&section=footer&text=Made%20with%20%E2%9D%A4%EF%B8%8F%20by%20Mohammad%20Noorchenarboo&fontColor=ffffff&fontSize=18&fontAlignY=80" width="100%"/>
250
+
251
+ [![GitHub Stars](https://img.shields.io/github/stars/mnoorchenar/AdRL-Studio?style=social)](https://github.com/mnoorchenar/AdRL-Studio)
252
+ [![GitHub Forks](https://img.shields.io/github/forks/mnoorchenar/AdRL-Studio?style=social)](https://github.com/mnoorchenar/AdRL-Studio/fork)
253
+
254
+ <sub>The name "AdRL Studio" is used purely for academic and research purposes. Any similarity to existing company names, products, or trademarks is entirely coincidental and unintentional. This project has no affiliation with any commercial entity.</sub>
255
+
256
+ </div>
app.py CHANGED
@@ -1,12 +1,1091 @@
- <U+FEFF>from flask import Flask, render_template_string
  app = Flask(__name__)
- HTML = """<!DOCTYPE html>
- <html><head><title>AdRL-Studio</title></head>
- <body style="font-family:Arial;max-width:800px;margin:50px auto;padding:20px">
- <h1>AdRL-Studio</h1>
- <p>Running on port 7860.</p>
- <span style="background:#28a745;color:#fff;padding:5px 15px;border-radius:15px">Running</span>
- </body></html>"""
  @app.route('/')
- def home(): return render_template_string(HTML)
- if __name__ == '__main__': app.run(host='0.0.0.0', port=7860)
+ """
+ AdRL Studio: Contextual Bandit Ad Recommendation Engine
+
+ This application implements and benchmarks four bandit algorithms for ad
+ recommendation: (1) an ε-Greedy Neural Bandit using a shared PyTorch MLP,
+ (2) UCB1 (Upper Confidence Bound), a non-contextual baseline, (3) Thompson
+ Sampling with per-arm Beta priors, which also ignores the context, and
+ (4) the LinUCB disjoint model, a widely used linear contextual bandit.
+ The simulated environment features 20 ads across 5 categories and 5 user
+ context features (age group, device, time of day, content category, region)
+ encoded as five concatenated one-hot blocks (19 dimensions in total).
+ True click-through rates are determined by hidden weight vectors
+ initialized at startup (seed=42). Algorithms observe only bandit feedback
+ (the reward for the chosen arm) and must balance exploration against
+ exploitation to minimize cumulative regret.
+ """
17
+
18
+ import json
19
+ import math
20
+ import threading
21
+ import numpy as np
22
+ import torch
23
+ import torch.nn as nn
24
+ import torch.optim as optim
25
+ from flask import Flask, Response, jsonify, render_template_string, request
26
+ from scipy import stats
27
+
28
  app = Flask(__name__)
29
+
30
+ # ─────────────────────────────────────────────────────────────────────────────
31
+ # Environment constants
32
+ # ─────────────────────────────────────────────────────────────────────────────
33
+ np.random.seed(42)
34
+
35
+ AGE_GROUPS = ["young_adult", "adult", "senior"]
36
+ DEVICES = ["mobile", "desktop", "tablet"]
37
+ TIMES_OF_DAY = ["morning", "afternoon", "evening", "night"]
38
+ CONTENT_CATS = ["tech", "sports", "lifestyle", "news", "entertainment"]
39
+ REGIONS = ["north_america", "europe", "asia", "other"]
40
+ CONTEXT_DIM = len(AGE_GROUPS) + len(DEVICES) + len(TIMES_OF_DAY) + len(CONTENT_CATS) + len(REGIONS) # 19
41
+
42
+ N_ADS = 20
43
+ AD_IDS = [f"ad_{i:02d}" for i in range(1, 21)]
+ # Each run of four consecutive ads shares one category (ad_01..ad_04 are Tech, and so on)
+ AD_CATEGORIES = {ad: ["Tech","Fashion","Finance","Food","Travel"][i // 4]
+                  for i, ad in enumerate(AD_IDS)}
+
+ # Rebuild clean category mapping
+ AD_CAT_MAP = {}
+ for i, ad in enumerate(AD_IDS):
+     cats = ["Tech","Fashion","Finance","Food","Travel"]
+     AD_CAT_MAP[ad] = cats[i // 4]
54
+
+ AD_FORMATS = {
+     "ad_01":"banner","ad_02":"video","ad_03":"native","ad_04":"banner",
+     "ad_05":"banner","ad_06":"video","ad_07":"banner","ad_08":"native",
+     "ad_09":"native","ad_10":"banner","ad_11":"video","ad_12":"native",
+     "ad_13":"banner","ad_14":"native","ad_15":"banner","ad_16":"video",
+     "ad_17":"video","ad_18":"banner","ad_19":"native","ad_20":"video",
+ }
+ AD_BIDS = {
+     "ad_01":2.50,"ad_02":3.00,"ad_03":3.50,"ad_04":4.00,
+     "ad_05":1.50,"ad_06":2.00,"ad_07":2.50,"ad_08":3.00,
+     "ad_09":3.00,"ad_10":3.50,"ad_11":4.00,"ad_12":5.00,
+     "ad_13":1.00,"ad_14":1.50,"ad_15":2.00,"ad_16":2.50,
+     "ad_17":2.00,"ad_18":2.50,"ad_19":3.00,"ad_20":3.50,
+ }
69
+
70
+ # Hidden true CTR weights: fixed at startup, never exposed to algorithms
71
+ _TRUE_WEIGHTS = np.random.randn(N_ADS, CONTEXT_DIM) * 0.3
72
+
+ def _sigmoid(x):
+     return 1.0 / (1.0 + np.exp(-np.clip(x, -20, 20)))
+
+ def true_ctr(ad_idx, ctx):
+     return float(np.clip(_sigmoid(ctx @ _TRUE_WEIGHTS[ad_idx]), 0.02, 0.25))
+
+ def encode_context(age, device, tod, content, region):
+     vec = np.zeros(CONTEXT_DIM, dtype=np.float32)
+     offset = 0
+     vec[offset + AGE_GROUPS.index(age)] = 1.0; offset += len(AGE_GROUPS)
+     vec[offset + DEVICES.index(device)] = 1.0; offset += len(DEVICES)
+     vec[offset + TIMES_OF_DAY.index(tod)] = 1.0; offset += len(TIMES_OF_DAY)
+     vec[offset + CONTENT_CATS.index(content)] = 1.0; offset += len(CONTENT_CATS)
+     vec[offset + REGIONS.index(region)] = 1.0
+     return vec
+
+ def sample_random_context():
+     return encode_context(
+         np.random.choice(AGE_GROUPS), np.random.choice(DEVICES),
+         np.random.choice(TIMES_OF_DAY), np.random.choice(CONTENT_CATS),
+         np.random.choice(REGIONS),
+     )
95
+
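One consequence of the `true_ctr` definition may be worth checking. Every context has exactly five active features and the hidden weights are drawn as `0.3 * N(0, 1)`, so the logit `ctx @ w` is distributed as `N(0, 5 * 0.3**2)`. Since `sigmoid(z) < 0.25` only when `z < -ln 3`, most (ad, context) pairs would be clipped to the 0.25 ceiling. The sketch below estimates that share analytically. It assumes independent Gaussian weights and does not reproduce the seeded values used in `app.py`; the constant names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

ACTIVE_FEATURES = 5   # one active entry per one-hot block
WEIGHT_STD = 0.3      # scale applied to np.random.randn in app.py
CTR_CAP = 0.25        # upper clip bound in true_ctr

# Standard deviation of a sum of five independent N(0, 0.3^2) weights.
logit_std = sqrt(ACTIVE_FEATURES) * WEIGHT_STD
# sigmoid(z) < cap  <=>  z < log(cap / (1 - cap))
threshold = log(CTR_CAP / (1 - CTR_CAP))
share_below_cap = NormalDist(0.0, logit_std).cdf(threshold)
share_clipped = 1.0 - share_below_cap
print(f"estimated share of (ad, context) pairs clipped to {CTR_CAP}: {share_clipped:.1%}")
```

Under these assumptions roughly 95% of pairs end up with the same 0.25 CTR, which would flatten the reward landscape and compress the regret differences between algorithms. A smaller weight scale or a negative bias term would be one possible way to spread the rates.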
96
+ # ─────────────────────────────────────────────────────────────────────────────
97
+ # Algorithm classes
98
+ # ─────────────────────────────────────────────────────────────────────────────
99
+
+ class EpsilonGreedyNeuralBandit:
+     NAME = "ε-Greedy"
+     COLOR = "#f59e0b"
+
+     def __init__(self, epsilon=0.15, epsilon_min=0.01, decay=0.995, lr=0.01):
+         self.epsilon_0 = epsilon
+         self.epsilon_min = epsilon_min
+         self.decay = decay
+         self.lr = lr
+         self.reset()
+
+     def reset(self):
+         self.t = 0
+         self.n_updates = 0
+         self.model = nn.Sequential(
+             nn.Linear(CONTEXT_DIM + N_ADS, 32), nn.ReLU(),
+             nn.Linear(32, 16), nn.ReLU(),
+             nn.Linear(16, 1), nn.Sigmoid(),
+         )
+         self.optimizer = optim.SGD(self.model.parameters(), lr=self.lr)
+         self.criterion = nn.MSELoss()
+
+     def _inp(self, ctx, ad_idx):
+         oh = np.zeros(N_ADS, dtype=np.float32); oh[ad_idx] = 1.0
+         return torch.FloatTensor(np.concatenate([ctx, oh]))
+
+     def _pred(self, ctx, ad_idx):
+         self.model.eval()
+         with torch.no_grad():
+             return self.model(self._inp(ctx, ad_idx)).item()
+
+     def select(self, ctx):
+         eps = max(self.epsilon_min, self.epsilon_0 * (self.decay ** self.t))
+         if np.random.rand() < eps:
+             return int(np.random.randint(N_ADS))
+         return int(np.argmax([self._pred(ctx, a) for a in range(N_ADS)]))
+
+     def predict_ctr(self, ctx, ad_idx):
+         return self._pred(ctx, ad_idx)
+
+     def update(self, ctx, action, reward):
+         self.model.train()
+         x = self._inp(ctx, action).unsqueeze(0)
+         y = torch.FloatTensor([[float(reward)]])
+         self.optimizer.zero_grad()
+         self.criterion(self.model(x), y).backward()
+         self.optimizer.step()
+         self.t += 1
+         self.n_updates += 1
149
+
150
+
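With the constructor defaults in `EpsilonGreedyNeuralBandit`, the exploration rate follows `max(epsilon_min, epsilon * decay ** t)`, where `t` counts updates. The short sketch below shows how quickly that schedule reaches its floor. The function name `epsilon_at` is hypothetical and simply mirrors the expression in `select`.

```python
def epsilon_at(t, epsilon=0.15, epsilon_min=0.01, decay=0.995):
    """Exploration rate after t updates, mirroring the expression in select()."""
    return max(epsilon_min, epsilon * decay ** t)


# First update count at which the decayed value drops to the floor or below.
floor_step = next(t for t in range(10_000) if 0.15 * 0.995 ** t <= 0.01)
print(floor_step, epsilon_at(floor_step - 1), epsilon_at(floor_step))
```

After that point the policy explores on only 1% of impressions, which is early compared with the 1K–10K impression simulations the README describes. A slower decay would keep exploration alive for longer if that turns out to matter.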
+ class UCB1Bandit:
+     NAME = "UCB1"
+     COLOR = "#10b981"
+
+     def __init__(self):
+         self.reset()
+
+     def reset(self):
+         self.n_a = np.zeros(N_ADS)
+         self.R_a = np.zeros(N_ADS)
+         self.t = 0
+         self._init_idx = 0
+         self.n_updates = 0
+
+     def select(self, ctx):
+         if self._init_idx < N_ADS:
+             return self._init_idx
+         mu = self.R_a / np.maximum(self.n_a, 1)
+         bonus = np.sqrt(2.0 * np.log(max(self.t, 1)) / np.maximum(self.n_a, 1))
+         return int(np.argmax(mu + bonus))
+
+     def predict_ctr(self, ctx, ad_idx):
+         if self.n_a[ad_idx] == 0:
+             return 0.0
+         return float(self.R_a[ad_idx] / self.n_a[ad_idx])
+
+     def update(self, ctx, action, reward):
+         if self._init_idx < N_ADS:
+             self._init_idx += 1
+         self.n_a[action] += 1
+         self.R_a[action] += reward
+         self.t += 1
+         self.n_updates += 1
184
+
185
+
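For reference, once the initial round-robin pass over all 20 ads is complete, `UCB1Bandit.select` maximizes the standard UCB1 index. In the notation below (symbols chosen here for illustration, not taken from the source), $n_a$ and $R_a$ are the play count and reward sum of ad $a$, and $t$ is the total number of updates:

```latex
a_t = \arg\max_{a} \left( \frac{R_a}{n_a} + \sqrt{\frac{2 \ln t}{n_a}} \right)
```

The context argument is ignored throughout the class, which is why the README describes UCB1 as a non-contextual baseline.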
+ class ThompsonSamplingBandit:
+     NAME = "Thompson"
+     COLOR = "#3b82f6"
+
+     def __init__(self):
+         self.reset()
+
+     def reset(self):
+         self.alpha = np.ones(N_ADS)
+         self.beta_p = np.ones(N_ADS)
+         self.n_updates = 0
+
+     def select(self, ctx):
+         return int(np.argmax(np.random.beta(self.alpha, self.beta_p)))
+
+     def predict_ctr(self, ctx, ad_idx):
+         return float(self.alpha[ad_idx] / (self.alpha[ad_idx] + self.beta_p[ad_idx]))
+
+     def update(self, ctx, action, reward):
+         if reward == 1:
+             self.alpha[action] += 1
+         else:
+             self.beta_p[action] += 1
+         self.n_updates += 1
210
+
211
+
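A worked example may make the Beta update in `ThompsonSamplingBandit` concrete. The sketch below tracks a single ad with the standard library only; the click log is made up, and `random.betavariate` stands in for the `np.random.beta` call used in the class.

```python
import random

alpha, beta = 1.0, 1.0  # Beta(1, 1) prior, as set in reset()

# Hypothetical click log for one ad: 3 clicks in 10 impressions.
for reward in [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]:
    if reward == 1:
        alpha += 1
    else:
        beta += 1

# Posterior mean, the quantity predict_ctr() reports for this ad.
posterior_mean = alpha / (alpha + beta)

# One Thompson draw; select() takes the argmax of such draws across all ads.
rng = random.Random(0)
draw = rng.betavariate(alpha, beta)
print(alpha, beta, posterior_mean, draw)
```

With few observations the posterior is wide, so draws vary a lot and the ad still gets explored. As counts grow the draws concentrate around the posterior mean.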
+ class LinUCBBandit:
+     NAME = "LinUCB"
+     COLOR = "#ef4444"
+
+     def __init__(self, alpha=1.0):
+         self.alpha = alpha
+         self.reset()
+
+     def reset(self):
+         d = CONTEXT_DIM
+         self.A = [np.identity(d) for _ in range(N_ADS)]
+         self.b = [np.zeros(d) for _ in range(N_ADS)]
+         self.n_updates = 0
+
+     def _ucb_score(self, ctx, ad_idx):
+         A_inv = np.linalg.inv(self.A[ad_idx])
+         theta = A_inv @ self.b[ad_idx]
+         x = ctx
+         return float(theta @ x + self.alpha * math.sqrt(max(float(x @ A_inv @ x), 0.0)))
+
+     def select(self, ctx):
+         return int(np.argmax([self._ucb_score(ctx, a) for a in range(N_ADS)]))
+
+     def predict_ctr(self, ctx, ad_idx):
+         A_inv = np.linalg.inv(self.A[ad_idx])
+         return float((A_inv @ self.b[ad_idx]) @ ctx)
+
+     def update(self, ctx, action, reward):
+         x = ctx
+         self.A[action] += np.outer(x, x)
+         self.b[action] += reward * x
+         self.n_updates += 1
244
+
245
+
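`LinUCBBandit` follows the disjoint LinUCB formulation: each ad $a$ keeps its own ridge-regression statistics, initialized as $A_a = I_d$ and $b_a = 0$ with $d = 19$. Written out (the symbols are chosen here for illustration), the score computed by `_ucb_score` and the update applied by `update` for a context $x$ and reward $r$ are:

```latex
\begin{aligned}
\hat{\theta}_a &= A_a^{-1} b_a, \\
p_a(x) &= \hat{\theta}_a^{\top} x + \alpha \sqrt{x^{\top} A_a^{-1} x}, \\
A_a &\leftarrow A_a + x x^{\top}, \qquad b_a \leftarrow b_a + r\,x .
\end{aligned}
```

Note that `predict_ctr` returns the linear estimate $\hat{\theta}_a^{\top} x$ without clipping, so it can fall outside $[0, 1]$. `select` also recomputes `np.linalg.inv` for all 20 arms on every call. At $d = 19$ this is cheap, though caching $A_a^{-1}$ per arm would avoid the repeated inversions.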
246
+ # ─────────────────────────────────────────────────────────────────────────────
247
+ # Global state
248
+ # ─────────────────────────────────────────────────────────────────────────────
+ ALGO_KEYS = ["epsilon_greedy", "ucb1", "thompson", "linucb"]
+ ALGO_CLASSES = {
+     "epsilon_greedy": EpsilonGreedyNeuralBandit,
+     "ucb1": UCB1Bandit,
+     "thompson": ThompsonSamplingBandit,
+     "linucb": LinUCBBandit,
+ }
+ ALGO_DISPLAY = {
+     "epsilon_greedy": "ε-Greedy", "ucb1": "UCB1",
+     "thompson": "Thompson", "linucb": "LinUCB",
+ }
+ ALGO_COLORS = {
+     "epsilon_greedy": "#f59e0b", "ucb1": "#10b981",
+     "thompson": "#3b82f6", "linucb": "#ef4444",
+ }
264
+
265
+ algorithms = {k: cls() for k, cls in ALGO_CLASSES.items()}
266
+
267
+ sim_lock = threading.Lock()
268
+ sim_state = {"running": False, "step": 0, "total": 0, "last_results": None}
269
+
270
+ # ─────────────────────────────────────────────────────────────────────────────
271
+ # HTML Template
272
+ # ─────────────────────────────────────────────────────────────────────────────
273
+ TEMPLATE = """<!DOCTYPE html>
274
+ <html lang="en">
275
+ <head>
276
+ <meta charset="UTF-8"/>
277
+ <meta name="viewport" content="width=device-width,initial-scale=1.0"/>
278
+ <title>AdRL Studio</title>
279
+ <script src="https://cdn.plot.ly/plotly-2.27.0.min.js"></script>
280
+ <style>
281
+ *{margin:0;padding:0;box-sizing:border-box;}
282
+ body{font-family:'Segoe UI',sans-serif;background:#0f0f1a;color:#e2e8f0;display:flex;height:100vh;overflow:hidden;}
283
+ /* Sidebar */
284
+ #sidebar{width:240px;min-width:240px;background:#1a1a2e;display:flex;flex-direction:column;padding:0;border-right:1px solid #2d2d4e;}
285
+ #sidebar-header{padding:24px 20px 16px;border-bottom:1px solid #2d2d4e;}
286
+ #sidebar-header h1{font-size:1.2rem;font-weight:700;color:#fff;letter-spacing:.5px;}
287
+ #sidebar-header p{font-size:.72rem;color:#7c3aed;margin-top:4px;}
288
+ #nav{padding:12px 0;flex:1;}
289
+ .nav-item{display:flex;align-items:center;gap:10px;padding:11px 20px;cursor:pointer;color:#94a3b8;font-size:.85rem;transition:all .2s;border-left:3px solid transparent;}
290
+ .nav-item:hover{background:#252545;color:#e2e8f0;}
291
+ .nav-item.active{background:#1e1b4b;color:#a78bfa;border-left:3px solid #7c3aed;}
292
+ .nav-icon{font-size:1.1rem;width:20px;text-align:center;}
293
+ /* Main */
294
+ #main{flex:1;display:flex;flex-direction:column;overflow:hidden;}
295
+ #topbar{height:52px;background:#1a1a2e;border-bottom:1px solid #2d2d4e;display:flex;align-items:center;padding:0 24px;gap:12px;}
296
+ #topbar-title{font-size:1rem;font-weight:600;color:#fff;}
297
+ #status-dot{width:10px;height:10px;border-radius:50%;background:#22c55e;margin-left:auto;}
298
+ #status-dot.running{background:#eab308;animation:pulse 1s infinite;}
299
+ #status-label{font-size:.78rem;color:#94a3b8;}
300
+ @keyframes pulse{0%,100%{opacity:1;}50%{opacity:.4;}}
301
+ #content{flex:1;overflow-y:auto;padding:24px;}
302
+ /* Cards */
303
+ .card{background:#16213e;border-radius:10px;padding:20px;margin-bottom:18px;border:1px solid #2d2d4e;}
304
+ .card-title{font-size:.9rem;font-weight:600;color:#a78bfa;margin-bottom:14px;text-transform:uppercase;letter-spacing:.8px;}
305
+ /* Grid */
306
+ .grid-2{display:grid;grid-template-columns:1fr 1fr;gap:16px;}
307
+ .grid-4{display:grid;grid-template-columns:repeat(4,1fr);gap:14px;}
308
+ /* Form controls */
309
+ .form-row{display:flex;gap:14px;flex-wrap:wrap;align-items:flex-end;margin-bottom:16px;}
310
+ .form-group{display:flex;flex-direction:column;gap:5px;min-width:150px;}
311
+ label{font-size:.78rem;color:#94a3b8;font-weight:500;}
312
+ select,input[type=range]{background:#0f0f1a;border:1px solid #2d2d4e;color:#e2e8f0;border-radius:6px;padding:7px 10px;font-size:.82rem;outline:none;}
313
+ select:focus{border-color:#7c3aed;}
314
+ input[type=range]{padding:0;height:4px;accent-color:#7c3aed;width:100%;}
315
+ .range-row{display:flex;justify-content:space-between;font-size:.75rem;color:#64748b;margin-top:2px;}
316
+ /* Buttons */
317
+ .btn{background:#7c3aed;color:#fff;border:none;border-radius:7px;padding:9px 20px;font-size:.85rem;font-weight:600;cursor:pointer;transition:background .2s;}
318
+ .btn:hover{background:#6d28d9;}
319
+ .btn:disabled{background:#374151;cursor:not-allowed;}
320
+ /* Algo cards */
321
+ .algo-card{background:#0f0f1a;border-radius:8px;padding:14px;border:1px solid #2d2d4e;}
322
+ .algo-name{font-size:.8rem;font-weight:700;margin-bottom:6px;}
323
+ .algo-ad{font-size:1.05rem;font-weight:600;color:#fff;margin-bottom:2px;}
324
+ .algo-meta{font-size:.75rem;color:#94a3b8;}
325
+ .algo-score{font-size:.8rem;margin-top:6px;}
326
+ /* Table */
327
+ table{width:100%;border-collapse:collapse;font-size:.82rem;}
328
+ th{background:#0f0f1a;color:#94a3b8;padding:8px 12px;text-align:left;font-weight:600;border-bottom:1px solid #2d2d4e;}
329
+ td{padding:8px 12px;border-bottom:1px solid #1e293b;color:#e2e8f0;}
330
+ tr:last-child td{border-bottom:none;}
331
+ /* Progress bar */
332
+ .progress-bar{background:#1e293b;border-radius:4px;height:8px;overflow:hidden;margin:10px 0;}
333
+ .progress-fill{height:100%;background:#7c3aed;transition:width .3s;border-radius:4px;}
334
+ /* Tabs hidden by default */
335
+ .tab-pane{display:none;}
336
+ .tab-pane.active{display:block;}
337
+ /* Stat box */
338
+ .stat-box{background:#0f0f1a;border-radius:8px;padding:12px;text-align:center;}
339
+ .stat-val{font-size:1.4rem;font-weight:700;color:#a78bfa;}
340
+ .stat-lbl{font-size:.72rem;color:#64748b;margin-top:2px;}
341
+ /* Verdict */
342
+ .verdict-sig{color:#22c55e;font-weight:700;}
343
+ .verdict-ns{color:#ef4444;font-weight:700;}
344
+ /* Lift row */
345
+ .lift-row{display:flex;gap:12px;flex-wrap:wrap;margin-bottom:16px;}
346
+ .lift-box{flex:1;min-width:120px;background:#0f0f1a;border-radius:8px;padding:12px;text-align:center;}
347
+ </style>
348
+ </head>
349
+ <body>
350
+ <div id="sidebar">
351
+ <div id="sidebar-header">
352
+ <h1>&#127916; AdRL Studio</h1>
353
+ <p>Contextual Bandit Ad Engine</p>
354
+ </div>
355
+ <nav id="nav">
356
+ <div class="nav-item active" onclick="showTab('live')" id="nav-live">
357
+ <span class="nav-icon">&#127919;</span><span>Live Ad Serving</span>
358
+ </div>
359
+ <div class="nav-item" onclick="showTab('simulation')" id="nav-simulation">
360
+ <span class="nav-icon">&#9654;</span><span>Online Learning</span>
361
+ </div>
362
+ <div class="nav-item" onclick="showTab('regret')" id="nav-regret">
363
+ <span class="nav-icon">&#128200;</span><span>Regret Analysis</span>
364
+ </div>
365
+ <div class="nav-item" onclick="showTab('abtest')" id="nav-abtest">
366
+ <span class="nav-icon">&#9878;</span><span>A/B Test Simulator</span>
367
+ </div>
368
+ <div class="nav-item" onclick="showTab('heatmap')" id="nav-heatmap">
369
+ <span class="nav-icon">&#127777;</span><span>Reward Landscape</span>
370
+ </div>
371
+ </nav>
372
+ </div>
373
+
374
+ <div id="main">
375
+ <div id="topbar">
376
+ <span id="topbar-title">Live Ad Serving</span>
377
+ <div id="status-dot"></div>
378
+ <span id="status-label">Model Ready</span>
379
+ </div>
380
+
381
+ <div id="content">
382
+
383
+ <!-- TAB 1: Live Ad Serving -->
384
+ <div class="tab-pane active" id="tab-live">
385
+ <div class="card">
386
+ <div class="card-title">&#127891; User Context</div>
387
+ <div class="form-row">
388
+ <div class="form-group">
389
+ <label>Age Group</label>
390
+ <select id="ctx-age">
391
+ <option value="young_adult">Young Adult (18–34)</option>
392
+ <option value="adult" selected>Adult (35–54)</option>
393
+ <option value="senior">Senior (55+)</option>
394
+ </select>
395
+ </div>
396
+ <div class="form-group">
397
+ <label>Device</label>
398
+ <select id="ctx-device">
399
+ <option value="mobile">Mobile</option>
400
+ <option value="desktop" selected>Desktop</option>
401
+ <option value="tablet">Tablet</option>
402
+ </select>
403
+ </div>
404
+ <div class="form-group">
405
+ <label>Time of Day</label>
406
+ <select id="ctx-tod">
407
+ <option value="morning">Morning (6–12)</option>
408
+ <option value="afternoon" selected>Afternoon (12–18)</option>
409
+ <option value="evening">Evening (18–24)</option>
410
+ <option value="night">Night (0–6)</option>
411
+ </select>
412
+ </div>
413
+ <div class="form-group">
414
+ <label>Content Category</label>
415
+ <select id="ctx-content">
416
+ <option value="tech" selected>Tech</option>
417
+ <option value="sports">Sports</option>
418
+ <option value="lifestyle">Lifestyle</option>
419
+ <option value="news">News</option>
420
+ <option value="entertainment">Entertainment</option>
421
+ </select>
422
+ </div>
423
+ <div class="form-group">
424
+ <label>Region</label>
425
+ <select id="ctx-region">
426
+ <option value="north_america" selected>North America</option>
427
+ <option value="europe">Europe</option>
428
+ <option value="asia">Asia</option>
429
+ <option value="other">Other</option>
430
+ </select>
431
+ </div>
432
+ <div class="form-group" style="justify-content:flex-end;">
433
+ <button class="btn" onclick="getRecommendations()">&#128269; Get Recommendations</button>
434
+ </div>
435
+ </div>
436
+ </div>
437
+ <div class="card">
438
+ <div class="card-title">&#127917; Algorithm Recommendations</div>
439
+ <div class="grid-4" id="rec-grid">
440
+ <div class="algo-card"><div class="algo-name" style="color:#f59e0b">ε-Greedy</div><div class="algo-ad" id="r-eg-ad">&#8212;</div><div class="algo-meta" id="r-eg-meta">&#8212;</div><div class="algo-score" id="r-eg-score">&#8212;</div></div>
441
+ <div class="algo-card"><div class="algo-name" style="color:#10b981">UCB1</div><div class="algo-ad" id="r-ucb-ad">&#8212;</div><div class="algo-meta" id="r-ucb-meta">&#8212;</div><div class="algo-score" id="r-ucb-score">&#8212;</div></div>
442
+ <div class="algo-card"><div class="algo-name" style="color:#3b82f6">Thompson</div><div class="algo-ad" id="r-ts-ad">&#8212;</div><div class="algo-meta" id="r-ts-meta">&#8212;</div><div class="algo-score" id="r-ts-score">&#8212;</div></div>
443
+ <div class="algo-card"><div class="algo-name" style="color:#ef4444">LinUCB</div><div class="algo-ad" id="r-lu-ad">&#8212;</div><div class="algo-meta" id="r-lu-meta">&#8212;</div><div class="algo-score" id="r-lu-score">&#8212;</div></div>
444
+ </div>
445
+ </div>
446
+ </div>
447
+
448
+ <!-- TAB 2: Online Learning Simulation -->
449
+ <div class="tab-pane" id="tab-simulation">
450
+ <div class="card">
451
+ <div class="card-title">&#9881; Simulation Settings</div>
452
+ <div class="form-row">
453
+ <div class="form-group" style="flex:1;max-width:300px;">
454
+ <label>Impressions: <span id="n-val">3000</span></label>
455
+ <input type="range" id="n-impressions" min="1000" max="10000" step="500" value="3000"
456
+ oninput="document.getElementById('n-val').textContent=this.value"/>
457
+ <div class="range-row"><span>1,000</span><span>10,000</span></div>
458
+ </div>
459
+ <div class="form-group" style="flex:1;max-width:300px;">
460
+ <label>Random Seed: <span id="seed-val">42</span></label>
461
+ <input type="range" id="sim-seed" min="1" max="100" step="1" value="42"
462
+ oninput="document.getElementById('seed-val').textContent=this.value"/>
463
+ <div class="range-row"><span>1</span><span>100</span></div>
464
+ </div>
465
+ <div class="form-group" style="justify-content:flex-end;">
466
+ <button class="btn" id="run-sim-btn" onclick="runSimulation()">&#9654; Run Simulation</button>
467
+ </div>
468
+ </div>
469
+ <div class="progress-bar" id="sim-progress-bar" style="display:none;">
470
+ <div class="progress-fill" id="sim-progress-fill" style="width:0%;"></div>
471
+ </div>
472
+ <div id="sim-progress-text" style="font-size:.78rem;color:#94a3b8;"></div>
473
+ </div>
474
+ <div class="card">
475
+ <div class="card-title">&#128200; Rolling CTR (100-impression window)</div>
476
+ <div id="sim-chart" style="height:320px;"></div>
477
+ </div>
478
+ <div class="card">
479
+ <div class="card-title">&#128202; Simulation Summary</div>
480
+ <div id="sim-table-container"><p style="color:#64748b;font-size:.82rem;">Run a simulation to see results.</p></div>
481
+ </div>
482
+ </div>
483
+
484
+ <!-- TAB 3: Regret Analysis -->
485
+ <div class="tab-pane" id="tab-regret">
486
+ <div class="card">
487
+ <div class="card-title">&#128201; Cumulative Regret Comparison</div>
488
+ <p style="font-size:.78rem;color:#64748b;margin-bottom:12px;">
489
+ Cumulative regret measures the total reward missed vs. always picking the oracle best arm.
490
+ Lower is better. LinUCB and Thompson typically achieve sub-linear regret.
491
+ </p>
492
+ <div id="regret-chart" style="height:340px;"></div>
493
+ </div>
494
+ <div class="card">
495
+ <div class="card-title">&#128203; Regret Summary</div>
496
+ <div id="regret-table-container"><p style="color:#64748b;font-size:.82rem;">Run a simulation first (Online Learning tab).</p></div>
497
+ </div>
498
+ <div style="text-align:right;margin-top:-8px;">
499
+ <button class="btn" onclick="loadRegret()" style="font-size:.78rem;padding:7px 14px;">&#8635; Refresh Regret Data</button>
500
+ </div>
501
+ </div>
502
+
503
+ <!-- TAB 4: A/B Test Simulator -->
504
+ <div class="tab-pane" id="tab-abtest">
505
+ <div class="card">
506
+ <div class="card-title">&#9878; A/B Test Settings</div>
507
+ <div class="form-row">
508
+ <div class="form-group">
509
+ <label>Policy A</label>
510
+ <select id="ab-policy-a">
511
+ <option value="linucb" selected>LinUCB</option>
512
+ <option value="epsilon_greedy">ε-Greedy</option>
513
+ <option value="ucb1">UCB1</option>
514
+ <option value="thompson">Thompson</option>
515
+ </select>
516
+ </div>
517
+ <div class="form-group">
518
+ <label>Policy B</label>
519
+ <select id="ab-policy-b">
520
+ <option value="ucb1" selected>UCB1</option>
521
+ <option value="epsilon_greedy">ε-Greedy</option>
522
+ <option value="thompson">Thompson</option>
523
+ <option value="linucb">LinUCB</option>
524
+ </select>
525
+ </div>
526
+ <div class="form-group" style="flex:1;max-width:280px;">
527
+ <label>Impressions: <span id="ab-n-val">5000</span></label>
528
+ <input type="range" id="ab-impressions" min="1000" max="20000" step="1000" value="5000"
529
+ oninput="document.getElementById('ab-n-val').textContent=this.value"/>
530
+ <div class="range-row"><span>1,000</span><span>20,000</span></div>
531
+ </div>
532
+ <div class="form-group" style="justify-content:flex-end;">
533
+ <button class="btn" id="run-ab-btn" onclick="runABTest()">&#9878; Run A/B Test</button>
534
+ </div>
535
+ </div>
536
+ </div>
537
+ <div id="ab-results" style="display:none;">
538
+ <div class="card">
539
+ <div class="card-title">&#128202; A/B Test Results</div>
540
+ <div class="lift-row">
541
+ <div class="lift-box"><div class="stat-val" id="ab-ctr-a">&#8212;</div><div class="stat-lbl" id="ab-lbl-a">Policy A CTR</div></div>
542
+ <div class="lift-box"><div class="stat-val" id="ab-ctr-b">&#8212;</div><div class="stat-lbl" id="ab-lbl-b">Policy B CTR</div></div>
543
+ <div class="lift-box"><div class="stat-val" id="ab-lift">&#8212;</div><div class="stat-lbl">Absolute Lift</div></div>
544
+ <div class="lift-box"><div class="stat-val" id="ab-lift-rel">&#8212;</div><div class="stat-lbl">Relative Lift</div></div>
545
+ </div>
546
+ <div class="lift-row">
547
+ <div class="lift-box"><div class="stat-val" id="ab-z">&#8212;</div><div class="stat-lbl">Z-Statistic</div></div>
548
+ <div class="lift-box"><div class="stat-val" id="ab-p">&#8212;</div><div class="stat-lbl">P-Value</div></div>
549
+ <div class="lift-box"><div class="stat-val" id="ab-ci">&#8212;</div><div class="stat-lbl">95% CI (Lift)</div></div>
550
+ <div class="lift-box" style="flex:2;"><div class="stat-val" id="ab-verdict">&#8212;</div><div class="stat-lbl">Verdict</div></div>
551
+ </div>
552
+ <div id="ab-chart" style="height:280px;margin-top:8px;"></div>
553
+ </div>
554
+ </div>
555
+ </div>
556
+
557
+ <!-- TAB 5: Reward Landscape -->
558
+ <div class="tab-pane" id="tab-heatmap">
559
+ <div class="card">
560
+ <div class="card-title">&#127777; Reward Landscape Settings</div>
561
+ <div class="form-row">
562
+ <div class="form-group">
563
+ <label>Algorithm</label>
564
+ <select id="hm-algo">
565
+ <option value="linucb" selected>LinUCB</option>
566
+ <option value="epsilon_greedy">ε-Greedy</option>
567
+ <option value="ucb1">UCB1</option>
568
+ <option value="thompson">Thompson</option>
569
+ </select>
570
+ </div>
571
+ <div class="form-group" style="justify-content:flex-end;">
572
+ <button class="btn" onclick="loadHeatmap()">&#8635; Refresh Heatmap</button>
573
+ </div>
574
+ </div>
575
+ <p style="font-size:.76rem;color:#64748b;">Estimated CTR for each user content category × ad category pair. Context held at: adult, desktop, afternoon, north_america.</p>
576
+ </div>
577
+ <div class="card">
578
+ <div class="card-title">&#128200; Estimated CTR Heatmap</div>
579
+ <div id="heatmap-chart" style="height:380px;"></div>
580
+ </div>
581
+ </div>
582
+
583
+ </div><!-- /content -->
584
+ </div><!-- /main -->
585
+
586
+ <script>
587
+ // ── Tab switching ────────────────────────────────────────────────────────────
588
+ const TAB_TITLES = {
589
+ live:'Live Ad Serving', simulation:'Online Learning Simulation',
590
+ regret:'Regret Analysis', abtest:'A/B Test Simulator', heatmap:'Reward Landscape'
591
+ };
592
+ function showTab(name) {
593
+ document.querySelectorAll('.tab-pane').forEach(p => p.classList.remove('active'));
594
+ document.querySelectorAll('.nav-item').forEach(n => n.classList.remove('active'));
595
+ document.getElementById('tab-' + name).classList.add('active');
596
+ document.getElementById('nav-' + name).classList.add('active');
597
+ document.getElementById('topbar-title').textContent = TAB_TITLES[name];
598
+ }
599
+
600
+ // ── Status polling ───────────────────────────────────────────────────────────
601
+ function pollStatus() {
602
+ fetch('/api/status').then(r => r.json()).then(d => {
603
+ const dot = document.getElementById('status-dot');
604
+ const lbl = document.getElementById('status-label');
605
+ if (d.running) {
606
+ dot.className = 'running'; dot.style.background = '#eab308';
607
+ lbl.textContent = 'Simulation Running (' + d.step + '/' + d.total + ')';
608
+ } else {
609
+ dot.className = ''; dot.style.background = '#22c55e';
610
+ lbl.textContent = 'Model Ready';
611
+ }
612
+ }).catch(() => {});
613
+ }
614
+ setInterval(pollStatus, 2000);
615
+
616
+ // ── Tab 1: Recommendations ───────────────────────────────────────────────────
617
+ async function getRecommendations() {
618
+ const body = {
619
+ age: document.getElementById('ctx-age').value,
620
+ device: document.getElementById('ctx-device').value,
621
+ tod: document.getElementById('ctx-tod').value,
622
+ content: document.getElementById('ctx-content').value,
623
+ region: document.getElementById('ctx-region').value,
624
+ };
625
+ const r = await fetch('/api/recommend', {
626
+ method:'POST', headers:{'Content-Type':'application/json'},
627
+ body: JSON.stringify(body)
628
+ });
629
+ const d = await r.json();
+ if (!r.ok) { alert(d.error || 'Recommendation request failed.'); return; }
630
+ const keys = ['epsilon_greedy','ucb1','thompson','linucb'];
631
+ const ids = ['eg','ucb','ts','lu'];
632
+ keys.forEach((k, i) => {
633
+ const rec = d[k];
634
+ document.getElementById('r-' + ids[i] + '-ad').textContent = rec.ad_id + ' (' + rec.category + ')';
635
+ document.getElementById('r-' + ids[i] + '-meta').textContent = rec.format + ' | $' + rec.bid.toFixed(2);
636
+ document.getElementById('r-' + ids[i] + '-score').textContent = 'Est. CTR: ' + (rec.score * 100).toFixed(2) + '%';
637
+ });
638
+ }
639
+
640
+ // ── Tab 2: Simulation ────────────────────────────────────────────────────────
641
+ let simRollingData = {};
642
+
643
+ async function runSimulation() {
644
+ const n = parseInt(document.getElementById('n-impressions').value);
645
+ const seed = parseInt(document.getElementById('sim-seed').value);
646
+ const btn = document.getElementById('run-sim-btn');
647
+ const bar = document.getElementById('sim-progress-bar');
648
+ const fill = document.getElementById('sim-progress-fill');
649
+ const txt = document.getElementById('sim-progress-text');
650
+
651
+ btn.disabled = true;
652
+ bar.style.display = 'block';
653
+ fill.style.width = '0%';
654
+ txt.textContent = 'Starting simulation…';
655
+ simRollingData = {epsilon_greedy:[], ucb1:[], thompson:[], linucb:[], steps:[]};
656
+
657
+ try {
658
+ const resp = await fetch('/api/simulate', {
659
+ method:'POST', headers:{'Content-Type':'application/json'},
660
+ body: JSON.stringify({n_impressions: n, seed: seed})
661
+ });
662
+ const reader = resp.body.getReader();
663
+ const dec = new TextDecoder();
664
+ let buf = '';
665
+ while (true) {
666
+ const {done, value} = await reader.read();
667
+ if (done) break;
668
+ buf += dec.decode(value, {stream: true});
669
+ const parts = buf.split('\n\n');
670
+ buf = parts.pop();
671
+ for (const part of parts) {
672
+ const line = part.trim();
673
+ if (!line.startsWith('data:')) continue;
674
+ const payload = JSON.parse(line.slice(5).trim());
675
+ const pct = Math.round(payload.step / payload.total * 100);
676
+ fill.style.width = pct + '%';
677
+ txt.textContent = 'Step ' + payload.step + ' / ' + payload.total;
678
+ if (payload.done) {
679
+ renderSimCharts(payload);
680
+ renderSimTable(payload);
681
+ btn.disabled = false;
682
+ txt.textContent = 'Simulation complete \u2014 ' + payload.n_impressions + ' impressions.';
683
+ }
684
+ }
685
+ }
686
+ } catch(e) {
687
+ txt.textContent = 'Error: ' + e.message;
688
+ btn.disabled = false;
689
+ }
690
+ }
691
+
692
+ function renderSimCharts(d) {
693
+ const traces = [
694
+ {x: d.steps, y: d.rolling_ctr.epsilon_greedy, name:'ε-Greedy', line:{color:'#f59e0b'}},
695
+ {x: d.steps, y: d.rolling_ctr.ucb1, name:'UCB1', line:{color:'#10b981'}},
696
+ {x: d.steps, y: d.rolling_ctr.thompson, name:'Thompson', line:{color:'#3b82f6'}},
697
+ {x: d.steps, y: d.rolling_ctr.linucb, name:'LinUCB', line:{color:'#ef4444'}},
698
+ ];
699
+ Plotly.react('sim-chart', traces, {
700
+ template:'plotly_dark', paper_bgcolor:'#16213e', plot_bgcolor:'#0f0f1a',
701
+ margin:{t:10,b:40,l:50,r:10}, autosize:true,
702
+ xaxis:{title:'Impression', color:'#94a3b8', gridcolor:'#1e293b'},
703
+ yaxis:{title:'Rolling CTR', color:'#94a3b8', gridcolor:'#1e293b'},
704
+ legend:{bgcolor:'#16213e', font:{color:'#e2e8f0'}},
705
+ }, {responsive:true});
706
+ }
707
+
708
+ function renderSimTable(d) {
709
+ const keys = ['epsilon_greedy','ucb1','thompson','linucb'];
710
+ const names = {'epsilon_greedy':'ε-Greedy','ucb1':'UCB1','thompson':'Thompson','linucb':'LinUCB'};
711
+ const colors = {'epsilon_greedy':'#f59e0b','ucb1':'#10b981','thompson':'#3b82f6','linucb':'#ef4444'};
712
+ let html = '<table><thead><tr><th>Algorithm</th><th>Final CTR</th><th>Total Reward</th><th>Policy Updates</th></tr></thead><tbody>';
713
+ keys.forEach(k => {
714
+ html += '<tr><td style="color:' + colors[k] + ';font-weight:600;">' + names[k] + '</td>'
715
+ + '<td>' + (d.final_ctr[k] * 100).toFixed(2) + '%</td>'
716
+ + '<td>' + d.total_reward[k] + '</td>'
717
+ + '<td>' + d.n_updates[k] + '</td></tr>';
718
+ });
719
+ html += '</tbody></table>';
720
+ document.getElementById('sim-table-container').innerHTML = html;
721
+ }
722
+
723
+ // ── Tab 3: Regret ────────────────────────────────────────────────────────────
724
+ async function loadRegret() {
725
+ const r = await fetch('/api/regret');
726
+ if (!r.ok) { alert('Run a simulation first.'); return; }
727
+ const d = await r.json();
728
+ if (!d.steps || d.steps.length === 0) { alert('No simulation data yet.'); return; }
729
+ const traces = [
730
+ {x:d.steps, y:d.cumulative_regret.epsilon_greedy, name:'ε-Greedy', line:{color:'#f59e0b'}},
731
+ {x:d.steps, y:d.cumulative_regret.ucb1, name:'UCB1', line:{color:'#10b981'}},
732
+ {x:d.steps, y:d.cumulative_regret.thompson, name:'Thompson', line:{color:'#3b82f6'}},
733
+ {x:d.steps, y:d.cumulative_regret.linucb, name:'LinUCB', line:{color:'#ef4444'}},
734
+ ];
735
+ Plotly.react('regret-chart', traces, {
736
+ template:'plotly_dark', paper_bgcolor:'#16213e', plot_bgcolor:'#0f0f1a',
737
+ margin:{t:10,b:40,l:50,r:10}, autosize:true,
738
+ xaxis:{title:'Impression', color:'#94a3b8', gridcolor:'#1e293b'},
739
+ yaxis:{title:'Cumulative Regret', color:'#94a3b8', gridcolor:'#1e293b'},
740
+ legend:{bgcolor:'#16213e', font:{color:'#e2e8f0'}},
741
+ }, {responsive:true});
742
+
743
+ const keys = ['epsilon_greedy','ucb1','thompson','linucb'];
744
+ const names = {'epsilon_greedy':'ε-Greedy','ucb1':'UCB1','thompson':'Thompson','linucb':'LinUCB'};
745
+ const colors = {'epsilon_greedy':'#f59e0b','ucb1':'#10b981','thompson':'#3b82f6','linucb':'#ef4444'};
746
+ let html = '<table><thead><tr><th>Algorithm</th><th>Final Cumulative Regret</th><th>Avg Per-Step Regret</th></tr></thead><tbody>';
747
+ keys.forEach(k => {
748
+ html += '<tr><td style="color:' + colors[k] + ';font-weight:600;">' + names[k] + '</td>'
749
+ + '<td>' + d.final_regret[k].toFixed(2) + '</td>'
750
+ + '<td>' + d.avg_regret[k].toFixed(4) + '</td></tr>';
751
+ });
752
+ html += '</tbody></table>';
753
+ document.getElementById('regret-table-container').innerHTML = html;
754
+ }
755
+
756
+ // ── Tab 4: A/B Test ──────────────────────────────────────────────────────────
757
+ async function runABTest() {
758
+ const pA = document.getElementById('ab-policy-a').value;
759
+ const pB = document.getElementById('ab-policy-b').value;
760
+ const n = parseInt(document.getElementById('ab-impressions').value);
761
+ if (pA === pB) { alert('Please select two different policies.'); return; }
762
+ const btn = document.getElementById('run-ab-btn');
763
+ btn.disabled = true; btn.textContent = 'Running…';
764
+ try {
765
+ const r = await fetch('/api/abtest', {
766
+ method:'POST', headers:{'Content-Type':'application/json'},
767
+ body: JSON.stringify({policy_a: pA, policy_b: pB, n_impressions: n})
768
+ });
769
+ const d = await r.json();
770
+ const names = {epsilon_greedy:'ε-Greedy', ucb1:'UCB1', thompson:'Thompson', linucb:'LinUCB'};
771
+ document.getElementById('ab-results').style.display = 'block';
772
+ document.getElementById('ab-lbl-a').textContent = names[pA] + ' CTR';
773
+ document.getElementById('ab-lbl-b').textContent = names[pB] + ' CTR';
774
+ document.getElementById('ab-ctr-a').textContent = (d.ctr_a * 100).toFixed(2) + '%';
775
+ document.getElementById('ab-ctr-b').textContent = (d.ctr_b * 100).toFixed(2) + '%';
776
+ document.getElementById('ab-lift').textContent = (d.lift_abs * 100).toFixed(3) + '%';
777
+ document.getElementById('ab-lift-rel').textContent = (d.lift_rel * 100).toFixed(1) + '%';
778
+ document.getElementById('ab-z').textContent = d.z_stat.toFixed(3);
779
+ document.getElementById('ab-p').textContent = d.p_value.toFixed(4);
780
+ document.getElementById('ab-ci').textContent = '[' + (d.ci_low*100).toFixed(3) + '%, ' + (d.ci_high*100).toFixed(3) + '%]';
781
+ const vEl = document.getElementById('ab-verdict');
782
+ if (d.significant) {
783
+ vEl.textContent = '✅ Significant (p<0.05)'; vEl.className = 'stat-val verdict-sig';
784
+ } else {
785
+ vEl.textContent = '❌ Not Significant'; vEl.className = 'stat-val verdict-ns';
786
+ }
787
+ // Bar chart with error bars
788
+ const ctrA = d.ctr_a, ctrB = d.ctr_b;
789
+ const seA = Math.sqrt(ctrA*(1-ctrA)/d.n_a), seB = Math.sqrt(ctrB*(1-ctrB)/d.n_b);
790
+ const traceAB = {
791
+ x:[names[pA], names[pB]], y:[ctrA, ctrB],
792
+ type:'bar', marker:{color:['#7c3aed','#0ea5e9']},
793
+ error_y:{type:'data', array:[1.96*seA, 1.96*seB], visible:true, color:'#e2e8f0'},
794
+ text:[(ctrA*100).toFixed(2)+'%', (ctrB*100).toFixed(2)+'%'],
795
+ textposition:'outside',
796
+ };
797
+ Plotly.react('ab-chart', [traceAB], {
798
+ template:'plotly_dark', paper_bgcolor:'#16213e', plot_bgcolor:'#0f0f1a',
799
+ margin:{t:20,b:40,l:50,r:10}, autosize:true, showlegend:false,
800
+ yaxis:{title:'CTR', color:'#94a3b8', gridcolor:'#1e293b'},
801
+ }, {responsive:true});
802
+ } catch(e) {
803
+ alert('Error: ' + e.message);
804
+ } finally {
805
+ btn.disabled = false; btn.textContent = '⚖ Run A/B Test';
806
+ }
807
+ }
808
+
809
+ // ── Tab 5: Heatmap ───────────────────────────────────────────────────────────
810
+ async function loadHeatmap() {
811
+ const algo = document.getElementById('hm-algo').value;
812
+ const r = await fetch('/api/heatmap', {
813
+ method:'POST', headers:{'Content-Type':'application/json'},
814
+ body: JSON.stringify({algorithm: algo})
815
+ });
816
+ const d = await r.json();
817
+ const trace = {
818
+ z: d.matrix, x: d.ad_cats, y: d.content_cats,
819
+ type:'heatmap', colorscale:'Viridis',
820
+ hoverongaps:false,
821
+ colorbar:{title:{text:'Est. CTR', font:{color:'#e2e8f0'}}, tickfont:{color:'#e2e8f0'}},
822
+ text: d.matrix.map(row => row.map(v => (v*100).toFixed(2)+'%')),
823
+ texttemplate:'%{text}', textfont:{color:'#fff', size:11},
824
+ };
825
+ const names = {epsilon_greedy:'ε-Greedy', ucb1:'UCB1', thompson:'Thompson', linucb:'LinUCB'};
826
+ Plotly.react('heatmap-chart', [trace], {
827
+ template:'plotly_dark', paper_bgcolor:'#16213e', plot_bgcolor:'#0f0f1a',
828
+ margin:{t:30,b:60,l:120,r:10}, autosize:true,
829
+ title:{text:'Estimated CTR \u2014 ' + names[algo], font:{color:'#e2e8f0', size:13}},
830
+ xaxis:{title:'Ad Category', color:'#94a3b8'},
831
+ yaxis:{title:'User Content Category', color:'#94a3b8'},
832
+ }, {responsive:true});
833
+ }
834
+
835
+ // Auto-load heatmap on page load
836
+ loadHeatmap();
837
+ </script>
838
+ </body>
839
+ </html>"""
840
+
841
+ # ─────────────────────────────────────────────────────────────────────────────
842
+ # Flask routes
843
+ # ─────────────────────────────────────────────────────────────────────────────
844
+
845
  @app.route('/')
846
+ def index():
847
+ return render_template_string(TEMPLATE)
848
+
849
+
850
+ @app.route('/api/status')
851
+ def api_status():
852
+ with sim_lock:
853
+ return jsonify({
854
+ "running": sim_state["running"],
855
+ "step": sim_state["step"],
856
+ "total": sim_state["total"],
857
+ })
858
+
859
+
860
+ @app.route('/api/recommend', methods=['POST'])
861
+ def api_recommend():
862
+ data = request.get_json(force=True)
863
+ try:
864
+ ctx = encode_context(
865
+ data['age'], data['device'], data['tod'],
866
+ data['content'], data['region']
867
+ )
868
+ except (KeyError, ValueError) as e:
869
+ return jsonify({"error": str(e)}), 400
870
+
871
+ result = {}
872
+ for key, algo in algorithms.items():
873
+ ad_idx = algo.select(ctx)
874
+ score = algo.predict_ctr(ctx, ad_idx)
875
+ ad_id = AD_IDS[ad_idx]
876
+ result[key] = {
877
+ "ad_id": ad_id,
878
+ "category": AD_CAT_MAP[ad_id],
879
+ "format": AD_FORMATS[ad_id],
880
+ "bid": AD_BIDS[ad_id],
881
+ "score": round(score, 4),
882
+ }
883
+ return jsonify(result)
884
+
885
+
886
+ @app.route('/api/simulate', methods=['POST'])
887
+ def api_simulate():
888
+ data = request.get_json(force=True)
889
+ n_impressions = int(data.get('n_impressions', 3000))
890
+ seed = int(data.get('seed', 42))
891
+ n_impressions = max(1000, min(10000, n_impressions))
892
+
893
+ def generate():
894
+ # Reset all algorithm states
895
+ for algo in algorithms.values():
896
+ algo.reset()
897
+
898
+ np.random.seed(seed)
899
+
900
+ with sim_lock:
901
+ sim_state['running'] = True
902
+ sim_state['step'] = 0
903
+ sim_state['total'] = n_impressions
904
+
905
+ rewards = {k: [] for k in ALGO_KEYS}
906
+ oracle_rew = []
907
+ checkpoint_interval = 50
908
+
909
+ # Per-checkpoint rolling window (last 100 impressions)
910
+ rolling_window = 100
911
+ rolling_ctr_series = {k: [] for k in ALGO_KEYS}
912
+ steps_series = []
913
+
914
+ for t in range(n_impressions):
915
+ ctx = sample_random_context()
916
+
917
+ # Oracle best arm
918
+ oracle_idx = int(np.argmax([true_ctr(a, ctx) for a in range(N_ADS)]))
919
+ oracle_r = int(np.random.rand() < true_ctr(oracle_idx, ctx))
920
+ oracle_rew.append(oracle_r)
921
+
922
+ # Each algorithm selects, receives reward, updates
923
+ for k, algo in algorithms.items():
924
+ act = algo.select(ctx)
925
+ r = int(np.random.rand() < true_ctr(act, ctx))
926
+ algo.update(ctx, act, r)
927
+ rewards[k].append(r)
928
+
929
+ # Checkpoint every `checkpoint_interval` steps
930
+ if (t + 1) % checkpoint_interval == 0 or t == n_impressions - 1:
931
+ steps_series.append(t + 1)
932
+ for k in ALGO_KEYS:
933
+ start = max(0, len(rewards[k]) - rolling_window)
934
+ window = rewards[k][start:]
935
+ rolling_ctr_series[k].append(round(sum(window) / len(window), 4))
936
+
937
+ with sim_lock:
938
+ sim_state['step'] = t + 1
939
+
940
+ payload = {
941
+ "step": t + 1,
942
+ "total": n_impressions,
943
+ "done": False,
944
+ }
945
+ yield f"data: {json.dumps(payload)}\n\n"
946
+
947
+ # Final payload with full series
948
+ final_ctr = {k: round(sum(rewards[k]) / len(rewards[k]), 4) for k in ALGO_KEYS}
949
+ total_rew = {k: int(sum(rewards[k])) for k in ALGO_KEYS}
950
+ n_upd = {k: algorithms[k].n_updates for k in ALGO_KEYS}
951
+
952
+ # Compute regret series (checkpointed at checkpoint_interval)
953
+ cum_regret_series = {k: [] for k in ALGO_KEYS}
954
+ for ci, step in enumerate(steps_series):
955
+ end = step
956
+ start_ci = (ci * checkpoint_interval) if ci > 0 else 0
957
+ for k in ALGO_KEYS:
958
+ slice_oracle = oracle_rew[:end]
959
+ slice_algo = rewards[k][:end]
960
+ cum_regret = sum(o - a for o, a in zip(slice_oracle, slice_algo))
961
+ cum_regret_series[k].append(round(cum_regret, 4))
962
+
963
+ # Store for /api/regret
964
+ with sim_lock:
965
+ sim_state['running'] = False
966
+ sim_state['last_results'] = {
967
+ 'steps': steps_series,
968
+ 'cumulative_regret': cum_regret_series,
969
+ 'final_regret': {k: cum_regret_series[k][-1] for k in ALGO_KEYS},
970
+ 'avg_regret': {k: round(cum_regret_series[k][-1] / n_impressions, 5) for k in ALGO_KEYS},
971
+ }
972
+
973
+ final_payload = {
974
+ "done": True,
975
+ "step": n_impressions,
976
+ "total": n_impressions,
977
+ "n_impressions": n_impressions,
978
+ "steps": steps_series,
979
+ "rolling_ctr": rolling_ctr_series,
980
+ "final_ctr": final_ctr,
981
+ "total_reward": total_rew,
982
+ "n_updates": n_upd,
983
+ }
984
+ yield f"data: {json.dumps(final_payload)}\n\n"
985
+
986
+ return Response(
987
+ generate(),
988
+ mimetype='text/event-stream',
989
+ headers={'Cache-Control': 'no-cache', 'X-Accel-Buffering': 'no'},
990
+ )
991
+
992
+
993
+ @app.route('/api/regret')
994
+ def api_regret():
995
+ with sim_lock:
996
+ results = sim_state.get('last_results')
997
+ if results is None:
998
+ return jsonify({"error": "No simulation results available. Run a simulation first."}), 404
999
+ return jsonify(results)
1000
+
1001
+
1002
+ @app.route('/api/abtest', methods=['POST'])
1003
+ def api_abtest():
1004
+ data = request.get_json(force=True)
1005
+ key_a = data.get('policy_a', 'linucb')
1006
+ key_b = data.get('policy_b', 'ucb1')
1007
+ n_tot = int(data.get('n_impressions', 5000))
1008
+ n_tot = max(1000, min(20000, n_tot))
1009
+
1010
+ if key_a not in ALGO_CLASSES or key_b not in ALGO_CLASSES:
1011
+ return jsonify({"error": "Invalid policy key"}), 400
1012
+ if key_a == key_b:
1013
+ return jsonify({"error": "Policy A and B must differ"}), 400
1014
+
1015
+ algo_a = ALGO_CLASSES[key_a]()
1016
+ algo_b = ALGO_CLASSES[key_b]()
1017
+ n_each = n_tot // 2
1018
+ np.random.seed(1)
1019
+
1020
+ r_a, r_b = [], []
1021
+ for _ in range(n_each):
1022
+ ctx = sample_random_context()
1023
+ act = algo_a.select(ctx)
1024
+ rew = int(np.random.rand() < true_ctr(act, ctx))
1025
+ algo_a.update(ctx, act, rew)
1026
+ r_a.append(rew)
1027
+
1028
+ for _ in range(n_each):
1029
+ ctx = sample_random_context()
1030
+ act = algo_b.select(ctx)
1031
+ rew = int(np.random.rand() < true_ctr(act, ctx))
1032
+ algo_b.update(ctx, act, rew)
1033
+ r_b.append(rew)
1034
+
1035
+ n1, n2 = len(r_a), len(r_b)
1036
+ p1, p2 = sum(r_a) / n1, sum(r_b) / n2
1037
+ p_pool = (sum(r_a) + sum(r_b)) / (n1 + n2)
1038
+ se = math.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2)) if p_pool not in (0, 1) else 1e-9
1039
+ z = (p1 - p2) / se
1040
+ p_value = float(2 * (1 - stats.norm.cdf(abs(z))))
1041
+ se_diff = math.sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
1042
+ ci_low = (p1 - p2) - 1.96 * se_diff
1043
+ ci_high = (p1 - p2) + 1.96 * se_diff
1044
+
1045
+ return jsonify({
1046
+ "ctr_a": round(p1, 5),
1047
+ "ctr_b": round(p2, 5),
1048
+ "n_a": n1,
1049
+ "n_b": n2,
1050
+ "lift_abs": round(p1 - p2, 5),
1051
+ "lift_rel": round((p1 - p2) / max(p2, 1e-9), 5),
1052
+ "z_stat": round(z, 4),
1053
+ "p_value": round(p_value, 5),
1054
+ "ci_low": round(ci_low, 5),
1055
+ "ci_high": round(ci_high, 5),
1056
+ "significant": p_value < 0.05,
1057
+ })
1058
+
1059
+
1060
+ @app.route('/api/heatmap', methods=['POST'])
1061
+ def api_heatmap():
1062
+ data = request.get_json(force=True)
1063
+ algo_key = data.get('algorithm', 'linucb')
1064
+ if algo_key not in algorithms:
1065
+ return jsonify({"error": "Invalid algorithm"}), 400
1066
+
1067
+ algo = algorithms[algo_key]
1068
+ ad_cats = ["Tech", "Fashion", "Finance", "Food", "Travel"]
1069
+ matrix = []
1070
+
1071
+ for content in CONTENT_CATS:
1072
+ row = []
1073
+ for ad_cat in ad_cats:
1074
+ # Representative ad: first ad of this category
1075
+ ad_idx_for_cat = ad_cats.index(ad_cat) * 4 # ad_01, ad_05, ad_09, ad_13, ad_17
1076
+ ctx = encode_context("adult", "desktop", "afternoon", content, "north_america")
1077
+ score = algo.predict_ctr(ctx, ad_idx_for_cat)
1078
+ row.append(round(float(score), 5))
1079
+ matrix.append(row)
1080
+
1081
+ return jsonify({
1082
+ "matrix": matrix,
1083
+ "content_cats": CONTENT_CATS,
1084
+ "ad_cats": ad_cats,
1085
+ "algorithm": algo_key,
1086
+ })
1087
+
1088
+
1089
+ # ─────────────────────────────────────────────────────────────────────────────
1090
+ if __name__ == '__main__':
1091
+ app.run(host='0.0.0.0', port=7860, debug=False, threaded=True)
requirements.txt CHANGED
@@ -1 +1,5 @@
1
- flask==3.0.0
 
 
 
 
 
1
+ flask==2.3.0
2
+ torch==2.0.0
3
+ numpy==1.24.0
4
+ scipy==1.10.0
5
+ plotly==5.14.0