chyams (Claude Opus 4.5) committed
Commit 376d344 · 0 parents

Capture decisions: LLM Explorer tool, HF Spaces deployment


- Built tools/llm-explorer/ Gradio app (3 tabs + admin panel)
- Deployed to HuggingFace Spaces (chyams/llm-explorer) with T4 GPU
- Separate tools venv at ~/venvs/responsible-ai-course-tools
- Admin-swappable model list, seed accordion UX, slider debounce
- Session notes added to 2026-spring/CLAUDE.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (6)
  1. .gitignore +3 -0
  2. README.md +45 -0
  3. app.py +487 -0
  4. config.json +8 -0
  5. models.py +343 -0
  6. requirements.txt +7 -0
.gitignore ADDED
@@ -0,0 +1,3 @@
+ __pycache__/
+ *.pyc
+ .env
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ title: LLM Explorer
+ emoji: 🔬
+ colorFrom: blue
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: 6.5.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: Interactive tools for next-token prediction, text generation
+ suggested_hardware: t4-small
+ ---
+
+ # LLM Explorer
+
+ Interactive tools for understanding how Large Language Models work. Built for the *Responsible AI: Technology, Power, and Justice* course at Huston-Tillotson University.
+
+ ## Tools
+
+ 1. **Probability Explorer** — Enter a prompt and watch the model predict one token at a time, with the full probability distribution shown at each step
+ 2. **Generator** — Generate text with adjustable temperature and top-k sampling
+ 3. **Tokenizer** — See how text is split into tokens
+
+ ## Deployment
+
+ This app is designed to run on a HuggingFace Space with a dedicated T4 GPU.
+
+ ### Setup
+
+ 1. Create a new Space at huggingface.co/new-space
+ 2. Select the Gradio SDK
+ 3. Push the contents of this directory to the Space repo
+ 4. In the Space settings, select T4 GPU hardware
+ 5. Set `ADMIN_PASSWORD` as a Space Secret
+
+ ### Local Development
+
+ ```bash
+ cd tools/llm-explorer
+ pip install -r requirements.txt
+ python app.py
+ ```
+
+ Opens at http://localhost:7860.
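
The push in step 3 can also be scripted. Below is a minimal sketch using `huggingface_hub` (an assumption: the commit doesn't record how the files were pushed, and a plain `git push` to the Space repo works just as well); the Space id `chyams/llm-explorer` comes from the commit message above.

```python
# Sketch: upload the app directory to the Space without using git.
# Assumes `pip install huggingface_hub` and a valid token (huggingface-cli login).
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="tools/llm-explorer",   # local app directory from this repo
    repo_id="chyams/llm-explorer",      # the Space named in the commit message
    repo_type="space",
)
```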
app.py ADDED
@@ -0,0 +1,487 @@
+ """LLM Explorer — Interactive tools for understanding how LLMs work.
+
+ Gradio app with three tabs:
+ 1. Step-by-Step Probability Explorer
+ 2. Interactive Generator
+ 3. Tokenizer
+
+ Plus a password-protected Admin panel for runtime configuration.
+ """
+
+ import html
+ import json
+ import os
+
+ import gradio as gr
+
+ from models import AVAILABLE_MODELS, manager
+
+ # ---------------------------------------------------------------------------
+ # Admin password — set the ADMIN_PASSWORD env var (a Space Secret on HF
+ # Spaces); falls back to "admin" for local development.
+ # ---------------------------------------------------------------------------
+ ADMIN_PASSWORD = os.environ.get("ADMIN_PASSWORD", "admin")
+
+ # ---------------------------------------------------------------------------
+ # HTML rendering helpers
+ # ---------------------------------------------------------------------------
+
+ # Alternating token chip colors
+ TOKEN_COLORS = [
+     ("#e0f2fe", "#0c4a6e"),  # light blue / dark blue
+     ("#fef3c7", "#92400e"),  # light amber / dark amber
+     ("#d1fae5", "#065f46"),  # light green / dark green
+     ("#fce7f3", "#9d174d"),  # light pink / dark pink
+     ("#e0e7ff", "#3730a3"),  # light indigo / dark indigo
+     ("#fde68a", "#78350f"),  # yellow / brown
+ ]
+
+
+ def _render_step_html(step_data: dict, prompt: str) -> str:
+     """Render one generation step as styled HTML."""
+     s = step_data
+     # Escape user-controlled text so tokens like "<" cannot break the markup
+     prompt_safe = html.escape(prompt)
+     generated = html.escape(s["text"][len(prompt):])
+
+     # Build probability bar chart
+     rows_html = ""
+     for token_str, prob, tid in s["top_tokens"]:
+         bar_width = max(1, int(prob * 300))
+         is_selected = tid == s["token_id"]
+         bg = "#2563eb" if is_selected else "#94a3b8"
+         label_style = "font-weight:700;" if is_selected else ""
+         arrow = " \u2190" if is_selected else ""
+         token_display = html.escape(repr(token_str))
+         rows_html += f"""
+     <div style="display:flex;align-items:center;gap:8px;margin:2px 0;font-family:monospace;font-size:13px;">
+         <span style="width:140px;text-align:right;color:#1e293b;{label_style}">{token_display}</span>
+         <div style="width:{bar_width}px;height:16px;background:{bg};border-radius:3px;"></div>
+         <span style="color:#334155;{label_style}">{prob:.4f}{arrow}</span>
+     </div>"""
+
+     return f"""
+     <div style="border:1px solid #e2e8f0;border-radius:8px;padding:12px;margin:8px 0;background:#fff;">
+         <div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:8px;">
+             <span style="font-weight:600;color:#1e293b;">Step {s['step']}</span>
+             <span style="color:#64748b;font-size:12px;">Entropy: {s['entropy']:.2f} bits</span>
+         </div>
+         <div style="font-family:monospace;font-size:14px;padding:8px;background:#f8fafc;border-radius:4px;margin-bottom:8px;word-wrap:break-word;">
+             <span style="color:#64748b;">{prompt_safe}</span><span style="color:#1e293b;font-weight:600;">{generated}</span>
+         </div>
+         {rows_html}
+     </div>"""
+
+
+ def _render_final_text_html(prompt: str, generated_text: str) -> str:
+     """Render just the final generated text."""
+     generated = html.escape(generated_text[len(prompt):])
+     return f"""
+     <div style="border:1px solid #e2e8f0;border-radius:8px;padding:16px;background:#fff;">
+         <div style="font-family:monospace;font-size:16px;line-height:1.6;word-wrap:break-word;">
+             <span style="color:#94a3b8;">{html.escape(prompt)}</span><span style="color:#1e293b;font-weight:600;">{generated}</span>
+         </div>
+     </div>"""
+
+
+ def _render_tokens_html(tokens: list[tuple[str, int]]) -> str:
+     """Render tokenized text as colored chips."""
+     chips = ""
+     for i, (token_str, tid) in enumerate(tokens):
+         bg, fg = TOKEN_COLORS[i % len(TOKEN_COLORS)]
+         # Escape HTML
+         display = token_str.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
+         # Show spaces explicitly
+         if display.strip() == "":
+             display = repr(token_str).strip("'")
+         chips += f"""<span title="ID: {tid}" style="
+             display:inline-block;
+             padding:4px 8px;
+             margin:2px;
+             border-radius:4px;
+             background:{bg};
+             color:{fg};
+             font-family:monospace;
+             font-size:14px;
+             cursor:default;
+         ">{display}</span>"""
+
+     return f"""
+     <div style="padding:8px;">
+         <div style="margin-bottom:12px;color:#64748b;font-size:13px;">
+             {len(tokens)} tokens &mdash; hover for token IDs
+         </div>
+         <div style="line-height:2.2;">{chips}</div>
+     </div>"""
+
+
+ # ---------------------------------------------------------------------------
+ # Tab 1: Step-by-Step Probability Explorer
+ # ---------------------------------------------------------------------------
+
+ def explore_probabilities(prompt, temperature, top_n, steps, show_steps, seed):
+     """Generate tokens step by step and return formatted HTML."""
+     if not manager.is_ready():
+         return f"<p style='color:red;'>{manager.status_message()}</p>"
+
+     seed = int(seed)
+     results = manager.generate_step_by_step(
+         prompt=prompt,
+         steps=int(steps),
+         temperature=temperature,
+         top_n=int(top_n),
+         seed=seed,
+         show_steps=show_steps,
+     )
+
+     if not results:
+         return "<p>No results generated.</p>"
+
+     if show_steps:
+         html_parts = [_render_step_html(r, prompt) for r in results]
+         return "\n".join(html_parts)
+     else:
+         final_text = results[-1]["text"]
+         return _render_final_text_html(prompt, final_text)
+
+
+ def on_show_steps_change(show_steps):
+     """Adjust max steps slider when show_steps toggles."""
+     new_max = 20 if show_steps else 100
+     return gr.update(maximum=new_max)
+
+
+ # ---------------------------------------------------------------------------
+ # Tab 2: Interactive Generator
+ # ---------------------------------------------------------------------------
+
+ def generate_interactive(prompt, num_tokens, temperature, top_k, seed):
+     """Generate text and return it."""
+     if not manager.is_ready():
+         return f"*{manager.status_message()}*"
+
+     seed = int(seed)
+     text = manager.generate_text(
+         prompt=prompt,
+         num_tokens=int(num_tokens),
+         temperature=temperature,
+         top_k=int(top_k),
+         seed=seed,
+     )
+     return text
+
+
+ # ---------------------------------------------------------------------------
+ # Tab 3: Tokenizer
+ # ---------------------------------------------------------------------------
+
+ def tokenize_text(text):
+     """Tokenize input and return formatted HTML."""
+     if not manager.is_ready():
+         return f"<p style='color:red;'>{manager.status_message()}</p>"
+
+     tokens = manager.tokenize(text)
+     return _render_tokens_html(tokens)
+
+
+ # ---------------------------------------------------------------------------
+ # Admin panel
+ # ---------------------------------------------------------------------------
+
+ def admin_login(password):
+     """Check admin password and return visibility update."""
+     if password == ADMIN_PASSWORD:
+         return gr.update(visible=True), gr.update(visible=False), ""
+     return gr.update(visible=False), gr.update(visible=True), "Incorrect password."
+
+
+ def admin_load_model(model_name):
+     """Load a new model from admin panel."""
+     status = manager.load_model(model_name)
+     cfg = manager.get_config()
+     return status, json.dumps(cfg, indent=2)
+
+
+ def admin_save_defaults(prompt, temperature, top_n, steps, seed):
+     """Save default settings."""
+     manager.update_config(
+         default_prompt=prompt,
+         default_temperature=temperature,
+         default_top_n=int(top_n),
+         default_steps=int(steps),
+         default_seed=int(seed),
+     )
+     return "Defaults saved."
+
+
+ # ---------------------------------------------------------------------------
+ # Build the Gradio app
+ # ---------------------------------------------------------------------------
+
+ def create_app():
+     cfg = manager.get_config()
+
+     with gr.Blocks(
+         title="LLM Explorer",
+         theme=gr.themes.Soft(),
+     ) as demo:
+         gr.Markdown("# LLM Explorer\n*Interactive tools for understanding how LLMs work*")
+
+         # Status bar
+         status_display = gr.Markdown(value=f"**{manager.status_message()}**")
+
+         # ==================================================================
+         # Tab 1: Probability Explorer
+         # ==================================================================
+         with gr.Tab("Probability Explorer"):
+             gr.Markdown("### Step-by-Step Next-Token Prediction")
+             gr.Markdown(
+                 "Enter a prompt and watch the model predict one token at a time. "
+                 "Each step shows the probability distribution over the vocabulary."
+             )
+
+             with gr.Row():
+                 with gr.Column(scale=3):
+                     t1_prompt = gr.Textbox(
+                         label="Prompt",
+                         value=cfg.get("default_prompt", "The best thing about Huston-Tillotson University is"),
+                         lines=2,
+                     )
+                 with gr.Column(scale=1):
+                     t1_show_steps = gr.Checkbox(
+                         label="Show steps",
+                         value=True,
+                     )
+
+             with gr.Row():
+                 t1_temperature = gr.Slider(
+                     label="Temperature",
+                     minimum=0.1, maximum=2.5, step=0.1,
+                     value=cfg.get("default_temperature", 0.8),
+                 )
+                 t1_top_n = gr.Slider(
+                     label="Top-N tokens",
+                     minimum=5, maximum=30, step=1,
+                     value=cfg.get("default_top_n", 10),
+                 )
+                 t1_steps = gr.Slider(
+                     label="Steps",
+                     minimum=1, maximum=20, step=1,
+                     value=cfg.get("default_steps", 8),
+                 )
+
+             with gr.Accordion(f"Seed: {cfg.get('default_seed', 42)}", open=False):
+                 t1_seed = gr.Number(
+                     label="Random seed",
+                     value=cfg.get("default_seed", 42),
+                     precision=0,
+                 )
+
+             t1_generate_btn = gr.Button("Generate", variant="primary")
+             t1_output = gr.HTML(label="Output")
+
+             # Show steps toggle adjusts max steps
+             t1_show_steps.change(
+                 fn=on_show_steps_change,
+                 inputs=[t1_show_steps],
+                 outputs=[t1_steps],
+             )
+
+             t1_generate_btn.click(
+                 fn=explore_probabilities,
+                 inputs=[t1_prompt, t1_temperature, t1_top_n, t1_steps, t1_show_steps, t1_seed],
+                 outputs=[t1_output],
+             )
+
+         # ==================================================================
+         # Tab 2: Interactive Generator
+         # ==================================================================
+         with gr.Tab("Generator"):
+             gr.Markdown("### Interactive Text Generator")
+             gr.Markdown(
+                 "Generate text from a prompt. Adjust temperature and top-k, "
+                 "then release the slider to regenerate with the same seed."
+             )
+
+             t2_prompt = gr.Textbox(
+                 label="Prompt",
+                 value=cfg.get("default_prompt", "The best thing about Huston-Tillotson University is"),
+                 lines=2,
+             )
+
+             with gr.Row():
+                 t2_num_tokens = gr.Slider(
+                     label="Tokens to generate",
+                     minimum=5, maximum=100, step=1,
+                     value=30,
+                 )
+                 t2_temperature = gr.Slider(
+                     label="Temperature",
+                     minimum=0.1, maximum=2.5, step=0.1,
+                     value=cfg.get("default_temperature", 0.8),
+                 )
+                 t2_top_k = gr.Slider(
+                     label="Top-K",
+                     minimum=1, maximum=100, step=1,
+                     value=40,
+                 )
+
+             with gr.Accordion(f"Seed: {cfg.get('default_seed', 42)}", open=False):
+                 t2_seed = gr.Number(
+                     label="Random seed",
+                     value=cfg.get("default_seed", 42),
+                     precision=0,
+                 )
+
+             t2_generate_btn = gr.Button("Generate", variant="primary")
+             t2_output = gr.Textbox(
+                 label="Generated text",
+                 lines=8,
+                 interactive=False,
+             )
+
+             gen_inputs = [t2_prompt, t2_num_tokens, t2_temperature, t2_top_k, t2_seed]
+
+             t2_generate_btn.click(
+                 fn=generate_interactive,
+                 inputs=gen_inputs,
+                 outputs=[t2_output],
+             )
+
+             # Slider release events trigger regeneration
+             t2_temperature.release(
+                 fn=generate_interactive,
+                 inputs=gen_inputs,
+                 outputs=[t2_output],
+             )
+             t2_top_k.release(
+                 fn=generate_interactive,
+                 inputs=gen_inputs,
+                 outputs=[t2_output],
+             )
+
+         # ==================================================================
+         # Tab 3: Tokenizer
+         # ==================================================================
+         with gr.Tab("Tokenizer"):
+             gr.Markdown("### Token Visualization")
+             gr.Markdown(
+                 "Enter any text to see how the model's tokenizer splits it into tokens. "
+                 "Hover over each token to see its numeric ID."
+             )
+
+             t3_input = gr.Textbox(
+                 label="Text",
+                 value="Huston-Tillotson University is an HBCU in Austin, Texas",
+                 lines=3,
+             )
+             t3_btn = gr.Button("Tokenize", variant="primary")
+             t3_output = gr.HTML(label="Tokens")
+
+             t3_btn.click(
+                 fn=tokenize_text,
+                 inputs=[t3_input],
+                 outputs=[t3_output],
+             )
+
+         # ==================================================================
+         # Admin Panel
+         # ==================================================================
+         with gr.Tab("Admin"):
+             gr.Markdown("### Admin Panel")
+
+             # Login gate
+             with gr.Group() as admin_login_group:
+                 admin_pw = gr.Textbox(
+                     label="Password",
+                     type="password",
+                     placeholder="Enter admin password",
+                 )
+                 admin_login_btn = gr.Button("Login")
+                 admin_login_msg = gr.Markdown("")
+
+             # Admin controls (hidden until login)
+             with gr.Group(visible=False) as admin_controls:
+                 gr.Markdown("#### Model")
+                 with gr.Row():
+                     admin_model_dropdown = gr.Dropdown(
+                         choices=list(AVAILABLE_MODELS.keys()),
+                         value=manager.current_model_name or cfg.get("model", "Qwen2.5-3B"),
+                         label="Select model",
+                     )
+                     admin_load_btn = gr.Button("Load Model", variant="primary")
+                 admin_model_status = gr.Markdown("")
+
+                 gr.Markdown("---")
+                 gr.Markdown("#### Default Settings")
+                 admin_prompt = gr.Textbox(
+                     label="Default prompt",
+                     value=cfg.get("default_prompt", ""),
+                 )
+                 with gr.Row():
+                     admin_temp = gr.Number(
+                         label="Default temperature",
+                         value=cfg.get("default_temperature", 0.8),
+                     )
+                     admin_top_n = gr.Number(
+                         label="Default top-n",
+                         value=cfg.get("default_top_n", 10),
+                         precision=0,
+                     )
+                     admin_steps = gr.Number(
+                         label="Default steps",
+                         value=cfg.get("default_steps", 8),
+                         precision=0,
+                     )
+                     admin_seed = gr.Number(
+                         label="Default seed",
+                         value=cfg.get("default_seed", 42),
+                         precision=0,
+                     )
+                 admin_save_btn = gr.Button("Save Defaults")
+                 admin_save_msg = gr.Markdown("")
+
+                 gr.Markdown("---")
+                 gr.Markdown("#### Current Config")
+                 admin_config_display = gr.Code(
+                     value=json.dumps(cfg, indent=2),
+                     language="json",
+                     interactive=False,
+                 )
+
+             # Login wiring
+             admin_login_btn.click(
+                 fn=admin_login,
+                 inputs=[admin_pw],
+                 outputs=[admin_controls, admin_login_group, admin_login_msg],
+             )
+
+             # Model loading
+             admin_load_btn.click(
+                 fn=admin_load_model,
+                 inputs=[admin_model_dropdown],
+                 outputs=[admin_model_status, admin_config_display],
+             )
+
+             # Save defaults
+             admin_save_btn.click(
+                 fn=admin_save_defaults,
+                 inputs=[admin_prompt, admin_temp, admin_top_n, admin_steps, admin_seed],
+                 outputs=[admin_save_msg],
+             )
+
+     return demo
+
+
+ # ---------------------------------------------------------------------------
+ # Startup
+ # ---------------------------------------------------------------------------
+
+ if __name__ == "__main__":
+     # Load default model on startup
+     cfg = manager.get_config()
+     model_to_load = cfg.get("model", "Qwen2.5-3B")
+     print(f"Loading default model: {model_to_load}")
+     status = manager.load_model(model_to_load)
+     print(status)
+
+     app = create_app()
+     app.launch(server_name="0.0.0.0", server_port=7860)
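
A note on the "slider debounce" mentioned in the commit message: app.py wires regeneration to Gradio's `release` event, which fires once when the user lets go of the slider handle, rather than `change`, which would fire on every intermediate value during a drag and queue a model call for each. A minimal standalone sketch of the pattern (the component names and placeholder function are illustrative):

```python
# Minimal sketch of the release-event pattern used in app.py's Generator tab.
import gradio as gr

def regenerate(temperature):
    # Placeholder for the real generation call.
    return f"regenerated at temperature={temperature:.1f}"

with gr.Blocks() as demo:
    temp = gr.Slider(minimum=0.1, maximum=2.5, value=0.8, step=0.1, label="Temperature")
    out = gr.Textbox(label="Output")
    temp.release(fn=regenerate, inputs=[temp], outputs=[out])  # fires on mouse-up only

if __name__ == "__main__":
    demo.launch()
```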
config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "model": "GPT-OSS-20B",
+   "default_prompt": "The best thing about Huston-Tillotson University is",
+   "default_temperature": 0.8,
+   "default_top_n": 10,
+   "default_steps": 8,
+   "default_seed": 42
+ }
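
This file is written by `_save_config` (`json.dump(..., indent=2)`) in models.py and merged over built-in defaults by `_load_config`, so admin-panel saves survive restarts. A quick sketch of the round trip through the module-level manager (assuming the app's requirements are installed):

```python
# Sketch: reading and persisting settings through models.manager.
from models import manager

cfg = manager.get_config()                 # copy of the dict persisted above
print(cfg["model"], cfg["default_seed"])   # -> GPT-OSS-20B 42

manager.update_config(default_steps=12)    # merges the change and rewrites config.json
assert manager.get_config()["default_steps"] == 12
```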
models.py ADDED
@@ -0,0 +1,343 @@
+ """Model management for LLM Explorer.
+
+ Handles loading, unloading, and swapping models at runtime.
+ Provides inference methods for next-token probabilities and step-by-step generation.
+ """
+
+ import gc
+ import json
+ import threading
+ from pathlib import Path
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
17
+ # ---------------------------------------------------------------------------
18
+ # Available models — add entries here to make them selectable in admin panel.
19
+ # To use a new model, just add it here and redeploy (or restart).
20
+ # ---------------------------------------------------------------------------
21
+ AVAILABLE_MODELS = {
22
+ "Qwen2.5-3B": {
23
+ "id": "Qwen/Qwen2.5-3B",
24
+ "dtype": "float16",
25
+ "description": "Fast, good quality (default)",
26
+ },
27
+ "Qwen2.5-7B (4-bit)": {
28
+ "id": "Qwen/Qwen2.5-7B",
29
+ "quantize": "4bit",
30
+ "description": "Higher quality, quantized to fit T4",
31
+ },
32
+ "Llama-3.2-3B": {
33
+ "id": "meta-llama/Llama-3.2-3B",
34
+ "dtype": "float16",
35
+ "description": "Meta's latest 3B",
36
+ },
37
+ "Mistral-7B-v0.3 (4-bit)": {
38
+ "id": "mistralai/Mistral-7B-v0.3",
39
+ "quantize": "4bit",
40
+ "description": "Best quality, quantized",
41
+ },
42
+ "GPT-OSS-20B": {
43
+ "id": "openai/gpt-oss-20b",
44
+ "dtype": "auto",
45
+ "description": "OpenAI 20B, full precision (local/large GPU only)",
46
+ },
47
+ "GPT-OSS-20B (4-bit)": {
48
+ "id": "openai/gpt-oss-20b",
49
+ "quantize": "4bit",
50
+ "description": "OpenAI 20B, quantized to fit T4 (~10-12GB)",
51
+ },
52
+ }
53
+
54
+ DEFAULT_MODEL = "Qwen2.5-3B"
55
+
56
+ CONFIG_PATH = Path(__file__).parent / "config.json"
57
+
58
+ # ---------------------------------------------------------------------------
59
+ # Helpers
60
+ # ---------------------------------------------------------------------------
61
+
62
+ def _detect_device() -> str:
63
+ """Pick the best available device."""
64
+ if torch.cuda.is_available():
65
+ return "cuda"
66
+ if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
67
+ return "mps"
68
+ return "cpu"
69
+
70
+
71
+ def _load_config() -> dict:
72
+ """Load persisted config or return defaults."""
73
+ defaults = {
74
+ "model": DEFAULT_MODEL,
75
+ "default_prompt": "The best thing about Huston-Tillotson University is",
76
+ "default_temperature": 0.8,
77
+ "default_top_n": 10,
78
+ "default_steps": 8,
79
+ "default_seed": 42,
80
+ }
81
+ if CONFIG_PATH.exists():
82
+ try:
83
+ with open(CONFIG_PATH) as f:
84
+ saved = json.load(f)
85
+ defaults.update(saved)
86
+ except (json.JSONDecodeError, OSError):
87
+ pass
88
+ return defaults
89
+
90
+
91
+ def _save_config(cfg: dict) -> None:
92
+ """Persist config to disk."""
93
+ with open(CONFIG_PATH, "w") as f:
94
+ json.dump(cfg, f, indent=2)
95
+
96
+
97
+ # ---------------------------------------------------------------------------
98
+ # ModelManager — singleton that owns the active model
99
+ # ---------------------------------------------------------------------------
100
+
101
+ class ModelManager:
102
+ """Manages a single active model with hot-swap capability."""
103
+
104
+ def __init__(self):
105
+ self.model = None
106
+ self.tokenizer = None
107
+ self.current_model_name: str | None = None
108
+ self.device: str = _detect_device()
109
+ self.loading = False
110
+ self._lock = threading.Lock()
111
+ self.config = _load_config()
112
+
113
+ # ------------------------------------------------------------------
114
+ # Model lifecycle
115
+ # ------------------------------------------------------------------
116
+
117
+ def load_model(self, model_name: str) -> str:
118
+ """Load a model by its display name. Returns status message."""
119
+ if model_name not in AVAILABLE_MODELS:
120
+ return f"Unknown model: {model_name}"
121
+
122
+ if self.loading:
123
+ return "A model is already being loaded. Please wait."
124
+
125
+ spec = AVAILABLE_MODELS[model_name]
126
+
127
+ with self._lock:
128
+ self.loading = True
129
+ try:
130
+ # Unload current model
131
+ self._unload()
132
+
133
+ # Determine load kwargs
134
+ model_id = spec["id"]
135
+ load_kwargs: dict = {"device_map": "auto"}
136
+
137
+ if spec.get("quantize") == "4bit":
138
+ from transformers import BitsAndBytesConfig
139
+ load_kwargs["quantization_config"] = BitsAndBytesConfig(
140
+ load_in_4bit=True,
141
+ bnb_4bit_compute_dtype=torch.float16,
142
+ )
143
+ elif spec.get("quantize") == "8bit":
144
+ from transformers import BitsAndBytesConfig
145
+ load_kwargs["quantization_config"] = BitsAndBytesConfig(
146
+ load_in_8bit=True,
147
+ )
148
+ else:
149
+ dtype_str = spec.get("dtype", "float16")
150
+ if dtype_str == "auto":
151
+ load_kwargs["torch_dtype"] = "auto"
152
+ else:
153
+ load_kwargs["torch_dtype"] = getattr(torch, dtype_str)
154
+
155
+ # Load tokenizer + model
156
+ self.tokenizer = AutoTokenizer.from_pretrained(model_id)
157
+ self.model = AutoModelForCausalLM.from_pretrained(
158
+ model_id, **load_kwargs
159
+ )
160
+ self.model.eval()
161
+ self.current_model_name = model_name
162
+
163
+ # Persist choice
164
+ self.config["model"] = model_name
165
+ _save_config(self.config)
166
+
167
+ return f"Loaded {model_name} ({model_id})"
168
+
169
+ except Exception as e:
170
+ self._unload()
171
+ return f"Failed to load {model_name}: {e}"
172
+ finally:
173
+ self.loading = False
174
+
175
+ def _unload(self) -> None:
176
+ """Release current model and free memory."""
177
+ if self.model is not None:
178
+ del self.model
179
+ self.model = None
180
+ if self.tokenizer is not None:
181
+ del self.tokenizer
182
+ self.tokenizer = None
183
+ self.current_model_name = None
184
+ gc.collect()
185
+ if torch.cuda.is_available():
186
+ torch.cuda.empty_cache()
187
+
188
+ def is_ready(self) -> bool:
189
+ return self.model is not None and not self.loading
190
+
191
+ def status_message(self) -> str:
192
+ if self.loading:
193
+ return "Loading model..."
194
+ if self.model is None:
195
+ return "No model loaded"
196
+ return f"Model: {self.current_model_name}"
197
+
198
+ # ------------------------------------------------------------------
199
+ # Inference helpers
200
+ # ------------------------------------------------------------------
201
+
202
+ def _get_logits(self, text: str) -> torch.Tensor:
203
+ """Run a forward pass and return logits for the last token position."""
204
+ inputs = self.tokenizer(text, return_tensors="pt")
205
+ inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
206
+ with torch.no_grad():
207
+ out = self.model(**inputs)
208
+ return out.logits[0, -1, :] # (vocab_size,)
209
+
210
+ @staticmethod
211
+ def apply_temperature(logits: torch.Tensor, temperature: float) -> torch.Tensor:
212
+ """Apply temperature scaling to logits and return probabilities."""
213
+ if temperature <= 0:
214
+ temperature = 1e-6
215
+ scaled = logits / temperature
216
+ return torch.softmax(scaled, dim=-1)
217
+
218
+ @staticmethod
219
+ def entropy_bits(probs: torch.Tensor) -> float:
220
+ """Shannon entropy in bits."""
221
+ eps = 1e-20
222
+ p = probs + eps
223
+ return float(-torch.sum(p * torch.log2(p)))
224
+
225
+ def top_k_table(
226
+ self, probs: torch.Tensor, k: int = 10
227
+ ) -> list[tuple[str, float, int]]:
228
+ """Return list of (token_str, probability, token_id) for top-k tokens."""
229
+ topk = torch.topk(probs, k=min(k, probs.shape[0]))
230
+ rows = []
231
+ for prob, idx in zip(topk.values.tolist(), topk.indices.tolist()):
232
+ token_str = self.tokenizer.decode([idx])
233
+ rows.append((token_str, float(prob), int(idx)))
234
+ return rows
235
+
236
+ # ------------------------------------------------------------------
237
+ # High-level generation
238
+ # ------------------------------------------------------------------
239
+
240
+ def generate_step_by_step(
241
+ self,
242
+ prompt: str,
243
+ steps: int = 8,
244
+ temperature: float = 0.8,
245
+ top_n: int = 10,
246
+ seed: int = 42,
247
+ show_steps: bool = True,
248
+ ) -> list[dict]:
249
+ """Generate tokens one at a time, returning per-step data.
250
+
251
+ Each step dict contains:
252
+ - step: int (1-based)
253
+ - text: accumulated text so far
254
+ - token: the sampled token string
255
+ - token_id: int
256
+ - entropy: float (bits)
257
+ - top_tokens: list of (token_str, prob, token_id)
258
+ """
259
+ if not self.is_ready():
260
+ return []
261
+
262
+ text = prompt
263
+ results = []
264
+ rng = torch.Generator()
265
+
266
+ for i in range(steps):
267
+ logits = self._get_logits(text)
268
+ probs = self.apply_temperature(logits, temperature)
269
+ entropy = self.entropy_bits(probs)
270
+ top_tokens = self.top_k_table(probs, k=top_n) if show_steps else []
271
+
272
+ # Sample with deterministic seed per step
273
+ rng.manual_seed(seed + i)
274
+ idx = torch.multinomial(probs.cpu(), num_samples=1, generator=rng).item()
275
+ token_str = self.tokenizer.decode([idx])
276
+ text += token_str
277
+
278
+ results.append({
279
+ "step": i + 1,
280
+ "text": text,
281
+ "token": token_str,
282
+ "token_id": int(idx),
283
+ "entropy": entropy,
284
+ "top_tokens": top_tokens,
285
+ })
286
+
287
+ return results
288
+
289
+ def generate_text(
290
+ self,
291
+ prompt: str,
292
+ num_tokens: int = 30,
293
+ temperature: float = 0.8,
294
+ top_k: int = 40,
295
+ seed: int = 42,
296
+ ) -> str:
297
+ """Generate text with top-k sampling. Returns prompt + generated text."""
298
+ if not self.is_ready():
299
+ return prompt
300
+
301
+ text = prompt
302
+ rng = torch.Generator()
303
+
304
+ for i in range(num_tokens):
305
+ logits = self._get_logits(text)
306
+
307
+ # Apply top-k filtering
308
+ if top_k > 0:
309
+ top_k_vals, top_k_idxs = torch.topk(logits, k=min(top_k, logits.shape[0]))
310
+ mask = torch.full_like(logits, float("-inf"))
311
+ mask.scatter_(0, top_k_idxs, top_k_vals)
312
+ logits = mask
313
+
314
+ probs = self.apply_temperature(logits, temperature)
315
+
316
+ rng.manual_seed(seed + i)
317
+ idx = torch.multinomial(probs.cpu(), num_samples=1, generator=rng).item()
318
+ token_str = self.tokenizer.decode([idx])
319
+ text += token_str
320
+
321
+ return text
322
+
323
+ def tokenize(self, text: str) -> list[tuple[str, int]]:
324
+ """Tokenize text and return list of (token_str, token_id)."""
325
+ if self.tokenizer is None:
326
+ return []
327
+ ids = self.tokenizer.encode(text)
328
+ return [(self.tokenizer.decode([tid]), tid) for tid in ids]
329
+
330
+ # ------------------------------------------------------------------
331
+ # Config helpers
332
+ # ------------------------------------------------------------------
333
+
334
+ def get_config(self) -> dict:
335
+ return dict(self.config)
336
+
337
+ def update_config(self, **kwargs) -> None:
338
+ self.config.update(kwargs)
339
+ _save_config(self.config)
340
+
341
+
342
+ # Module-level singleton
343
+ manager = ModelManager()
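
The `apply_temperature` and `entropy_bits` helpers above carry the core idea of the Probability Explorer: temperature rescales logits before the softmax, and entropy summarizes how spread out the resulting distribution is. A small self-contained check on made-up logits (the four values are purely illustrative):

```python
# Toy check of temperature scaling and entropy, mirroring
# ModelManager.apply_temperature and ModelManager.entropy_bits above.
import torch

logits = torch.tensor([4.0, 2.0, 1.0, 0.5])  # made-up logits for 4 tokens

for temperature in (0.5, 1.0, 2.0):
    probs = torch.softmax(logits / temperature, dim=-1)
    entropy = float(-torch.sum(probs * torch.log2(probs + 1e-20)))
    print(f"T={temperature}: probs={[round(p, 3) for p in probs.tolist()]}, "
          f"entropy={entropy:.2f} bits")

# Low temperature sharpens the distribution (entropy drops toward 0 bits);
# high temperature flattens it (entropy approaches log2(4) = 2 bits here).
```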
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ torch>=2.1.0
+ transformers>=4.40.0
+ accelerate>=0.27.0
+ bitsandbytes>=0.43.0
+ gradio>=4.20.0
+ sentencepiece>=0.2.0
+ protobuf>=4.25.0