Spaces: chyams/llm-explorer · Running on L4
Commit 376d344
Capture decisions: LLM Explorer tool, HF Spaces deployment
- Built tools/llm-explorer/ Gradio app (3 tabs + admin panel)
- Deployed to HuggingFace Spaces (chyams/llm-explorer) with T4 GPU
- Separate tools venv at ~/venvs/responsible-ai-course-tools
- Admin-swappable model list, seed accordion UX, slider debounce
- Session notes added to 2026-spring/CLAUDE.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- .gitignore +3 -0
- README.md +45 -0
- app.py +487 -0
- config.json +8 -0
- models.py +343 -0
- requirements.txt +7 -0
.gitignore
ADDED
@@ -0,0 +1,3 @@
__pycache__/
*.pyc
.env
README.md
ADDED
@@ -0,0 +1,45 @@
---
title: LLM Explorer
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
short_description: Interactive tools for next-token prediction, text generation
suggested_hardware: t4-small
---

# LLM Explorer

Interactive tools for understanding how Large Language Models work. Built for the *Responsible AI: Technology, Power, and Justice* course at Huston-Tillotson University.

## Tools

1. **Probability Explorer** — Enter a prompt and watch the model predict one token at a time, with the full probability distribution shown at each step
2. **Generator** — Generate text with adjustable temperature and top-k sampling
3. **Tokenizer** — See how text is split into tokens

## Deployment

This app is designed to run on a HuggingFace Space with a dedicated T4 GPU.

### Setup

1. Create a new Space at huggingface.co/new-space
2. Select the Gradio SDK
3. Push the contents of this directory to the Space repo
4. In Space Settings, select T4 GPU hardware
5. Set `ADMIN_PASSWORD` as a Space Secret

### Local Development

```bash
cd tools/llm-explorer
pip install -r requirements.txt
python app.py
```

Opens at http://localhost:7860
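
Setup step 3 (pushing this directory to the Space repo) can also be scripted with `huggingface_hub`. A minimal sketch, not part of this commit; it assumes the Space `chyams/llm-explorer` already exists and you are authenticated (e.g. via `huggingface-cli login`):

```python
# Hypothetical upload script (not in the commit). Assumes huggingface_hub is
# installed, you are logged in, and the Space was created with the Gradio SDK.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="tools/llm-explorer",  # local directory from this repo
    repo_id="chyams/llm-explorer",
    repo_type="space",
)
```
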
app.py
ADDED
@@ -0,0 +1,487 @@
"""LLM Explorer — Interactive tools for understanding how LLMs work.

Gradio app with three tabs:
1. Step-by-Step Probability Explorer
2. Interactive Generator
3. Tokenizer

Plus a password-protected Admin panel for runtime configuration.
"""

import json
import os

import gradio as gr

from models import AVAILABLE_MODELS, manager

# ---------------------------------------------------------------------------
# Admin password — set via env var on HF Spaces, or fall back to the default
# ---------------------------------------------------------------------------
ADMIN_PASSWORD = os.environ.get("ADMIN_PASSWORD", "admin")

# ---------------------------------------------------------------------------
# HTML rendering helpers
# ---------------------------------------------------------------------------

# Alternating token chip colors
TOKEN_COLORS = [
    ("#e0f2fe", "#0c4a6e"),  # light blue / dark blue
    ("#fef3c7", "#92400e"),  # light amber / dark amber
    ("#d1fae5", "#065f46"),  # light green / dark green
    ("#fce7f3", "#9d174d"),  # light pink / dark pink
    ("#e0e7ff", "#3730a3"),  # light indigo / dark indigo
    ("#fde68a", "#78350f"),  # yellow / brown
]


def _render_step_html(step_data: dict, prompt: str) -> str:
    """Render one generation step as styled HTML."""
    s = step_data
    generated = s["text"][len(prompt):]

    # Build probability bar chart
    rows_html = ""
    for token_str, prob, tid in s["top_tokens"]:
        bar_width = max(1, int(prob * 300))
        is_selected = tid == s["token_id"]
        bg = "#2563eb" if is_selected else "#94a3b8"
        label_style = "font-weight:700;" if is_selected else ""
        arrow = " \u2190" if is_selected else ""
        token_display = repr(token_str)
        rows_html += f"""
        <div style="display:flex;align-items:center;gap:8px;margin:2px 0;font-family:monospace;font-size:13px;">
            <span style="width:140px;text-align:right;color:#1e293b;{label_style}">{token_display}</span>
            <div style="width:{bar_width}px;height:16px;background:{bg};border-radius:3px;"></div>
            <span style="color:#334155;{label_style}">{prob:.4f}{arrow}</span>
        </div>"""

    return f"""
    <div style="border:1px solid #e2e8f0;border-radius:8px;padding:12px;margin:8px 0;background:#fff;">
        <div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:8px;">
            <span style="font-weight:600;color:#1e293b;">Step {s['step']}</span>
            <span style="color:#64748b;font-size:12px;">Entropy: {s['entropy']:.2f} bits</span>
        </div>
        <div style="font-family:monospace;font-size:14px;padding:8px;background:#f8fafc;border-radius:4px;margin-bottom:8px;word-wrap:break-word;">
            <span style="color:#64748b;">{prompt}</span><span style="color:#1e293b;font-weight:600;">{generated}</span>
        </div>
        {rows_html}
    </div>"""


def _render_final_text_html(prompt: str, generated_text: str) -> str:
    """Render just the final generated text."""
    generated = generated_text[len(prompt):]
    return f"""
    <div style="border:1px solid #e2e8f0;border-radius:8px;padding:16px;background:#fff;">
        <div style="font-family:monospace;font-size:16px;line-height:1.6;word-wrap:break-word;">
            <span style="color:#94a3b8;">{prompt}</span><span style="color:#1e293b;font-weight:600;">{generated}</span>
        </div>
    </div>"""


def _render_tokens_html(tokens: list[tuple[str, int]]) -> str:
    """Render tokenized text as colored chips."""
    chips = ""
    for i, (token_str, tid) in enumerate(tokens):
        bg, fg = TOKEN_COLORS[i % len(TOKEN_COLORS)]
        # Escape HTML
        display = token_str.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
        # Show spaces explicitly
        if display.strip() == "":
            display = repr(token_str).strip("'")
        chips += f"""<span title="ID: {tid}" style="
            display:inline-block;
            padding:4px 8px;
            margin:2px;
            border-radius:4px;
            background:{bg};
            color:{fg};
            font-family:monospace;
            font-size:14px;
            cursor:default;
        ">{display}</span>"""

    return f"""
    <div style="padding:8px;">
        <div style="margin-bottom:12px;color:#64748b;font-size:13px;">
            {len(tokens)} tokens — hover for token IDs
        </div>
        <div style="line-height:2.2;">{chips}</div>
    </div>"""


# ---------------------------------------------------------------------------
# Tab 1: Step-by-Step Probability Explorer
# ---------------------------------------------------------------------------

def explore_probabilities(prompt, temperature, top_n, steps, show_steps, seed):
    """Generate tokens step by step and return formatted HTML."""
    if not manager.is_ready():
        return f"<p style='color:red;'>{manager.status_message()}</p>"

    seed = int(seed)
    results = manager.generate_step_by_step(
        prompt=prompt,
        steps=int(steps),
        temperature=temperature,
        top_n=int(top_n),
        seed=seed,
        show_steps=show_steps,
    )

    if not results:
        return "<p>No results generated.</p>"

    if show_steps:
        html_parts = [_render_step_html(r, prompt) for r in results]
        return "\n".join(html_parts)
    else:
        final_text = results[-1]["text"]
        return _render_final_text_html(prompt, final_text)


def on_show_steps_change(show_steps):
    """Adjust max steps slider when show_steps toggles."""
    new_max = 20 if show_steps else 100
    return gr.update(maximum=new_max)


# ---------------------------------------------------------------------------
# Tab 2: Interactive Generator
# ---------------------------------------------------------------------------

def generate_interactive(prompt, num_tokens, temperature, top_k, seed):
    """Generate text and return it."""
    if not manager.is_ready():
        return f"*{manager.status_message()}*"

    seed = int(seed)
    text = manager.generate_text(
        prompt=prompt,
        num_tokens=int(num_tokens),
        temperature=temperature,
        top_k=int(top_k),
        seed=seed,
    )
    return text


# ---------------------------------------------------------------------------
# Tab 3: Tokenizer
# ---------------------------------------------------------------------------

def tokenize_text(text):
    """Tokenize input and return formatted HTML."""
    if not manager.is_ready():
        return f"<p style='color:red;'>{manager.status_message()}</p>"

    tokens = manager.tokenize(text)
    return _render_tokens_html(tokens)


# ---------------------------------------------------------------------------
# Admin panel
# ---------------------------------------------------------------------------

def admin_login(password):
    """Check admin password and return visibility update."""
    if password == ADMIN_PASSWORD:
        return gr.update(visible=True), gr.update(visible=False), ""
    return gr.update(visible=False), gr.update(visible=True), "Incorrect password."


def admin_load_model(model_name):
    """Load a new model from admin panel."""
    status = manager.load_model(model_name)
    cfg = manager.get_config()
    return status, json.dumps(cfg, indent=2)


def admin_save_defaults(prompt, temperature, top_n, steps, seed):
    """Save default settings."""
    manager.update_config(
        default_prompt=prompt,
        default_temperature=temperature,
        default_top_n=int(top_n),
        default_steps=int(steps),
        default_seed=int(seed),
    )
    return "Defaults saved."


# ---------------------------------------------------------------------------
# Build the Gradio app
# ---------------------------------------------------------------------------

def create_app():
    cfg = manager.get_config()

    with gr.Blocks(
        title="LLM Explorer",
        theme=gr.themes.Soft(),
    ) as demo:
        gr.Markdown("# LLM Explorer\n*Interactive tools for understanding how LLMs work*")

        # Status bar
        status_display = gr.Markdown(value=f"**{manager.status_message()}**")

        # ==================================================================
        # Tab 1: Probability Explorer
        # ==================================================================
        with gr.Tab("Probability Explorer"):
            gr.Markdown("### Step-by-Step Next-Token Prediction")
            gr.Markdown(
                "Enter a prompt and watch the model predict one token at a time. "
                "Each step shows the probability distribution over the vocabulary."
            )

            with gr.Row():
                with gr.Column(scale=3):
                    t1_prompt = gr.Textbox(
                        label="Prompt",
                        value=cfg.get("default_prompt", "The best thing about Huston-Tillotson University is"),
                        lines=2,
                    )
                with gr.Column(scale=1):
                    t1_show_steps = gr.Checkbox(
                        label="Show steps",
                        value=True,
                    )

            with gr.Row():
                t1_temperature = gr.Slider(
                    label="Temperature",
                    minimum=0.1, maximum=2.5, step=0.1,
                    value=cfg.get("default_temperature", 0.8),
                )
                t1_top_n = gr.Slider(
                    label="Top-N tokens",
                    minimum=5, maximum=30, step=1,
                    value=cfg.get("default_top_n", 10),
                )
                t1_steps = gr.Slider(
                    label="Steps",
                    minimum=1, maximum=20, step=1,
                    value=cfg.get("default_steps", 8),
                )

            with gr.Accordion(f"Seed: {cfg.get('default_seed', 42)}", open=False):
                t1_seed = gr.Number(
                    label="Random seed",
                    value=cfg.get("default_seed", 42),
                    precision=0,
                )

            t1_generate_btn = gr.Button("Generate", variant="primary")
            t1_output = gr.HTML(label="Output")

            # Show steps toggle adjusts max steps
            t1_show_steps.change(
                fn=on_show_steps_change,
                inputs=[t1_show_steps],
                outputs=[t1_steps],
            )

            t1_generate_btn.click(
                fn=explore_probabilities,
                inputs=[t1_prompt, t1_temperature, t1_top_n, t1_steps, t1_show_steps, t1_seed],
                outputs=[t1_output],
            )

        # ==================================================================
        # Tab 2: Interactive Generator
        # ==================================================================
        with gr.Tab("Generator"):
            gr.Markdown("### Interactive Text Generator")
            gr.Markdown(
                "Generate text from a prompt. Adjust temperature and top-k, "
                "then release the slider to regenerate with the same seed."
            )

            t2_prompt = gr.Textbox(
                label="Prompt",
                value=cfg.get("default_prompt", "The best thing about Huston-Tillotson University is"),
                lines=2,
            )

            with gr.Row():
                t2_num_tokens = gr.Slider(
                    label="Tokens to generate",
                    minimum=5, maximum=100, step=1,
                    value=30,
                )
                t2_temperature = gr.Slider(
                    label="Temperature",
                    minimum=0.1, maximum=2.5, step=0.1,
                    value=cfg.get("default_temperature", 0.8),
                )
                t2_top_k = gr.Slider(
                    label="Top-K",
                    minimum=1, maximum=100, step=1,
                    value=40,
                )

            with gr.Accordion(f"Seed: {cfg.get('default_seed', 42)}", open=False):
                t2_seed = gr.Number(
                    label="Random seed",
                    value=cfg.get("default_seed", 42),
                    precision=0,
                )

            t2_generate_btn = gr.Button("Generate", variant="primary")
            t2_output = gr.Textbox(
                label="Generated text",
                lines=8,
                interactive=False,
            )

            gen_inputs = [t2_prompt, t2_num_tokens, t2_temperature, t2_top_k, t2_seed]

            t2_generate_btn.click(
                fn=generate_interactive,
                inputs=gen_inputs,
                outputs=[t2_output],
            )

            # Slider release events trigger regeneration
            t2_temperature.release(
                fn=generate_interactive,
                inputs=gen_inputs,
                outputs=[t2_output],
            )
            t2_top_k.release(
                fn=generate_interactive,
                inputs=gen_inputs,
                outputs=[t2_output],
            )

        # ==================================================================
        # Tab 3: Tokenizer
        # ==================================================================
        with gr.Tab("Tokenizer"):
            gr.Markdown("### Token Visualization")
            gr.Markdown(
                "Enter any text to see how the model's tokenizer splits it into tokens. "
                "Hover over each token to see its numeric ID."
            )

            t3_input = gr.Textbox(
                label="Text",
                value="Huston-Tillotson University is an HBCU in Austin, Texas",
                lines=3,
            )
            t3_btn = gr.Button("Tokenize", variant="primary")
            t3_output = gr.HTML(label="Tokens")

            t3_btn.click(
                fn=tokenize_text,
                inputs=[t3_input],
                outputs=[t3_output],
            )

        # ==================================================================
        # Admin Panel
        # ==================================================================
        with gr.Tab("Admin"):
            gr.Markdown("### Admin Panel")

            # Login gate
            with gr.Group() as admin_login_group:
                admin_pw = gr.Textbox(
                    label="Password",
                    type="password",
                    placeholder="Enter admin password",
                )
                admin_login_btn = gr.Button("Login")
                admin_login_msg = gr.Markdown("")

            # Admin controls (hidden until login)
            with gr.Group(visible=False) as admin_controls:
                gr.Markdown("#### Model")
                with gr.Row():
                    admin_model_dropdown = gr.Dropdown(
                        choices=list(AVAILABLE_MODELS.keys()),
                        value=manager.current_model_name or cfg.get("model", "Qwen2.5-3B"),
                        label="Select model",
                    )
                    admin_load_btn = gr.Button("Load Model", variant="primary")
                admin_model_status = gr.Markdown("")

                gr.Markdown("---")
                gr.Markdown("#### Default Settings")
                admin_prompt = gr.Textbox(
                    label="Default prompt",
                    value=cfg.get("default_prompt", ""),
                )
                with gr.Row():
                    admin_temp = gr.Number(
                        label="Default temperature",
                        value=cfg.get("default_temperature", 0.8),
                    )
                    admin_top_n = gr.Number(
                        label="Default top-n",
                        value=cfg.get("default_top_n", 10),
                        precision=0,
                    )
                    admin_steps = gr.Number(
                        label="Default steps",
                        value=cfg.get("default_steps", 8),
                        precision=0,
                    )
                    admin_seed = gr.Number(
                        label="Default seed",
                        value=cfg.get("default_seed", 42),
                        precision=0,
                    )
                admin_save_btn = gr.Button("Save Defaults")
                admin_save_msg = gr.Markdown("")

                gr.Markdown("---")
                gr.Markdown("#### Current Config")
                admin_config_display = gr.Code(
                    value=json.dumps(cfg, indent=2),
                    language="json",
                    interactive=False,
                )

            # Login wiring
            admin_login_btn.click(
                fn=admin_login,
                inputs=[admin_pw],
                outputs=[admin_controls, admin_login_group, admin_login_msg],
            )

            # Model loading
            admin_load_btn.click(
                fn=admin_load_model,
                inputs=[admin_model_dropdown],
                outputs=[admin_model_status, admin_config_display],
            )

            # Save defaults
            admin_save_btn.click(
                fn=admin_save_defaults,
                inputs=[admin_prompt, admin_temp, admin_top_n, admin_steps, admin_seed],
                outputs=[admin_save_msg],
            )

    return demo


# ---------------------------------------------------------------------------
# Startup
# ---------------------------------------------------------------------------

if __name__ == "__main__":
    # Load default model on startup
    cfg = manager.get_config()
    model_to_load = cfg.get("model", "Qwen2.5-3B")
    print(f"Loading default model: {model_to_load}")
    status = manager.load_model(model_to_load)
    print(status)

    app = create_app()
    app.launch(server_name="0.0.0.0", server_port=7860)
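
The "slider debounce" called out in the commit message is Gradio's `.release()` event: the regeneration callback fires once when the user lets go of the slider, not on every intermediate value while dragging. A minimal standalone sketch of the pattern (the `echo` callback is a hypothetical stand-in for `generate_interactive`; no model required):

```python
# Sketch of the slider-release pattern wired to t2_temperature / t2_top_k above.
import gradio as gr

def echo(temperature):
    # Stand-in for generate_interactive: just report the value we got.
    return f"Regenerated with temperature={temperature}"

with gr.Blocks() as demo:
    slider = gr.Slider(minimum=0.1, maximum=2.5, step=0.1, value=0.8, label="Temperature")
    out = gr.Textbox(label="Output")
    # .release fires once per drag, on mouse-up, unlike .change.
    slider.release(fn=echo, inputs=[slider], outputs=[out])

if __name__ == "__main__":
    demo.launch()
```
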
config.json
ADDED
@@ -0,0 +1,8 @@
{
  "model": "GPT-OSS-20B",
  "default_prompt": "The best thing about Huston-Tillotson University is",
  "default_temperature": 0.8,
  "default_top_n": 10,
  "default_steps": 8,
  "default_seed": 42
}
models.py
ADDED
@@ -0,0 +1,343 @@
"""Model management for LLM Explorer.

Handles loading, unloading, and swapping models at runtime.
Provides inference methods for next-token probabilities and step-by-step generation.
"""

import gc
import json
import math
import os
import threading
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# ---------------------------------------------------------------------------
# Available models — add entries here to make them selectable in the admin panel.
# To use a new model, just add it here and redeploy (or restart).
# ---------------------------------------------------------------------------
AVAILABLE_MODELS = {
    "Qwen2.5-3B": {
        "id": "Qwen/Qwen2.5-3B",
        "dtype": "float16",
        "description": "Fast, good quality (default)",
    },
    "Qwen2.5-7B (4-bit)": {
        "id": "Qwen/Qwen2.5-7B",
        "quantize": "4bit",
        "description": "Higher quality, quantized to fit T4",
    },
    "Llama-3.2-3B": {
        "id": "meta-llama/Llama-3.2-3B",
        "dtype": "float16",
        "description": "Meta's latest 3B",
    },
    "Mistral-7B-v0.3 (4-bit)": {
        "id": "mistralai/Mistral-7B-v0.3",
        "quantize": "4bit",
        "description": "Best quality, quantized",
    },
    "GPT-OSS-20B": {
        "id": "openai/gpt-oss-20b",
        "dtype": "auto",
        "description": "OpenAI 20B, full precision (local/large GPU only)",
    },
    "GPT-OSS-20B (4-bit)": {
        "id": "openai/gpt-oss-20b",
        "quantize": "4bit",
        "description": "OpenAI 20B, quantized to fit T4 (~10-12GB)",
    },
}

DEFAULT_MODEL = "Qwen2.5-3B"

CONFIG_PATH = Path(__file__).parent / "config.json"

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _detect_device() -> str:
    """Pick the best available device."""
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def _load_config() -> dict:
    """Load persisted config or return defaults."""
    defaults = {
        "model": DEFAULT_MODEL,
        "default_prompt": "The best thing about Huston-Tillotson University is",
        "default_temperature": 0.8,
        "default_top_n": 10,
        "default_steps": 8,
        "default_seed": 42,
    }
    if CONFIG_PATH.exists():
        try:
            with open(CONFIG_PATH) as f:
                saved = json.load(f)
            defaults.update(saved)
        except (json.JSONDecodeError, OSError):
            pass
    return defaults


def _save_config(cfg: dict) -> None:
    """Persist config to disk."""
    with open(CONFIG_PATH, "w") as f:
        json.dump(cfg, f, indent=2)


# ---------------------------------------------------------------------------
# ModelManager — singleton that owns the active model
# ---------------------------------------------------------------------------

class ModelManager:
    """Manages a single active model with hot-swap capability."""

    def __init__(self):
        self.model = None
        self.tokenizer = None
        self.current_model_name: str | None = None
        self.device: str = _detect_device()
        self.loading = False
        self._lock = threading.Lock()
        self.config = _load_config()

    # ------------------------------------------------------------------
    # Model lifecycle
    # ------------------------------------------------------------------

    def load_model(self, model_name: str) -> str:
        """Load a model by its display name. Returns status message."""
        if model_name not in AVAILABLE_MODELS:
            return f"Unknown model: {model_name}"

        if self.loading:
            return "A model is already being loaded. Please wait."

        spec = AVAILABLE_MODELS[model_name]

        with self._lock:
            self.loading = True
            try:
                # Unload current model
                self._unload()

                # Determine load kwargs
                model_id = spec["id"]
                load_kwargs: dict = {"device_map": "auto"}

                if spec.get("quantize") == "4bit":
                    from transformers import BitsAndBytesConfig
                    load_kwargs["quantization_config"] = BitsAndBytesConfig(
                        load_in_4bit=True,
                        bnb_4bit_compute_dtype=torch.float16,
                    )
                elif spec.get("quantize") == "8bit":
                    from transformers import BitsAndBytesConfig
                    load_kwargs["quantization_config"] = BitsAndBytesConfig(
                        load_in_8bit=True,
                    )
                else:
                    dtype_str = spec.get("dtype", "float16")
                    if dtype_str == "auto":
                        load_kwargs["torch_dtype"] = "auto"
                    else:
                        load_kwargs["torch_dtype"] = getattr(torch, dtype_str)

                # Load tokenizer + model
                self.tokenizer = AutoTokenizer.from_pretrained(model_id)
                self.model = AutoModelForCausalLM.from_pretrained(
                    model_id, **load_kwargs
                )
                self.model.eval()
                self.current_model_name = model_name

                # Persist choice
                self.config["model"] = model_name
                _save_config(self.config)

                return f"Loaded {model_name} ({model_id})"

            except Exception as e:
                self._unload()
                return f"Failed to load {model_name}: {e}"
            finally:
                self.loading = False

    def _unload(self) -> None:
        """Release current model and free memory."""
        if self.model is not None:
            del self.model
            self.model = None
        if self.tokenizer is not None:
            del self.tokenizer
            self.tokenizer = None
        self.current_model_name = None
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

    def is_ready(self) -> bool:
        return self.model is not None and not self.loading

    def status_message(self) -> str:
        if self.loading:
            return "Loading model..."
        if self.model is None:
            return "No model loaded"
        return f"Model: {self.current_model_name}"

    # ------------------------------------------------------------------
    # Inference helpers
    # ------------------------------------------------------------------

    def _get_logits(self, text: str) -> torch.Tensor:
        """Run a forward pass and return logits for the last token position."""
        inputs = self.tokenizer(text, return_tensors="pt")
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        with torch.no_grad():
            out = self.model(**inputs)
        return out.logits[0, -1, :]  # (vocab_size,)

    @staticmethod
    def apply_temperature(logits: torch.Tensor, temperature: float) -> torch.Tensor:
        """Apply temperature scaling to logits and return probabilities."""
        if temperature <= 0:
            temperature = 1e-6
        scaled = logits / temperature
        return torch.softmax(scaled, dim=-1)

    @staticmethod
    def entropy_bits(probs: torch.Tensor) -> float:
        """Shannon entropy in bits."""
        eps = 1e-20
        p = probs + eps
        return float(-torch.sum(p * torch.log2(p)))

    def top_k_table(
        self, probs: torch.Tensor, k: int = 10
    ) -> list[tuple[str, float, int]]:
        """Return list of (token_str, probability, token_id) for top-k tokens."""
        topk = torch.topk(probs, k=min(k, probs.shape[0]))
        rows = []
        for prob, idx in zip(topk.values.tolist(), topk.indices.tolist()):
            token_str = self.tokenizer.decode([idx])
            rows.append((token_str, float(prob), int(idx)))
        return rows

    # ------------------------------------------------------------------
    # High-level generation
    # ------------------------------------------------------------------

    def generate_step_by_step(
        self,
        prompt: str,
        steps: int = 8,
        temperature: float = 0.8,
        top_n: int = 10,
        seed: int = 42,
        show_steps: bool = True,
    ) -> list[dict]:
        """Generate tokens one at a time, returning per-step data.

        Each step dict contains:
        - step: int (1-based)
        - text: accumulated text so far
        - token: the sampled token string
        - token_id: int
        - entropy: float (bits)
        - top_tokens: list of (token_str, prob, token_id)
        """
        if not self.is_ready():
            return []

        text = prompt
        results = []
        rng = torch.Generator()

        for i in range(steps):
            logits = self._get_logits(text)
            probs = self.apply_temperature(logits, temperature)
            entropy = self.entropy_bits(probs)
            top_tokens = self.top_k_table(probs, k=top_n) if show_steps else []

            # Sample with deterministic seed per step
            rng.manual_seed(seed + i)
            idx = torch.multinomial(probs.cpu(), num_samples=1, generator=rng).item()
            token_str = self.tokenizer.decode([idx])
            text += token_str

            results.append({
                "step": i + 1,
                "text": text,
                "token": token_str,
                "token_id": int(idx),
                "entropy": entropy,
                "top_tokens": top_tokens,
            })

        return results

    def generate_text(
        self,
        prompt: str,
        num_tokens: int = 30,
        temperature: float = 0.8,
        top_k: int = 40,
        seed: int = 42,
    ) -> str:
        """Generate text with top-k sampling. Returns prompt + generated text."""
        if not self.is_ready():
            return prompt

        text = prompt
        rng = torch.Generator()

        for i in range(num_tokens):
            logits = self._get_logits(text)

            # Apply top-k filtering
            if top_k > 0:
                top_k_vals, top_k_idxs = torch.topk(logits, k=min(top_k, logits.shape[0]))
                mask = torch.full_like(logits, float("-inf"))
                mask.scatter_(0, top_k_idxs, top_k_vals)
                logits = mask

            probs = self.apply_temperature(logits, temperature)

            rng.manual_seed(seed + i)
            idx = torch.multinomial(probs.cpu(), num_samples=1, generator=rng).item()
            token_str = self.tokenizer.decode([idx])
            text += token_str

        return text

    def tokenize(self, text: str) -> list[tuple[str, int]]:
        """Tokenize text and return list of (token_str, token_id)."""
        if self.tokenizer is None:
            return []
        ids = self.tokenizer.encode(text)
        return [(self.tokenizer.decode([tid]), tid) for tid in ids]

    # ------------------------------------------------------------------
    # Config helpers
    # ------------------------------------------------------------------

    def get_config(self) -> dict:
        return dict(self.config)

    def update_config(self, **kwargs) -> None:
        self.config.update(kwargs)
        _save_config(self.config)


# Module-level singleton
manager = ModelManager()
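
The sampling math in `ModelManager` can be exercised on toy logits without downloading a model. A minimal sketch mirroring `apply_temperature`, `entropy_bits`, and the top-k mask in `generate_text` (toy values throughout):

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])  # toy logits for a 5-token vocabulary

# Temperature scaling (as in ModelManager.apply_temperature):
# lower temperature sharpens the distribution, higher flattens it.
temperature = 0.8
probs = torch.softmax(logits / temperature, dim=-1)

# Shannon entropy in bits (as in ModelManager.entropy_bits)
entropy = float(-torch.sum(probs * torch.log2(probs + 1e-20)))

# Top-k masking (as in ModelManager.generate_text): keep the k best logits,
# set the rest to -inf so they get zero probability after softmax.
k = 2
vals, idxs = torch.topk(logits, k=k)
masked = torch.full_like(logits, float("-inf"))
masked.scatter_(0, idxs, vals)
topk_probs = torch.softmax(masked / temperature, dim=-1)

# Seeded sampling (as in the per-step rng.manual_seed(seed + i) reseeding)
rng = torch.Generator()
rng.manual_seed(42)
idx = torch.multinomial(topk_probs, num_samples=1, generator=rng).item()
print(f"entropy={entropy:.2f} bits; sampled token index={idx}")
```
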
requirements.txt
ADDED
@@ -0,0 +1,7 @@
torch>=2.1.0
transformers>=4.40.0
accelerate>=0.27.0
bitsandbytes>=0.43.0
gradio>=4.20.0
sentencepiece>=0.2.0
protobuf>=4.25.0