Restore 3-panel layout: vanilla / brief / rich with microlens-final LoRA
Reattach the fine-tuned LoRA (now Laborator/microlens-final, single adapter
instead of v2/v3 subfolders) and rebuild the 3-result-panel UI:
- UNTRAINED BASELINE: stock Gemma 4 E2B with adapter disabled
- MICROLENS BRIEF: base + LoRA + BRIEF_PROMPT (one-sentence answer)
- MICROLENS RICH: base + LoRA + RICH_PROMPT (full schema)
Brief and rich share the same microlens-final adapter and differ only in the
prompt and the max_new_tokens budget (96 vs 512).
- _HF_LORA_REPO → Laborator/microlens-final
- _zerogpu_infer_all: per-version prompt, single adapter swap
- PANEL_THEMES["v2"/"v3"]: rename titles to MICROLENS BRIEF/RICH, update
subtitle to reflect 95 genera / single-LoRA architecture
- APK QR fallback → GitHub repo
- README: drop base-only Status, add models: Laborator/microlens-final,
switch genus claim to 95 (diatoms + fungal spores)
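
The three-panel setup described above can be summarised as a small lookup table. This is a minimal sketch, not the code in app.py: `PanelConfig`, `PANEL_CONFIGS`, and `plan_run` are hypothetical names introduced for illustration. The prompts and the token budgets (256 for vanilla, 96 for brief, 512 for rich) are taken from this commit's diff.

```python
from dataclasses import dataclass

# Prompt strings copied from the diff of app.py in this commit.
INFERENCE_PROMPT = "What is shown in this microscope image?"
BRIEF_PROMPT = ("Identify the genus in this microscopy image. "
                "Reply with just the genus name and a one-sentence description.")
RICH_PROMPT = ("Identify the organism in this microscopy image. "
               "Describe the genus, morphology, habitat, and identification cues.")


@dataclass(frozen=True)
class PanelConfig:
    """Hypothetical helper (not in the commit): settings for one result panel."""
    prompt: str
    use_adapter: bool      # False = base model with the LoRA layers disabled
    max_new_tokens: int


# One entry per panel. "v2" and "v3" are the internal keys the commit keeps
# for the BRIEF and RICH panels; both use the same microlens-final adapter.
PANEL_CONFIGS = {
    "vanilla": PanelConfig(INFERENCE_PROMPT, use_adapter=False, max_new_tokens=256),
    "v2": PanelConfig(BRIEF_PROMPT, use_adapter=True, max_new_tokens=96),
    "v3": PanelConfig(RICH_PROMPT, use_adapter=True, max_new_tokens=512),
}


def plan_run(versions=("vanilla", "v2", "v3")):
    """Return the (version, config) pairs in the order the panels are filled."""
    return [(version, PANEL_CONFIGS[version]) for version in versions]


if __name__ == "__main__":
    for version, cfg in plan_run():
        mode = "LoRA on" if cfg.use_adapter else "LoRA off"
        print(f"{version}: {mode}, max_new_tokens={cfg.max_new_tokens}")
```

Keeping the settings in one table makes it easy to see that the brief and rich panels differ only in prompt and token budget, while the baseline differs only in having the adapter switched off.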
|
@@ -8,7 +8,9 @@ sdk_version: 5.7.1
|
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
-
short_description:
|
|
|
|
|
|
|
| 12 |
tags:
|
| 13 |
- microscopy
|
| 14 |
- biology
|
|
@@ -21,25 +23,23 @@ tags:
|
|
| 21 |
|
| 22 |
# 🔬 MicroLens — Microscopy AI Demo
|
| 23 |
|
| 24 |
-
Live demo of **MicroLens**: a
|
| 25 |
|
| 26 |
Built by **Serghei Brinza** (Vienna, Austria) for the **Kaggle Gemma 4 Good Hackathon** (May 2026).
|
| 27 |
|
| 28 |
---
|
| 29 |
|
| 30 |
-
## Status
|
| 31 |
-
|
| 32 |
-
⚠️ Currently running on stock **Gemma 4 E2B** only — the fine-tuned LoRA is in training (expected restoration after 2026-05-17). Accuracy on microscopy genera will be limited until the LoRA is loaded.
|
| 33 |
-
|
| 34 |
-
---
|
| 35 |
-
|
| 36 |
## 🚀 How to use
|
| 37 |
|
| 38 |
1. Upload a microscopy image (or pick one from the examples gallery).
|
| 39 |
2. Optionally add a question, for example: "What is shown here?", "Describe the morphology", "What category does this belong to?"
|
| 40 |
3. Click **AI Analyze**.
|
| 41 |
|
| 42 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 43 |
|
| 44 |
---
|
| 45 |
|
|
@@ -64,6 +64,7 @@ For any vertical that requires regulatory clearance (medicine, veterinary, foren
|
|
| 64 |
|
| 65 |
## 🔗 Links
|
| 66 |
|
|
|
|
| 67 |
- **Source code:** [SergheiBrinza/microlens](https://github.com/SergheiBrinza/microlens)
|
| 68 |
- **Hackathon writeup:** [KAGGLE_WRITEUP.md](https://github.com/SergheiBrinza/microlens/blob/main/KAGGLE_WRITEUP.md)
|
| 69 |
- **Roadmap:** [ROADMAP.md](https://github.com/SergheiBrinza/microlens/blob/main/ROADMAP.md)
|
|
|
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
+
short_description: Gemma 4 E2B + LoRA · diatoms & fungal spores · 95 genera
|
| 12 |
+
models:
|
| 13 |
+
- Laborator/microlens-final
|
| 14 |
tags:
|
| 15 |
- microscopy
|
| 16 |
- biology
|
|
|
|
| 23 |
|
| 24 |
# 🔬 MicroLens — Microscopy AI Demo
|
| 25 |
|
| 26 |
+
Live demo of **MicroLens**: a fine-tuned **Gemma 4 E2B** vision-language model that identifies microscopic subjects across **2 categories** (diatoms and fungal spores) and **95 genera**.
|
| 27 |
|
| 28 |
Built by **Serghei Brinza** (Vienna, Austria) for the **Kaggle Gemma 4 Good Hackathon** (May 2026).
|
| 29 |
|
| 30 |
---
|
| 31 |
|
|
| 32 |
## 🚀 How to use
|
| 33 |
|
| 34 |
1. Upload a microscopy image (or pick one from the examples gallery).
|
| 35 |
2. Optionally add a question, for example: "What is shown here?", "Describe the morphology", "What category does this belong to?"
|
| 36 |
3. Click **AI Analyze**.
|
| 37 |
|
| 38 |
+
You get three side-by-side answers:
|
| 39 |
+
|
| 40 |
+
- **UNTRAINED BASELINE** — stock Gemma 4 E2B with no microscopy training.
|
| 41 |
+
- **MICROLENS · BRIEF** — same base + `Laborator/microlens-final` LoRA, prompted for a single-sentence genus answer.
|
| 42 |
+
- **MICROLENS · RICH** — same base + same LoRA, prompted for the full schema (genus + morphology + habitat + identification cues).
|
| 43 |
|
| 44 |
---
|
| 45 |
|
|
|
|
| 64 |
|
| 65 |
## 🔗 Links
|
| 66 |
|
| 67 |
+
- **Model:** [Laborator/microlens-final](https://huggingface.co/Laborator/microlens-final)
|
| 68 |
- **Source code:** [SergheiBrinza/microlens](https://github.com/SergheiBrinza/microlens)
|
| 69 |
- **Hackathon writeup:** [KAGGLE_WRITEUP.md](https://github.com/SergheiBrinza/microlens/blob/main/KAGGLE_WRITEUP.md)
|
| 70 |
- **Roadmap:** [ROADMAP.md](https://github.com/SergheiBrinza/microlens/blob/main/ROADMAP.md)
|
|
@@ -8,14 +8,17 @@ Layout:
|
|
| 8 |
control panel (mode-dependent: 5 categories × 6 thumbs / upload zone /
|
| 9 |
camera enumeration)
|
| 10 |
- AI ANALYZE long oval cyan→red gradient button
|
|
|
|
| 11 |
- Translate row with 28 languages (English default) + ORIGINAL button after
|
| 12 |
translation
|
| 13 |
- Footer with run-locally + APK + Legal links
|
| 14 |
|
| 15 |
-
SAMPLES tab uses cached
|
| 16 |
-
UPLOAD / MICROSCOPE tabs run LIVE inference against
|
| 17 |
URL_VANILLA (default http://127.0.0.1:8085/v1/chat/completions)
|
| 18 |
-
|
|
|
|
|
|
|
| 19 |
(e.g. Cloudflare → llama-server). When unreachable the panel shows a clean
|
| 20 |
"backend unavailable" message instead of crashing.
|
| 21 |
"""
|
|
@@ -86,17 +89,22 @@ CATALOG: List[Dict] = json.loads(CATALOG_PATH.read_text())
|
|
| 86 |
BY_FILENAME = {s["filename"]: s for s in CATALOG}
|
| 87 |
|
| 88 |
URL_VANILLA = os.environ.get("URL_VANILLA", "http://127.0.0.1:8085/v1/chat/completions")
|
|
|
|
|
|
|
| 89 |
INFERENCE_PROMPT = "What is shown in this microscope image?"
|
|
|
|
|
|
|
|
|
|
| 90 |
|
| 91 |
# ─────────────────────────────────────────────────────────────────────────────
|
| 92 |
# ZeroGPU runtime: when running on HF Space we replace HTTP llama-server calls
|
| 93 |
-
# with in-process transformers
|
| 94 |
-
#
|
| 95 |
-
# attached. Outside HF Space (local dev) the original HTTP path is preserved.
|
| 96 |
# ─────────────────────────────────────────────────────────────────────────────
|
| 97 |
IS_HF_SPACE = bool(os.environ.get("SPACE_ID"))
|
| 98 |
|
| 99 |
_HF_BASE = "unsloth/gemma-4-E2B-it"
|
|
|
|
| 100 |
|
| 101 |
_zerogpu_processor = None
|
| 102 |
_zerogpu_model = None
|
|
@@ -105,23 +113,96 @@ if IS_HF_SPACE:
|
|
| 105 |
import spaces
|
| 106 |
import torch
|
| 107 |
from transformers import AutoProcessor, AutoModelForImageTextToText
|
|
|
|
| 108 |
|
| 109 |
print("[ZeroGPU] loading processor + base model on cuda…", flush=True)
|
| 110 |
_zerogpu_processor = AutoProcessor.from_pretrained(_HF_BASE)
|
| 111 |
_zerogpu_model = AutoModelForImageTextToText.from_pretrained(
|
| 112 |
_HF_BASE, torch_dtype=torch.bfloat16, device_map="cuda",
|
| 113 |
)
|
| 114 |
-
_zerogpu_model.eval()
|
| 115 |
-
print("[ZeroGPU] ready (base Gemma 4 E2B only — LoRA in training)", flush=True)
|
| 116 |
|
| 117 |
-
#
|
| 118 |
-
#
|
| 119 |
-
#
|
| 120 |
-
|
| 121 |
-
def
|
| 122 |
import time as _t
|
| 123 |
t0 = _t.time()
|
| 124 |
-
print(f"[infer] cuda={torch.cuda.is_available()} "
|
| 125 |
f"dev={torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'cpu'}",
|
| 126 |
flush=True)
|
| 127 |
b64 = _strip_data_uri(image_data_uri) if image_data_uri.startswith("data:") else image_data_uri
|
|
@@ -129,9 +210,17 @@ if IS_HF_SPACE:
|
|
| 129 |
if max(img.size) > 768:
|
| 130 |
img.thumbnail((768, 768))
|
| 131 |
print(f"[infer] image {img.size}", flush=True)
|
| 132 |
messages = [{"role": "user", "content": [
|
| 133 |
{"type": "image", "image": img},
|
| 134 |
-
{"type": "text", "text":
|
| 135 |
]}]
|
| 136 |
inputs = _zerogpu_processor.apply_chat_template(
|
| 137 |
messages, add_generation_prompt=True, tokenize=True,
|
|
@@ -144,7 +233,7 @@ if IS_HF_SPACE:
|
|
| 144 |
print(f"[infer] inputs ready, t+{_t.time()-t0:.2f}s, generating…", flush=True)
|
| 145 |
with torch.inference_mode():
|
| 146 |
out = _zerogpu_model.generate(
|
| 147 |
-
**inputs, max_new_tokens=
|
| 148 |
)
|
| 149 |
prompt_len = inputs["input_ids"].shape[1]
|
| 150 |
gen_ids = out[0][prompt_len:]
|
|
@@ -153,14 +242,14 @@ if IS_HF_SPACE:
|
|
| 153 |
f"text_len={len(text)}, preview={text[:80]!r}", flush=True)
|
| 154 |
return text.strip()
|
| 155 |
|
|
|
|
|
|
|
| 156 |
|
| 157 |
# ─────────────────────────────────────────────────────────────────────────────
|
| 158 |
# QR codes for the footer install card. Generated once at module load.
|
| 159 |
# ─────────────────────────────────────────────────────────────────────────────
|
|
|
|
| 160 |
GITHUB_URL = "https://github.com/SergheiBrinza/microlens"
|
| 161 |
-
# APK fallback now points at the GitHub repo (the previous resolve/main URL
|
| 162 |
-
# pointed at the deleted Laborator/microlens-gemma4-e2b LoRA repo).
|
| 163 |
-
APK_URL = GITHUB_URL
|
| 164 |
|
| 165 |
def _qr_data_uri(data: str, dark: str = "#FFFFFF", light: str = "#000000",
|
| 166 |
alpha: float = 1.0) -> str:
|
|
@@ -235,11 +324,12 @@ def llama_server_call(url: str, image_data_uri: str,
|
|
| 235 |
prompt: str = INFERENCE_PROMPT,
|
| 236 |
timeout: int = 180) -> Tuple[str, Optional[str]]:
|
| 237 |
"""Returns (text, error_or_None).
|
| 238 |
-
On HF Space: routes to in-process ZeroGPU inference (transformers).
|
| 239 |
Locally: OpenAI-compatible call to llama-server (original behavior)."""
|
| 240 |
if IS_HF_SPACE:
|
|
|
|
| 241 |
try:
|
| 242 |
-
return _zerogpu_infer(image_data_uri, prompt), None
|
| 243 |
except Exception as e:
|
| 244 |
return "", f"{type(e).__name__}: {str(e)[:240]}"
|
| 245 |
payload = {
|
|
@@ -1141,8 +1231,8 @@ async () => {
|
|
| 1141 |
|
| 1142 |
PANEL_THEMES = {
|
| 1143 |
"vanilla": {
|
| 1144 |
-
"title": "
|
| 1145 |
-
"subtitle": "Stock Gemma 4 E2B · Google factory weights ·
|
| 1146 |
"stripe": "linear-gradient(90deg, #C0C5CC 0%, #7A7E85 100%)",
|
| 1147 |
"title_grad": "linear-gradient(90deg, #E0E5EC 0%, #9A9EA5 100%)",
|
| 1148 |
"border": "rgba(180,185,195,0.35)",
|
|
@@ -1150,6 +1240,26 @@ PANEL_THEMES = {
|
|
| 1150 |
"glow_strong": "0 0 56px rgba(200,205,215,0.28)",
|
| 1151 |
"subtitle_color": "#9aa0a8",
|
| 1152 |
},
|
| 1153 |
}
|
| 1154 |
|
| 1155 |
|
|
@@ -1206,29 +1316,43 @@ def panel_html(kind: str, body: str, state: str = "ready", footer_text: Optional
|
|
| 1206 |
"""
|
| 1207 |
|
| 1208 |
|
| 1209 |
-
def empty_panels(reason: str = "empty") -> str:
|
| 1210 |
-
return panel_html("vanilla", "", state=reason)
|
|
|
|
|
|
|
| 1211 |
|
| 1212 |
|
| 1213 |
def analyse_curated(filename: str, shape: str, grid: int = 0, cross: int = 0):
|
| 1214 |
import time
|
| 1215 |
s = BY_FILENAME.get(filename)
|
| 1216 |
if not s:
|
| 1217 |
-
yield viewport_html(None, shape, grid, cross), empty_panels()
|
| 1218 |
return
|
| 1219 |
vp = viewport_html(full_uri(filename), shape, grid, cross)
|
| 1220 |
vanilla_full = s.get("vanilla_answer", "—")
|
| 1221 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1222 |
step = 8
|
| 1223 |
delay = 0.040
|
| 1224 |
-
for i in range(step,
|
| 1225 |
yield (
|
| 1226 |
vp,
|
| 1227 |
panel_html("vanilla", vanilla_full[:min(i, len(vanilla_full))],
|
| 1228 |
state="typing" if i < len(vanilla_full) else "ready"),
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1229 |
)
|
| 1230 |
time.sleep(delay)
|
| 1231 |
-
yield vp,
|
|
|
|
|
|
|
|
|
|
| 1232 |
|
| 1233 |
|
| 1234 |
CSS = """
|
|
@@ -1560,7 +1684,7 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1560 |
font-weight:700; font-size:12px; letter-spacing:3px;
|
| 1561 |
text-transform:uppercase;">Fine-tune</span>
|
| 1562 |
<span style="font-family:'Fraunces',serif; font-weight:500;
|
| 1563 |
-
color:#fff; font-size:19px; letter-spacing:0.3px;">Unsloth QLoRA ·
|
| 1564 |
</span>
|
| 1565 |
<span style="color:#3a3a3a; font-size:18px;">·</span>
|
| 1566 |
<span style="display:inline-flex; align-items:baseline; gap:12px;
|
|
@@ -1584,8 +1708,8 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1584 |
cross_state = gr.Textbox(value="0", elem_id="hidden-cross",
|
| 1585 |
elem_classes=["ml-hidden"], show_label=False)
|
| 1586 |
viewport_uri = gr.State(value="")
|
| 1587 |
-
# Most recent
|
| 1588 |
-
last_answers = gr.State(value={"vanilla": ""})
|
| 1589 |
|
| 1590 |
# Toolbar — full-width above both columns (no empty space in right column)
|
| 1591 |
mode_buttons = gr.HTML(value=mode_buttons_html(MODE_SAMPLES))
|
|
@@ -1650,11 +1774,10 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1650 |
|
| 1651 |
gr.HTML('<div style="height:28px;"></div>')
|
| 1652 |
|
| 1653 |
-
with gr.Row(elem_classes=["equal-panels"]):
|
| 1654 |
-
gr.HTML(
|
| 1655 |
-
|
| 1656 |
-
|
| 1657 |
-
gr.HTML('<div></div>')
|
| 1658 |
|
| 1659 |
gr.HTML(f"""
|
| 1660 |
<div style="margin-top: 32px; padding: 22px 28px;
|
|
@@ -1690,7 +1813,7 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1690 |
<div style="color:#e4e4e4; font-size:13px; line-height:1.85; font-weight:500;">
|
| 1691 |
Gemma 4 E2B-it <span style="color:#666;font-weight:400;">·</span> Google DeepMind<br>
|
| 1692 |
Unsloth FastVisionModel <span style="color:#666;font-weight:400;">·</span> 4-bit QLoRA<br>
|
| 1693 |
-
|
| 1694 |
llama.cpp + mtmd vision extension
|
| 1695 |
</div>
|
| 1696 |
</div>
|
|
@@ -1706,7 +1829,7 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1706 |
font-size:12.5px; letter-spacing:0.3px;
|
| 1707 |
border-bottom:1px solid rgba(127,232,227,.40);
|
| 1708 |
display:inline-block; margin-bottom: 8px;">
|
| 1709 |
-
🦙
|
| 1710 |
<br>
|
| 1711 |
<a href="https://github.com/SergheiBrinza/microlens"
|
| 1712 |
target="_blank" rel="noopener"
|
|
@@ -1773,6 +1896,8 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1773 |
|
| 1774 |
LIVE_BACKENDS = [
|
| 1775 |
("vanilla", URL_VANILLA, "Gemma 4 E2B · base"),
|
|
|
|
|
|
|
| 1776 |
]
|
| 1777 |
|
| 1778 |
def render_tools(current_uri, shape, grid_str, cross_str, mode):
|
|
@@ -1810,14 +1935,14 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1810 |
gr.Group(visible=(mode == MODE_SAMPLES)),
|
| 1811 |
gr.Group(visible=(mode == MODE_UPLOAD)),
|
| 1812 |
gr.Group(visible=(mode == MODE_MICRO)),
|
| 1813 |
-
vp, uri, empty_panels(),
|
| 1814 |
gr.Button(visible=False))
|
| 1815 |
|
| 1816 |
mode_state.change(on_mode_change,
|
| 1817 |
[mode_state, shape_state, grid_state, cross_state, picked_filename],
|
| 1818 |
[mode_buttons, samples_group, upload_group, micro_group,
|
| 1819 |
viewport, viewport_uri,
|
| 1820 |
-
vanilla_panel, original_btn], api_name=False)
|
| 1821 |
|
| 1822 |
def on_cat_change(cat_label, current_filename, shape, grid_str, cross_str):
|
| 1823 |
try: grid = int(grid_str or "0")
|
|
@@ -1828,14 +1953,14 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1828 |
folder_html(cat_label, None),
|
| 1829 |
viewport_html(None, shape, grid, cross,
|
| 1830 |
empty_text="PICK A SAMPLE FROM THE CATEGORY ABOVE"),
|
| 1831 |
-
"", "", empty_panels(),
|
| 1832 |
gr.Button(visible=False),
|
| 1833 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
|
| 1834 |
|
| 1835 |
cat_state.change(on_cat_change,
|
| 1836 |
[cat_state, picked_filename, shape_state, grid_state, cross_state],
|
| 1837 |
[folder_pills, folder_grid, viewport, picked_filename, viewport_uri,
|
| 1838 |
-
vanilla_panel, original_btn, lang_dropdown],
|
| 1839 |
api_name=False)
|
| 1840 |
|
| 1841 |
def on_pick(filename, cat_label, shape, grid_str, cross_str):
|
|
@@ -1846,23 +1971,23 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1846 |
# Reset live-answer state on every sample switch — without this the
|
| 1847 |
# previous image's live answer could leak into translate/restore for
|
| 1848 |
# the next sample and look like a real result.
|
| 1849 |
-
cleared_state = {"vanilla": ""}
|
| 1850 |
if not filename:
|
| 1851 |
return (folder_html(cat_label, None),
|
| 1852 |
-
viewport_html(None, shape, grid, cross), "", empty_panels(),
|
| 1853 |
gr.Button(visible=False),
|
| 1854 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY),
|
| 1855 |
cleared_state)
|
| 1856 |
uri = full_uri(filename)
|
| 1857 |
return (folder_html(cat_label, filename),
|
| 1858 |
-
viewport_html(uri, shape, grid, cross), uri, empty_panels(),
|
| 1859 |
gr.Button(visible=False),
|
| 1860 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY),
|
| 1861 |
cleared_state)
|
| 1862 |
|
| 1863 |
picked_filename.change(on_pick,
|
| 1864 |
[picked_filename, cat_state, shape_state, grid_state, cross_state],
|
| 1865 |
-
[folder_grid, viewport, viewport_uri, vanilla_panel,
|
| 1866 |
original_btn, lang_dropdown, last_answers], api_name=False)
|
| 1867 |
|
| 1868 |
def on_file_upload(file_obj, shape, grid_str, cross_str):
|
|
@@ -1907,9 +2032,9 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1907 |
return panel_html(kind, body, state="ready", footer_text=f"❌ {label}")
|
| 1908 |
|
| 1909 |
def do_analyze(filename, shape, mode, grid_str, cross_str, current_uri):
|
| 1910 |
-
"""Unified live inference for ALL modes.
|
| 1911 |
-
|
| 1912 |
-
uploads, and webcam captures."""
|
| 1913 |
try: grid = int(grid_str or "0")
|
| 1914 |
except ValueError: grid = 0
|
| 1915 |
try: cross = int(cross_str or "0")
|
|
@@ -1933,8 +2058,10 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1933 |
"or capture from your camera, then press AI ANALYZE.")
|
| 1934 |
yield (viewport_html(None, shape, grid, cross, live_video=live),
|
| 1935 |
panel_html("vanilla", msg, state="ready"),
|
|
|
|
|
|
|
| 1936 |
gr.Button(visible=False),
|
| 1937 |
-
{"vanilla": ""})
|
| 1938 |
return
|
| 1939 |
|
| 1940 |
source = ("webcam" if mode == MODE_MICRO else
|
|
@@ -1944,20 +2071,27 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1944 |
running = f"⏳ Running on your {source}…"
|
| 1945 |
yield (vp,
|
| 1946 |
panel_html("vanilla", running, state="typing"),
|
|
|
|
|
|
|
| 1947 |
gr.Button(visible=False),
|
| 1948 |
-
{"vanilla": ""})
|
| 1949 |
|
| 1950 |
-
|
|
|
|
| 1951 |
|
| 1952 |
-
# On HF Space:
|
| 1953 |
-
# Locally we keep the HTTP
|
| 1954 |
if IS_HF_SPACE:
|
| 1955 |
try:
|
| 1956 |
-
|
|
|
|
|
|
|
| 1957 |
except Exception as e:
|
| 1958 |
err = f"{type(e).__name__}: {str(e)[:280]}"
|
| 1959 |
yield (vp,
|
| 1960 |
_error_panel("vanilla", "Gemma 4 E2B · base", err),
|
|
|
|
|
|
|
| 1961 |
gr.Button(visible=False),
|
| 1962 |
answers)
|
| 1963 |
return
|
|
@@ -1981,19 +2115,30 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 1981 |
f'<span class="ml-word" style="animation-delay:{delay}ms;">{safe}</span>'
|
| 1982 |
)
|
| 1983 |
return "".join(spans)
|
| 1984 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1985 |
yield (vp,
|
| 1986 |
panel_html("vanilla", _animated_words(answers["vanilla"]),
|
| 1987 |
-
state="ready", footer_text=
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1988 |
gr.Button(visible=False),
|
| 1989 |
answers)
|
| 1990 |
else:
|
| 1991 |
-
# Local: HTTP
|
| 1992 |
for kind, url, label in LIVE_BACKENDS:
|
| 1993 |
ans, err = llama_server_call(url, img_uri)
|
| 1994 |
if err:
|
|
|
|
| 1995 |
yield (vp,
|
| 1996 |
-
|
|
|
|
|
|
|
| 1997 |
gr.Button(visible=False),
|
| 1998 |
answers)
|
| 1999 |
else:
|
|
@@ -2004,11 +2149,14 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 2004 |
for i in range(step, len(ans) + step, step):
|
| 2005 |
partial = ans[:min(i, len(ans))]
|
| 2006 |
is_done = i >= len(ans)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 2007 |
yield (vp,
|
| 2008 |
-
panel_html(
|
| 2009 |
-
|
| 2010 |
-
|
| 2011 |
-
footer_text=footer if is_done else None),
|
| 2012 |
gr.Button(visible=False),
|
| 2013 |
answers)
|
| 2014 |
time.sleep(delay)
|
|
@@ -2037,7 +2185,7 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 2037 |
"""
|
| 2038 |
analyze_btn.click(do_analyze,
|
| 2039 |
[picked_filename, shape_state, mode_state, grid_state, cross_state, viewport_uri],
|
| 2040 |
-
[viewport, vanilla_panel, original_btn, last_answers],
|
| 2041 |
js=ANALYZE_PRE_JS, api_name=False)
|
| 2042 |
|
| 2043 |
def do_translate(filename, lang_label, answers):
|
|
@@ -2051,12 +2199,16 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 2051 |
if not sources:
|
| 2052 |
msg = "Run AI ANALYZE first to get an answer to translate."
|
| 2053 |
return (panel_html("vanilla", msg, state="ready"),
|
|
|
|
|
|
|
| 2054 |
gr.Button(visible=False))
|
| 2055 |
|
| 2056 |
lang_code = LANG_BY_DISPLAY.get(lang_label, "en")
|
| 2057 |
lang_name = next((name for _, name, code in LANGUAGES if code == lang_code), "English")
|
| 2058 |
if lang_code == "en":
|
| 2059 |
return (panel_html("vanilla", sources.get("vanilla", "")),
|
|
|
|
|
|
|
| 2060 |
gr.Button(visible=False))
|
| 2061 |
translated = {}
|
| 2062 |
engine = ""
|
|
@@ -2091,14 +2243,18 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 2091 |
if not translated or not any(translated.values()):
|
| 2092 |
placeholder = f"Translation to {lang_name} unavailable right now."
|
| 2093 |
return (panel_html("vanilla", placeholder, state="ready"),
|
|
|
|
|
|
|
| 2094 |
gr.Button(visible=False))
|
| 2095 |
footer = f"🌍 {lang_name} · {engine}"
|
| 2096 |
return (panel_html("vanilla", translated.get("vanilla", ""), footer_text=footer),
|
|
|
|
|
|
|
| 2097 |
gr.Button(visible=True))
|
| 2098 |
|
| 2099 |
translate_btn.click(do_translate,
|
| 2100 |
[picked_filename, lang_dropdown, last_answers],
|
| 2101 |
-
[vanilla_panel, original_btn], api_name=False)
|
| 2102 |
|
| 2103 |
def restore_original(filename, answers):
|
| 2104 |
# Restore ONLY the live answer that produced this translation. Same
|
|
@@ -2107,16 +2263,18 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
|
|
| 2107 |
# a pre-baked answer for a different image.
|
| 2108 |
sources = answers if (answers and any(answers.values())) else None
|
| 2109 |
if not sources:
|
| 2110 |
-
return (empty_panels(),
|
| 2111 |
gr.Button(visible=False),
|
| 2112 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
|
| 2113 |
return (panel_html("vanilla", sources.get("vanilla", "")),
|
|
|
|
|
|
|
| 2114 |
gr.Button(visible=False),
|
| 2115 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
|
| 2116 |
|
| 2117 |
original_btn.click(restore_original,
|
| 2118 |
[picked_filename, last_answers],
|
| 2119 |
-
[vanilla_panel, original_btn, lang_dropdown],
|
| 2120 |
api_name=False)
|
| 2121 |
|
| 2122 |
demo.load(fn=None, inputs=None, outputs=None, js=CAMERA_JS)
|
|
|
|
| 8 |
control panel (mode-dependent: 5 categories × 6 thumbs / upload zone /
|
| 9 |
camera enumeration)
|
| 10 |
- AI ANALYZE long oval cyan→red gradient button
|
| 11 |
+
- 3 result panels: UNTRAINED BASELINE / MICROLENS V2 BRIEF / MICROLENS V3 RICH
|
| 12 |
- Translate row with 28 languages (English default) + ORIGINAL button after
|
| 13 |
translation
|
| 14 |
- Footer with run-locally + APK + Legal links
|
| 15 |
|
| 16 |
+
SAMPLES tab uses cached answers from catalog.json (vanilla + v2 + v3 for all 30).
|
| 17 |
+
UPLOAD / MICROSCOPE tabs run LIVE inference against per-model backend URLs:
|
| 18 |
URL_VANILLA (default http://127.0.0.1:8085/v1/chat/completions)
|
| 19 |
+
URL_V2 (default http://127.0.0.1:8084/v1/chat/completions)
|
| 20 |
+
URL_V3 (default http://127.0.0.1:8083/v1/chat/completions)
|
| 21 |
+
On HF Space deployment configure these as Variables to point at a public tunnel
|
| 22 |
(e.g. Cloudflare → llama-server). When unreachable the panel shows a clean
|
| 23 |
"backend unavailable" message instead of crashing.
|
| 24 |
"""
|
|
|
|
| 89 |
BY_FILENAME = {s["filename"]: s for s in CATALOG}
|
| 90 |
|
| 91 |
URL_VANILLA = os.environ.get("URL_VANILLA", "http://127.0.0.1:8085/v1/chat/completions")
|
| 92 |
+
URL_V2 = os.environ.get("URL_V2", "http://127.0.0.1:8084/v1/chat/completions")
|
| 93 |
+
URL_V3 = os.environ.get("URL_V3", "http://127.0.0.1:8083/v1/chat/completions")
|
| 94 |
INFERENCE_PROMPT = "What is shown in this microscope image?"
|
| 95 |
+
BRIEF_PROMPT = "Identify the genus in this microscopy image. Reply with just the genus name and a one-sentence description."
|
| 96 |
+
RICH_PROMPT = "Identify the organism in this microscopy image. Describe the genus, morphology, habitat, and identification cues."
|
| 97 |
+
_PROMPT_BY_VERSION = {"vanilla": INFERENCE_PROMPT, "v2": BRIEF_PROMPT, "v3": RICH_PROMPT}
|
| 98 |
|
| 99 |
# ─────────────────────────────────────────────────────────────────────────────
|
| 100 |
# ZeroGPU runtime: when running on HF Space we replace HTTP llama-server calls
|
| 101 |
+
# with in-process transformers + PEFT multi-adapter inference on H200.
|
| 102 |
+
# Outside HF Space (local dev) the original HTTP path is preserved.
|
|
|
|
| 103 |
# ─────────────────────────────────────────────────────────────────────────────
|
| 104 |
IS_HF_SPACE = bool(os.environ.get("SPACE_ID"))
|
| 105 |
|
| 106 |
_HF_BASE = "unsloth/gemma-4-E2B-it"
|
| 107 |
+
_HF_LORA_REPO = "Laborator/microlens-final"
|
| 108 |
|
| 109 |
_zerogpu_processor = None
|
| 110 |
_zerogpu_model = None
|
|
|
|
| 113 |
import spaces
|
| 114 |
import torch
|
| 115 |
from transformers import AutoProcessor, AutoModelForImageTextToText
|
| 116 |
+
from peft import PeftModel
|
| 117 |
|
| 118 |
print("[ZeroGPU] loading processor + base model on cuda…", flush=True)
|
| 119 |
_zerogpu_processor = AutoProcessor.from_pretrained(_HF_BASE)
|
| 120 |
_zerogpu_model = AutoModelForImageTextToText.from_pretrained(
|
| 121 |
_HF_BASE, torch_dtype=torch.bfloat16, device_map="cuda",
|
| 122 |
)
|
|
|
|
|
|
|
| 123 |
|
| 124 |
+
# PEFT 0.19 cannot hook transformers' Gemma4ClippableLinear (vision tower
|
| 125 |
+
# wrapper around nn.Linear with opt-in clamping). The clamp thresholds
|
| 126 |
+
# default to ±inf so the wrapper is a behavioral no-op — replace each
|
| 127 |
+
# occurrence with its inner .linear so PEFT sees a plain nn.Linear.
|
| 128 |
+
def _unwrap_clippable(module):
|
| 129 |
+
from torch import nn
|
| 130 |
+
for name, child in list(module.named_children()):
|
| 131 |
+
if type(child).__name__ == "Gemma4ClippableLinear" and isinstance(
|
| 132 |
+
getattr(child, "linear", None), nn.Linear
|
| 133 |
+
):
|
| 134 |
+
if getattr(child, "use_clipped_linears", False):
|
| 135 |
+
print(f"[ZeroGPU] WARN: clipped-linears active on {name}; "
|
| 136 |
+
"unwrapping anyway (thresholds are ±inf = no-op)", flush=True)
|
| 137 |
+
setattr(module, name, child.linear)
|
| 138 |
+
else:
|
| 139 |
+
_unwrap_clippable(child)
|
| 140 |
+
_unwrap_clippable(_zerogpu_model)
|
| 141 |
+
|
| 142 |
+
print("[ZeroGPU] attaching microlens-final LoRA…", flush=True)
|
| 143 |
+
_zerogpu_model = PeftModel.from_pretrained(
|
| 144 |
+
_zerogpu_model, _HF_LORA_REPO, adapter_name="microlens",
|
| 145 |
+
)
|
| 146 |
+
_zerogpu_model.eval()
|
| 147 |
+
print("[ZeroGPU] ready (vanilla = base off / brief + rich = same LoRA, different prompts)", flush=True)
|
| 148 |
+
|
| 149 |
+
# ── Batch path: run vanilla + brief + rich in a SINGLE GPU acquisition.
|
| 150 |
+
# vanilla is the base Gemma 4 with adapter disabled; brief and rich share
|
| 151 |
+
# the same microlens-final LoRA but use different prompts (BRIEF_PROMPT /
|
| 152 |
+
# RICH_PROMPT). duration=60s gives headroom for all three to finish.
|
| 153 |
+
@spaces.GPU(duration=60)
|
| 154 |
+
def _zerogpu_infer_all(image_data_uri: str, prompt: str = None):
|
| 155 |
+
import time as _t
|
| 156 |
+
t_total = _t.time()
|
| 157 |
+
print(f"[infer-all] start cuda={torch.cuda.is_available()}", flush=True)
|
| 158 |
+
b64 = _strip_data_uri(image_data_uri) if image_data_uri.startswith("data:") else image_data_uri
|
| 159 |
+
img = Image.open(BytesIO(base64.b64decode(b64))).convert("RGB")
|
| 160 |
+
if max(img.size) > 768:
|
| 161 |
+
img.thumbnail((768, 768))
|
| 162 |
+
results = {}
|
| 163 |
+
for version in ("vanilla", "v2", "v3"):
|
| 164 |
+
t0 = _t.time()
|
| 165 |
+
version_prompt = _PROMPT_BY_VERSION[version]
|
| 166 |
+
if version == "vanilla":
|
| 167 |
+
_zerogpu_model.disable_adapter_layers()
|
| 168 |
+
# Stock Gemma can ramble up to 1400+ chars on a microscope image
|
| 169 |
+
# which blows the 60s ZeroGPU budget; cap it tighter.
|
| 170 |
+
_max_tok = 256
|
| 171 |
+
else:
|
| 172 |
+
_zerogpu_model.enable_adapter_layers()
|
| 173 |
+
_zerogpu_model.set_adapter("microlens")
|
| 174 |
+
# brief stays short, rich gets headroom for full schema answer.
|
| 175 |
+
_max_tok = 96 if version == "v2" else 512
|
| 176 |
+
messages = [{"role": "user", "content": [
|
| 177 |
+
{"type": "image", "image": img},
|
| 178 |
+
{"type": "text", "text": version_prompt},
|
| 179 |
+
]}]
|
| 180 |
+
inputs = _zerogpu_processor.apply_chat_template(
|
| 181 |
+
messages, add_generation_prompt=True, tokenize=True,
|
| 182 |
+
return_dict=True, return_tensors="pt",
|
| 183 |
+
)
|
| 184 |
+
inputs = {k: (v.to(_zerogpu_model.device, dtype=torch.bfloat16) if v.is_floating_point()
|
| 185 |
+
else v.to(_zerogpu_model.device))
|
| 186 |
+
for k, v in inputs.items()}
|
| 187 |
+
prompt_len = inputs["input_ids"].shape[1]
|
| 188 |
+
with torch.inference_mode():
|
| 189 |
+
out = _zerogpu_model.generate(
|
| 190 |
+
**inputs, max_new_tokens=_max_tok, do_sample=False,
|
| 191 |
+
)
|
| 192 |
+
gen_ids = out[0][prompt_len:]
|
| 193 |
+
text = _zerogpu_processor.decode(gen_ids, skip_special_tokens=True).strip()
|
| 194 |
+
results[version] = text
|
| 195 |
+
print(f"[infer-all] {version} t+{_t.time()-t0:.2f}s len={len(text)}", flush=True)
|
| 196 |
+
print(f"[infer-all] DONE total t+{_t.time()-t_total:.2f}s", flush=True)
|
| 197 |
+
return results
|
| 198 |
+
|
| 199 |
+
# ── Single-version path (legacy / local fallback). Still used when llama_server_call
|
| 200 |
+
# is called outside the do_analyze HF-Space short-circuit (e.g. potential future paths).
|
| 201 |
+
@spaces.GPU(duration=25)
|
| 202 |
+
def _zerogpu_infer(version: str, image_data_uri: str, prompt: str) -> str:
|
| 203 |
import time as _t
|
| 204 |
t0 = _t.time()
|
| 205 |
+
print(f"[infer] version={version} cuda={torch.cuda.is_available()} "
|
| 206 |
f"dev={torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'cpu'}",
|
| 207 |
flush=True)
|
| 208 |
b64 = _strip_data_uri(image_data_uri) if image_data_uri.startswith("data:") else image_data_uri
|
|
|
|
| 210 |
if max(img.size) > 768:
|
| 211 |
img.thumbnail((768, 768))
|
| 212 |
print(f"[infer] image {img.size}", flush=True)
|
| 213 |
+
if version == "vanilla":
|
| 214 |
+
_zerogpu_model.disable_adapter_layers()
|
| 215 |
+
else:
|
| 216 |
+
_zerogpu_model.enable_adapter_layers()
|
| 217 |
+
_zerogpu_model.set_adapter("microlens")
|
| 218 |
+
# If caller didn't override, use the per-version default prompt
|
| 219 |
+
# (vanilla=generic, v2=brief, v3=rich).
|
| 220 |
+
effective_prompt = prompt if prompt and prompt != INFERENCE_PROMPT else _PROMPT_BY_VERSION.get(version, INFERENCE_PROMPT)
|
| 221 |
messages = [{"role": "user", "content": [
|
| 222 |
{"type": "image", "image": img},
|
| 223 |
+
{"type": "text", "text": effective_prompt},
|
| 224 |
]}]
|
| 225 |
inputs = _zerogpu_processor.apply_chat_template(
|
| 226 |
messages, add_generation_prompt=True, tokenize=True,
|
|
|
|
| 233 |
print(f"[infer] inputs ready, t+{_t.time()-t0:.2f}s, generating…", flush=True)
|
| 234 |
with torch.inference_mode():
|
| 235 |
out = _zerogpu_model.generate(
|
| 236 |
+
**inputs, max_new_tokens=512, do_sample=False,
|
| 237 |
)
|
| 238 |
prompt_len = inputs["input_ids"].shape[1]
|
| 239 |
gen_ids = out[0][prompt_len:]
|
|
|
|
| 242 |
f"text_len={len(text)}, preview={text[:80]!r}", flush=True)
|
| 243 |
return text.strip()
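The prompt fallback in `_zerogpu_infer` can be sketched in isolation. This is a minimal sketch, not the Space's actual code: the prompt strings below are hypothetical placeholders (the real `INFERENCE_PROMPT`, `BRIEF_PROMPT` and `RICH_PROMPT` texts are not shown in this diff), `resolve_prompt` and `resolve_budget` are illustrative helper names, and the vanilla token budget is an assumption. The 96 vs 512 budgets come from the commit message; the single-version path above passes 512 for every version.

```python
# Hypothetical placeholder prompts; the real strings are not part of this diff.
INFERENCE_PROMPT = "Describe this microscopy image."
BRIEF_PROMPT = "Name the genus in one sentence."
RICH_PROMPT = "Give genus, morphology, habitat and ID cues."

# Per-version defaults (vanilla=generic, v2=brief, v3=rich).
_PROMPT_BY_VERSION = {
    "vanilla": INFERENCE_PROMPT,
    "v2": BRIEF_PROMPT,
    "v3": RICH_PROMPT,
}

# 96 (brief) and 512 (rich) are from the commit message; vanilla is assumed.
_MAX_NEW_TOKENS = {"vanilla": 512, "v2": 96, "v3": 512}


def resolve_prompt(version: str, prompt: str = "") -> str:
    """Use the caller's prompt only if it is a real override of the generic one."""
    if prompt and prompt != INFERENCE_PROMPT:
        return prompt
    return _PROMPT_BY_VERSION.get(version, INFERENCE_PROMPT)


def resolve_budget(version: str) -> int:
    """Look up the generation budget, falling back to the largest one."""
    return _MAX_NEW_TOKENS.get(version, 512)
```

Passing the generic `INFERENCE_PROMPT` counts as "no override", so `resolve_prompt("v3", INFERENCE_PROMPT)` still selects the rich prompt, while a user question such as "What is shown here?" is passed through unchanged.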
|
| 244 |
|
| 245 |
+
_URL_TO_KIND = {URL_VANILLA: "vanilla", URL_V2: "v2", URL_V3: "v3"}
|
| 246 |
+
|
| 247 |
|
| 248 |
# ─────────────────────────────────────────────────────────────────────────────
|
| 249 |
# QR codes for the footer install card. Generated once at module load.
|
| 250 |
# ─────────────────────────────────────────────────────────────────────────────
|
| 251 |
+
APK_URL = "https://github.com/SergheiBrinza/microlens"
|
| 252 |
GITHUB_URL = "https://github.com/SergheiBrinza/microlens"
|
|
|
|
|
|
|
|
|
|
| 253 |
|
| 254 |
def _qr_data_uri(data: str, dark: str = "#FFFFFF", light: str = "#000000",
|
| 255 |
alpha: float = 1.0) -> str:
|
|
|
|
| 324 |
prompt: str = INFERENCE_PROMPT,
|
| 325 |
timeout: int = 180) -> Tuple[str, Optional[str]]:
|
| 326 |
"""Returns (text, error_or_None).
|
| 327 |
+
On HF Space: routes to in-process ZeroGPU inference (transformers + PEFT).
|
| 328 |
Locally: OpenAI-compatible call to llama-server (original behavior)."""
|
| 329 |
if IS_HF_SPACE:
|
| 330 |
+
kind = _URL_TO_KIND.get(url, "vanilla")
|
| 331 |
try:
|
| 332 |
+
return _zerogpu_infer(kind, image_data_uri, prompt), None
|
| 333 |
except Exception as e:
|
| 334 |
return "", f"{type(e).__name__}: {str(e)[:240]}"
|
| 335 |
payload = {
|
|
|
|
| 1231 |
|
| 1232 |
PANEL_THEMES = {
|
| 1233 |
"vanilla": {
|
| 1234 |
+
"title": "UNTRAINED BASELINE",
|
| 1235 |
+
"subtitle": "Stock Gemma 4 E2B · Google factory weights · no microscopy training",
|
| 1236 |
"stripe": "linear-gradient(90deg, #C0C5CC 0%, #7A7E85 100%)",
|
| 1237 |
"title_grad": "linear-gradient(90deg, #E0E5EC 0%, #9A9EA5 100%)",
|
| 1238 |
"border": "rgba(180,185,195,0.35)",
|
|
|
|
| 1240 |
"glow_strong": "0 0 56px rgba(200,205,215,0.28)",
|
| 1241 |
"subtitle_color": "#9aa0a8",
|
| 1242 |
},
|
| 1243 |
+
"v2": {
|
| 1244 |
+
"title": "MICROLENS · BRIEF",
|
| 1245 |
+
"subtitle": "Gemma 4 E2B + microlens-final LoRA · 95 genera · single-sentence genus answer",
|
| 1246 |
+
"stripe": "linear-gradient(90deg, #00DCE6 0%, #007680 100%)",
|
| 1247 |
+
"title_grad": "linear-gradient(90deg, #00DCE6 0%, #66EAF0 100%)",
|
| 1248 |
+
"border": "rgba(0,220,230,0.45)",
|
| 1249 |
+
"glow": "0 0 36px rgba(0,220,230,0.18)",
|
| 1250 |
+
"glow_strong": "0 0 64px rgba(0,220,230,0.42)",
|
| 1251 |
+
"subtitle_color": "#7FBEC4",
|
| 1252 |
+
},
|
| 1253 |
+
"v3": {
|
| 1254 |
+
"title": "MICROLENS · RICH",
|
| 1255 |
+
"subtitle": "Same LoRA, detailed prompt · genus + morphology + habitat + ID cues",
|
| 1256 |
+
"stripe": "linear-gradient(90deg, #FF1744 0%, #800020 100%)",
|
| 1257 |
+
"title_grad": "linear-gradient(90deg, #FF5252 0%, #FF8888 100%)",
|
| 1258 |
+
"border": "rgba(255,23,68,0.45)",
|
| 1259 |
+
"glow": "0 0 36px rgba(255,23,68,0.18)",
|
| 1260 |
+
"glow_strong": "0 0 64px rgba(255,23,68,0.42)",
|
| 1261 |
+
"subtitle_color": "#C28A8A",
|
| 1262 |
+
},
|
| 1263 |
}
|
| 1264 |
|
| 1265 |
|
|
|
|
| 1316 |
"""
|
| 1317 |
|
| 1318 |
|
| 1319 |
+
def empty_panels(reason: str = "empty") -> Tuple[str, str, str]:
|
| 1320 |
+
return (panel_html("vanilla", "", state=reason),
|
| 1321 |
+
panel_html("v2", "", state=reason),
|
| 1322 |
+
panel_html("v3", "", state=reason))
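`empty_panels` exists so that every handler can splice three blank panels into its output tuple with `*empty_panels()`. The sketch below illustrates that pattern under stated assumptions: `panel_html` here is a hypothetical stand-in for the real renderer, and `on_pick_stub` is an illustrative handler, not one from the app.

```python
from typing import Tuple


def panel_html(kind: str, body: str, state: str = "ready") -> str:
    # Hypothetical stand-in for the real HTML renderer.
    return f"<div data-kind='{kind}' data-state='{state}'>{body}</div>"


def empty_panels(reason: str = "empty") -> Tuple[str, str, str]:
    # One blank panel per version, always in vanilla / v2 / v3 order.
    return (panel_html("vanilla", "", state=reason),
            panel_html("v2", "", state=reason),
            panel_html("v3", "", state=reason))


def on_pick_stub(viewport: str, uri: str) -> tuple:
    # Star-unpacking keeps the tuple aligned with the Gradio outputs list:
    # [viewport, viewport_uri, vanilla_panel, v2_panel, v3_panel, ...].
    return (viewport, uri, *empty_panels(), "button", "lang")
```

Because the three panels always arrive in the same order, the handler's return tuple stays in step with the `outputs=[...]` list even when other outputs are added before or after the panels.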
|
| 1323 |
|
| 1324 |
|
| 1325 |
def analyse_curated(filename: str, shape: str, grid: int = 0, cross: int = 0):
|
| 1326 |
import time
|
| 1327 |
s = BY_FILENAME.get(filename)
|
| 1328 |
if not s:
|
| 1329 |
+
yield viewport_html(None, shape, grid, cross), *empty_panels()
|
| 1330 |
return
|
| 1331 |
vp = viewport_html(full_uri(filename), shape, grid, cross)
|
| 1332 |
vanilla_full = s.get("vanilla_answer", "—")
|
| 1333 |
+
v2_full = s.get("v2_answer", "—")
|
| 1334 |
+
v3_full = s.get("v3_answer", "—")
|
| 1335 |
+
yield vp, panel_html("vanilla", "", state="typing"), \
|
| 1336 |
+
panel_html("v2", "", state="typing"), \
|
| 1337 |
+
panel_html("v3", "", state="typing")
|
| 1338 |
+
max_len = max(len(vanilla_full), len(v2_full), len(v3_full))
|
| 1339 |
step = 8
|
| 1340 |
delay = 0.040
|
| 1341 |
+
for i in range(step, max_len + step, step):
|
| 1342 |
yield (
|
| 1343 |
vp,
|
| 1344 |
panel_html("vanilla", vanilla_full[:min(i, len(vanilla_full))],
|
| 1345 |
state="typing" if i < len(vanilla_full) else "ready"),
|
| 1346 |
+
panel_html("v2", v2_full[:min(i, len(v2_full))],
|
| 1347 |
+
state="typing" if i < len(v2_full) else "ready"),
|
| 1348 |
+
panel_html("v3", v3_full[:min(i, len(v3_full))],
|
| 1349 |
+
state="typing" if i < len(v3_full) else "ready"),
|
| 1350 |
)
|
| 1351 |
time.sleep(delay)
|
| 1352 |
+
yield (vp,
|
| 1353 |
+
panel_html("vanilla", vanilla_full),
|
| 1354 |
+
panel_html("v2", v2_full),
|
| 1355 |
+
panel_html("v3", v3_full))
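The typewriter loop in `analyse_curated` reveals all three answers in lock-step and marks each panel `ready` as soon as its own text is exhausted. A minimal sketch of that slicing logic, with the HTML rendering and `time.sleep` left out; `typewriter_frames` is an illustrative name, not a function from the app:

```python
from typing import Dict, Iterator, Tuple


def typewriter_frames(texts: Dict[str, str],
                      step: int = 8) -> Iterator[Dict[str, Tuple[str, str]]]:
    """Yield one frame per tick: {kind: (visible_text, state)}."""
    max_len = max((len(t) for t in texts.values()), default=0)
    # Run until the longest text is fully shown; shorter texts finish early.
    for i in range(step, max_len + step, step):
        yield {
            kind: (full[:i], "typing" if i < len(full) else "ready")
            for kind, full in texts.items()
        }
```

Slicing with `full[:i]` is safe past the end of the string, so the `min(i, len(...))` guard in the original is equivalent; the state check is what stops a finished panel from showing the typing indicator while the others catch up.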
|
| 1356 |
|
| 1357 |
|
| 1358 |
CSS = """
|
|
|
|
| 1684 |
font-weight:700; font-size:12px; letter-spacing:3px;
|
| 1685 |
text-transform:uppercase;">Fine-tune</span>
|
| 1686 |
<span style="font-family:'Fraunces',serif; font-weight:500;
|
| 1687 |
+
color:#fff; font-size:19px; letter-spacing:0.3px;">Unsloth 4-bit QLoRA · 122k VQA</span>
|
| 1688 |
</span>
|
| 1689 |
<span style="color:#3a3a3a; font-size:18px;">·</span>
|
| 1690 |
<span style="display:inline-flex; align-items:baseline; gap:12px;
|
|
|
|
| 1708 |
cross_state = gr.Textbox(value="0", elem_id="hidden-cross",
|
| 1709 |
elem_classes=["ml-hidden"], show_label=False)
|
| 1710 |
viewport_uri = gr.State(value="")
|
| 1711 |
+
# Most recent answers from the 3 panels (any mode) — translate reads from here
|
| 1712 |
+
last_answers = gr.State(value={"vanilla": "", "v2": "", "v3": ""})
|
| 1713 |
|
| 1714 |
# Toolbar — full-width above both columns (no empty space in right column)
|
| 1715 |
mode_buttons = gr.HTML(value=mode_buttons_html(MODE_SAMPLES))
|
|
|
|
| 1774 |
|
| 1775 |
gr.HTML('<div style="height:28px;"></div>')
|
| 1776 |
|
| 1777 |
+
with gr.Row(equal_height=True, elem_classes=["equal-panels"]):
|
| 1778 |
+
vanilla_panel = gr.HTML(value=panel_html("vanilla", "", state="empty"))
|
| 1779 |
+
v2_panel = gr.HTML(value=panel_html("v2", "", state="empty"))
|
| 1780 |
+
v3_panel = gr.HTML(value=panel_html("v3", "", state="empty"))
|
|
|
|
| 1781 |
|
| 1782 |
gr.HTML(f"""
|
| 1783 |
<div style="margin-top: 32px; padding: 22px 28px;
|
|
|
|
| 1813 |
<div style="color:#e4e4e4; font-size:13px; line-height:1.85; font-weight:500;">
|
| 1814 |
Gemma 4 E2B-it <span style="color:#666;font-weight:400;">·</span> Google DeepMind<br>
|
| 1815 |
Unsloth FastVisionModel <span style="color:#666;font-weight:400;">·</span> 4-bit QLoRA<br>
|
| 1816 |
+
PEFT single adapter <span style="color:#666;font-weight:400;">·</span> vanilla / brief / rich<br>
|
| 1817 |
llama.cpp + mtmd vision extension
|
| 1818 |
</div>
|
| 1819 |
</div>
|
|
|
|
| 1829 |
font-size:12.5px; letter-spacing:0.3px;
|
| 1830 |
border-bottom:1px solid rgba(127,232,227,.40);
|
| 1831 |
display:inline-block; margin-bottom: 8px;">
|
| 1832 |
+
🦙 All 3 versions on Ollama Hub ↗</a>
|
| 1833 |
<br>
|
| 1834 |
<a href="https://github.com/SergheiBrinza/microlens"
|
| 1835 |
target="_blank" rel="noopener"
|
|
|
|
| 1896 |
|
| 1897 |
LIVE_BACKENDS = [
|
| 1898 |
("vanilla", URL_VANILLA, "Gemma 4 E2B · base"),
|
| 1899 |
+
("v2", URL_V2, "MicroLens v2 · fine-tuned"),
|
| 1900 |
+
("v3", URL_V3, "MicroLens v3 · fine-tuned"),
|
| 1901 |
]
|
| 1902 |
|
| 1903 |
def render_tools(current_uri, shape, grid_str, cross_str, mode):
|
|
|
|
| 1935 |
gr.Group(visible=(mode == MODE_SAMPLES)),
|
| 1936 |
gr.Group(visible=(mode == MODE_UPLOAD)),
|
| 1937 |
gr.Group(visible=(mode == MODE_MICRO)),
|
| 1938 |
+
vp, uri, *empty_panels(),
|
| 1939 |
gr.Button(visible=False))
|
| 1940 |
|
| 1941 |
mode_state.change(on_mode_change,
|
| 1942 |
[mode_state, shape_state, grid_state, cross_state, picked_filename],
|
| 1943 |
[mode_buttons, samples_group, upload_group, micro_group,
|
| 1944 |
viewport, viewport_uri,
|
| 1945 |
+
vanilla_panel, v2_panel, v3_panel, original_btn], api_name=False)
|
| 1946 |
|
| 1947 |
def on_cat_change(cat_label, current_filename, shape, grid_str, cross_str):
|
| 1948 |
try: grid = int(grid_str or "0")
|
|
|
|
| 1953 |
folder_html(cat_label, None),
|
| 1954 |
viewport_html(None, shape, grid, cross,
|
| 1955 |
empty_text="PICK A SAMPLE FROM THE CATEGORY ABOVE"),
|
| 1956 |
+
"", "", *empty_panels(),
|
| 1957 |
gr.Button(visible=False),
|
| 1958 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
|
| 1959 |
|
| 1960 |
cat_state.change(on_cat_change,
|
| 1961 |
[cat_state, picked_filename, shape_state, grid_state, cross_state],
|
| 1962 |
[folder_pills, folder_grid, viewport, picked_filename, viewport_uri,
|
| 1963 |
+
vanilla_panel, v2_panel, v3_panel, original_btn, lang_dropdown],
|
| 1964 |
api_name=False)
|
| 1965 |
|
| 1966 |
def on_pick(filename, cat_label, shape, grid_str, cross_str):
|
|
|
|
| 1971 |
# Reset live-answer state on every sample switch — without this the
|
| 1972 |
# previous image's live answer could leak into translate/restore for
|
| 1973 |
# the next sample and look like a real result.
|
| 1974 |
+
cleared_state = {"vanilla": "", "v2": "", "v3": ""}
|
| 1975 |
if not filename:
|
| 1976 |
return (folder_html(cat_label, None),
|
| 1977 |
+
viewport_html(None, shape, grid, cross), "", *empty_panels(),
|
| 1978 |
gr.Button(visible=False),
|
| 1979 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY),
|
| 1980 |
cleared_state)
|
| 1981 |
uri = full_uri(filename)
|
| 1982 |
return (folder_html(cat_label, filename),
|
| 1983 |
+
viewport_html(uri, shape, grid, cross), uri, *empty_panels(),
|
| 1984 |
gr.Button(visible=False),
|
| 1985 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY),
|
| 1986 |
cleared_state)
|
| 1987 |
|
| 1988 |
picked_filename.change(on_pick,
|
| 1989 |
[picked_filename, cat_state, shape_state, grid_state, cross_state],
|
| 1990 |
+
[folder_grid, viewport, viewport_uri, vanilla_panel, v2_panel, v3_panel,
|
| 1991 |
original_btn, lang_dropdown, last_answers], api_name=False)
|
| 1992 |
|
| 1993 |
def on_file_upload(file_obj, shape, grid_str, cross_str):
|
|
|
|
| 2032 |
return panel_html(kind, body, state="ready", footer_text=f"❌ {label}")
|
| 2033 |
|
| 2034 |
def do_analyze(filename, shape, mode, grid_str, cross_str, current_uri):
|
| 2035 |
+
"""Unified live inference for ALL modes. On an HF Space, one ZeroGPU call
|
| 2036 |
+
serves all three panels; locally, each panel calls its own llama-server
|
| 2037 |
+
backend. Samples, uploads, and webcam captures take the same path."""
|
| 2038 |
try: grid = int(grid_str or "0")
|
| 2039 |
except ValueError: grid = 0
|
| 2040 |
try: cross = int(cross_str or "0")
|
|
|
|
| 2058 |
"or capture from your camera, then press AI ANALYZE.")
|
| 2059 |
yield (viewport_html(None, shape, grid, cross, live_video=live),
|
| 2060 |
panel_html("vanilla", msg, state="ready"),
|
| 2061 |
+
panel_html("v2", msg, state="ready"),
|
| 2062 |
+
panel_html("v3", msg, state="ready"),
|
| 2063 |
gr.Button(visible=False),
|
| 2064 |
+
{"vanilla": "", "v2": "", "v3": ""})
|
| 2065 |
return
|
| 2066 |
|
| 2067 |
source = ("webcam" if mode == MODE_MICRO else
|
|
|
|
| 2071 |
running = f"⏳ Running on your {source}…"
|
| 2072 |
yield (vp,
|
| 2073 |
panel_html("vanilla", running, state="typing"),
|
| 2074 |
+
panel_html("v2", running, state="typing"),
|
| 2075 |
+
panel_html("v3", running, state="typing"),
|
| 2076 |
gr.Button(visible=False),
|
| 2077 |
+
{"vanilla": "", "v2": "", "v3": ""})
|
| 2078 |
|
| 2079 |
+
results = {}
|
| 2080 |
+
answers = {"vanilla": "", "v2": "", "v3": ""}
|
| 2081 |
|
| 2082 |
+
# On HF Space: ONE GPU acquisition serves all 3 versions (about a third of
|
| 2083 |
+
# the quota used by the per-model loop). Locally we keep the three-HTTP-call path.
|
| 2084 |
if IS_HF_SPACE:
|
| 2085 |
try:
|
| 2086 |
+
all_answers = _zerogpu_infer_all(img_uri, INFERENCE_PROMPT)
|
| 2087 |
+
for kind in ("vanilla", "v2", "v3"):
|
| 2088 |
+
answers[kind] = all_answers.get(kind, "")
|
| 2089 |
except Exception as e:
|
| 2090 |
err = f"{type(e).__name__}: {str(e)[:280]}"
|
| 2091 |
yield (vp,
|
| 2092 |
_error_panel("vanilla", "Gemma 4 E2B · base", err),
|
| 2093 |
+
_error_panel("v2", "MicroLens v2 · fine-tuned", err),
|
| 2094 |
+
_error_panel("v3", "MicroLens v3 · fine-tuned", err),
|
| 2095 |
gr.Button(visible=False),
|
| 2096 |
answers)
|
| 2097 |
return
|
|
|
|
| 2115 |
f'<span class="ml-word" style="animation-delay:{delay}ms;">{safe}</span>'
|
| 2116 |
)
|
| 2117 |
return "".join(spans)
|
| 2118 |
+
footers = {
|
| 2119 |
+
"vanilla": f"🛰 Live inference · <code>Gemma 4 E2B · base</code> · {source}",
|
| 2120 |
+
"v2": f"🛰 Live inference · <code>MicroLens v2 · fine-tuned</code> · {source}",
|
| 2121 |
+
"v3": f"🛰 Live inference · <code>MicroLens v3 · fine-tuned</code> · {source}",
|
| 2122 |
+
}
|
| 2123 |
yield (vp,
|
| 2124 |
panel_html("vanilla", _animated_words(answers["vanilla"]),
|
| 2125 |
+
state="ready", footer_text=footers["vanilla"]),
|
| 2126 |
+
panel_html("v2", _animated_words(answers["v2"]),
|
| 2127 |
+
state="ready", footer_text=footers["v2"]),
|
| 2128 |
+
panel_html("v3", _animated_words(answers["v3"]),
|
| 2129 |
+
state="ready", footer_text=footers["v3"]),
|
| 2130 |
gr.Button(visible=False),
|
| 2131 |
answers)
|
| 2132 |
else:
|
| 2133 |
+
# Local: 3 HTTP calls to llama-servers, sequential typewriter per model
|
| 2134 |
for kind, url, label in LIVE_BACKENDS:
|
| 2135 |
ans, err = llama_server_call(url, img_uri)
|
| 2136 |
if err:
|
| 2137 |
+
results[kind] = _error_panel(kind, label, err)
|
| 2138 |
yield (vp,
|
| 2139 |
+
results.get("vanilla", panel_html("vanilla", running, state="typing")),
|
| 2140 |
+
results.get("v2", panel_html("v2", running, state="typing")),
|
| 2141 |
+
results.get("v3", panel_html("v3", running, state="typing")),
|
| 2142 |
gr.Button(visible=False),
|
| 2143 |
answers)
|
| 2144 |
else:
|
|
|
|
| 2149 |
for i in range(step, len(ans) + step, step):
|
| 2150 |
partial = ans[:min(i, len(ans))]
|
| 2151 |
is_done = i >= len(ans)
|
| 2152 |
+
results[kind] = panel_html(
|
| 2153 |
+
kind, partial,
|
| 2154 |
+
state="ready" if is_done else "typing",
|
| 2155 |
+
footer_text=footer if is_done else None)
|
| 2156 |
yield (vp,
|
| 2157 |
+
results.get("vanilla", panel_html("vanilla", running, state="typing")),
|
| 2158 |
+
results.get("v2", panel_html("v2", running, state="typing")),
|
| 2159 |
+
results.get("v3", panel_html("v3", running, state="typing")),
|
|
|
|
| 2160 |
gr.Button(visible=False),
|
| 2161 |
answers)
|
| 2162 |
time.sleep(delay)
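In the local branch, each yield re-emits all three panels: finished ones come from `results`, and pending ones fall back to the "running" placeholder through `results.get(...)`. The sketch below isolates that progressive-fill pattern under stated assumptions: `run_sequential` and `_fake_call` are hypothetical names, and plain strings stand in for the rendered panels.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

KINDS = ("vanilla", "v2", "v3")


def run_sequential(call: Callable[[str], Tuple[str, Optional[str]]],
                   placeholder: str = "running") -> Iterator[Tuple[str, ...]]:
    """Call each backend in turn and yield a full three-panel frame after each."""
    results: Dict[str, str] = {}
    for kind in KINDS:
        text, err = call(kind)  # same (text, error_or_None) contract as llama_server_call
        results[kind] = f"error: {err}" if err else text
        # Panels not reached yet keep showing the placeholder.
        yield tuple(results.get(k, placeholder) for k in KINDS)


def _fake_call(kind: str) -> Tuple[str, Optional[str]]:
    # Hypothetical backend: v2 fails, the others answer with their own name.
    if kind == "v2":
        return "", "TimeoutError: slow"
    return kind.upper(), None


frames = list(run_sequential(_fake_call))
```

A failure in one backend only fills that panel with an error; the loop continues, so the remaining panels still get their answers.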
|
|
|
|
| 2185 |
"""
|
| 2186 |
analyze_btn.click(do_analyze,
|
| 2187 |
[picked_filename, shape_state, mode_state, grid_state, cross_state, viewport_uri],
|
| 2188 |
+
[viewport, vanilla_panel, v2_panel, v3_panel, original_btn, last_answers],
|
| 2189 |
js=ANALYZE_PRE_JS, api_name=False)
|
| 2190 |
|
| 2191 |
def do_translate(filename, lang_label, answers):
|
|
|
|
| 2199 |
if not sources:
|
| 2200 |
msg = "Run AI ANALYZE first to get an answer to translate."
|
| 2201 |
return (panel_html("vanilla", msg, state="ready"),
|
| 2202 |
+
panel_html("v2", msg, state="ready"),
|
| 2203 |
+
panel_html("v3", msg, state="ready"),
|
| 2204 |
gr.Button(visible=False))
|
| 2205 |
|
| 2206 |
lang_code = LANG_BY_DISPLAY.get(lang_label, "en")
|
| 2207 |
lang_name = next((name for _, name, code in LANGUAGES if code == lang_code), "English")
|
| 2208 |
if lang_code == "en":
|
| 2209 |
return (panel_html("vanilla", sources.get("vanilla", "")),
|
| 2210 |
+
panel_html("v2", sources.get("v2", "")),
|
| 2211 |
+
panel_html("v3", sources.get("v3", "")),
|
| 2212 |
gr.Button(visible=False))
|
| 2213 |
translated = {}
|
| 2214 |
engine = ""
|
|
|
|
| 2243 |
if not translated or not any(translated.values()):
|
| 2244 |
placeholder = f"Translation to {lang_name} unavailable right now."
|
| 2245 |
return (panel_html("vanilla", placeholder, state="ready"),
|
| 2246 |
+
panel_html("v2", placeholder, state="ready"),
|
| 2247 |
+
panel_html("v3", placeholder, state="ready"),
|
| 2248 |
gr.Button(visible=False))
|
| 2249 |
footer = f"🌍 {lang_name} · {engine}"
|
| 2250 |
return (panel_html("vanilla", translated.get("vanilla", ""), footer_text=footer),
|
| 2251 |
+
panel_html("v2", translated.get("v2", ""), footer_text=footer),
|
| 2252 |
+
panel_html("v3", translated.get("v3", ""), footer_text=footer),
|
| 2253 |
gr.Button(visible=True))
|
| 2254 |
|
| 2255 |
translate_btn.click(do_translate,
|
| 2256 |
[picked_filename, lang_dropdown, last_answers],
|
| 2257 |
+
[vanilla_panel, v2_panel, v3_panel, original_btn], api_name=False)
|
| 2258 |
|
| 2259 |
def restore_original(filename, answers):
|
| 2260 |
# Restore ONLY the live answer that produced this translation. Same
|
|
|
|
| 2263 |
# a pre-baked answer for a different image.
|
| 2264 |
sources = answers if (answers and any(answers.values())) else None
|
| 2265 |
if not sources:
|
| 2266 |
+
return (*empty_panels(),
|
| 2267 |
gr.Button(visible=False),
|
| 2268 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
|
| 2269 |
return (panel_html("vanilla", sources.get("vanilla", "")),
|
| 2270 |
+
panel_html("v2", sources.get("v2", "")),
|
| 2271 |
+
panel_html("v3", sources.get("v3", "")),
|
| 2272 |
gr.Button(visible=False),
|
| 2273 |
gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
|
| 2274 |
|
| 2275 |
original_btn.click(restore_original,
|
| 2276 |
[picked_filename, last_answers],
|
| 2277 |
+
[vanilla_panel, v2_panel, v3_panel, original_btn, lang_dropdown],
|
| 2278 |
api_name=False)
|
| 2279 |
|
| 2280 |
demo.load(fn=None, inputs=None, outputs=None, js=CAMERA_JS)
|