Space: Laborator/microlens

Laborator committed 5 days ago

Commit d0abc67 · 1 parent: 40d108c

Restore 3-panel layout: vanilla / brief / rich with microlens-final LoRA

Reattach the fine-tuned LoRA (now Laborator/microlens-final, single adapter
instead of v2/v3 subfolders) and rebuild the 3-result-panel UI:

- UNTRAINED BASELINE: stock Gemma 4 E2B with adapter disabled
- MICROLENS BRIEF: base + LoRA + BRIEF_PROMPT (one-sentence answer)
- MICROLENS RICH: base + LoRA + RICH_PROMPT (full schema)

Brief and rich share the same microlens-final adapter and differ only in the
prompt and the max_new_tokens budget (96 vs 512).
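
As a rough illustration of that split, the three panels can be described by a small per-panel plan: whether the adapter is active, which prompt is sent, and the token budget. The prompts and budgets below are copied from the app.py diff in this commit; the `PanelRun` dataclass and `build_run_plan` function are hypothetical names used only for this sketch and do not exist in app.py.

```python
from dataclasses import dataclass
from typing import Dict

# Prompt strings as they appear in this commit's app.py diff.
INFERENCE_PROMPT = "What is shown in this microscope image?"
BRIEF_PROMPT = ("Identify the genus in this microscopy image. Reply with just "
                "the genus name and a one-sentence description.")
RICH_PROMPT = ("Identify the organism in this microscopy image. Describe the "
               "genus, morphology, habitat, and identification cues.")


@dataclass(frozen=True)
class PanelRun:
    """Hypothetical container for one panel's generation settings."""
    use_adapter: bool      # False = stock base model, True = base + LoRA
    prompt: str
    max_new_tokens: int


def build_run_plan() -> Dict[str, PanelRun]:
    """Map each panel key to its settings (budgets taken from the diff)."""
    return {
        "vanilla": PanelRun(False, INFERENCE_PROMPT, 256),
        "v2": PanelRun(True, BRIEF_PROMPT, 96),    # MICROLENS BRIEF
        "v3": PanelRun(True, RICH_PROMPT, 512),    # MICROLENS RICH
    }


plan = build_run_plan()
for key, run in plan.items():
    print(key, run.use_adapter, run.max_new_tokens)
```

In the real code the `use_adapter` flag corresponds to calling `disable_adapter_layers()` or `enable_adapter_layers()` on the PEFT-wrapped model before each `generate` call.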

- _HF_LORA_REPO → Laborator/microlens-final
- _zerogpu_infer_all: per-version prompt, single adapter swap
- PANEL_THEMES["v2"/"v3"]: rename titles to MICROLENS BRIEF/RICH, update
subtitle to reflect 95 genera / single-LoRA architecture
- APK QR fallback → GitHub repo
- README: drop base-only Status, add models: Laborator/microlens-final,
switch genus claim to 95 (diatoms + fungal spores)
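
One step the change list does not mention: before attaching the LoRA, app.py replaces each `Gemma4ClippableLinear` wrapper with its inner `.linear` layer so that PEFT sees a plain linear module (see the comment in the app.py diff). The recursion can be sketched without torch as follows. The `Linear`, `ClippableLinear`, and `Block` classes are stand-ins invented for this example, and the returned count is an addition for illustration; the actual helper operates on `torch.nn` modules and returns nothing.

```python
class Linear:
    """Stand-in for a plain linear layer (hypothetical, not torch.nn.Linear)."""


class ClippableLinear:
    """Stand-in for a wrapper that keeps the real layer in `.linear`."""

    def __init__(self):
        self.linear = Linear()


class Block:
    """Stand-in container whose attributes are its child modules."""

    def __init__(self, **children):
        self.__dict__.update(children)

    def named_children(self):
        # Return a snapshot so children can be reassigned while iterating.
        return list(vars(self).items())


def unwrap_clippable(module) -> int:
    """Swap every ClippableLinear child for its inner Linear, recursively.

    Returns the number of wrappers replaced.
    """
    replaced = 0
    for name, child in module.named_children():
        if isinstance(child, ClippableLinear) and isinstance(child.linear, Linear):
            setattr(module, name, child.linear)
            replaced += 1
        elif hasattr(child, "named_children"):
            replaced += unwrap_clippable(child)
    return replaced


model = Block(
    proj=ClippableLinear(),
    tower=Block(q=ClippableLinear(), k=ClippableLinear(), norm=Linear()),
)
count = unwrap_clippable(model)
print(count)  # 3
```

The swap is only safe when the wrapper adds no behavior of its own, which the diff's comment justifies by noting that the clamp thresholds default to ±inf.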

Files changed (2)

README.md +10 -9
app.py +226 -68

README.md CHANGED

@@ -8,7 +8,9 @@ sdk_version: 5.7.1
 app_file: app.py
 pinned: false
 license: apache-2.0
-short_description: Microscopy AI · base Gemma 4 E2B · LoRA in training
 tags:
   - microscopy
   - biology
@@ -21,25 +23,23 @@ tags:
 # 🔬 MicroLens — Microscopy AI Demo
-Live demo of **MicroLens**: a microscopy-focused vision-language interface built on **Gemma 4 E2B**.
 Built by **Serghei Brinza** (Vienna, Austria) for the **Kaggle Gemma 4 Good Hackathon** (May 2026).
 ---
-## Status
-⚠️ Currently running on stock **Gemma 4 E2B** only — the fine-tuned LoRA is in training (expected restoration after 2026-05-17). Accuracy on microscopy genera will be limited until the LoRA is loaded.
----
 ## 🚀 How to use
 1. Upload a microscopy image (or pick one from the examples gallery).
 2. Optionally add a question, for example: "What is shown here?", "Describe the morphology", "What category does this belong to?"
 3. Click **AI Analyze**.
-The model returns a natural-language description of the image. While the fine-tune is in training the base model gives a generic vision answer rather than genus-level identification.
 ---
@@ -64,6 +64,7 @@ For any vertical that requires regulatory clearance (medicine, veterinary, foren
 ## 🔗 Links
 - **Source code:** [SergheiBrinza/microlens](https://github.com/SergheiBrinza/microlens)
 - **Hackathon writeup:** [KAGGLE_WRITEUP.md](https://github.com/SergheiBrinza/microlens/blob/main/KAGGLE_WRITEUP.md)
 - **Roadmap:** [ROADMAP.md](https://github.com/SergheiBrinza/microlens/blob/main/ROADMAP.md)

 app_file: app.py
 pinned: false
 license: apache-2.0
+short_description: Gemma 4 E2B + LoRA · diatoms & fungal spores · 95 genera
+models:
+  - Laborator/microlens-final
 tags:
   - microscopy
   - biology
 # 🔬 MicroLens — Microscopy AI Demo
+Live demo of **MicroLens**: a fine-tuned **Gemma 4 E2B** vision-language model that identifies microscopic subjects across **2 categories** (diatoms and fungal spores) and **95 genera**.
 Built by **Serghei Brinza** (Vienna, Austria) for the **Kaggle Gemma 4 Good Hackathon** (May 2026).
 ---
 ## 🚀 How to use
 1. Upload a microscopy image (or pick one from the examples gallery).
 2. Optionally add a question, for example: "What is shown here?", "Describe the morphology", "What category does this belong to?"
 3. Click **AI Analyze**.
+You get three side-by-side answers:
+- **UNTRAINED BASELINE** — stock Gemma 4 E2B with no microscopy training.
+- **MICROLENS · BRIEF** — same base + `Laborator/microlens-final` LoRA, prompted for a single-sentence genus answer.
+- **MICROLENS · RICH** — same base + same LoRA, prompted for the full schema (genus + morphology + habitat + identification cues).
 ---
 ## 🔗 Links
+- **Model:** [Laborator/microlens-final](https://huggingface.co/Laborator/microlens-final)
 - **Source code:** [SergheiBrinza/microlens](https://github.com/SergheiBrinza/microlens)
 - **Hackathon writeup:** [KAGGLE_WRITEUP.md](https://github.com/SergheiBrinza/microlens/blob/main/KAGGLE_WRITEUP.md)
 - **Roadmap:** [ROADMAP.md](https://github.com/SergheiBrinza/microlens/blob/main/ROADMAP.md)

app.py CHANGED

@@ -8,14 +8,17 @@ Layout:
   control panel (mode-dependent: 5 categories × 6 thumbs / upload zone /
   camera enumeration)
 - AI ANALYZE long oval cyan→red gradient button
 - Translate row with 28 languages (English default) + ORIGINAL button after
   translation
 - Footer with run-locally + APK + Legal links
-SAMPLES tab uses cached vanilla answers from catalog.json.
-UPLOAD / MICROSCOPE tabs run LIVE inference against the vanilla backend URL:
   URL_VANILLA  (default http://127.0.0.1:8085/v1/chat/completions)
-On HF Space deployment configure this as a Variable to point at a public tunnel
 (e.g. Cloudflare → llama-server). When unreachable the panel shows a clean
 "backend unavailable" message instead of crashing.
 """
@@ -86,17 +89,22 @@ CATALOG: List[Dict] = json.loads(CATALOG_PATH.read_text())
 BY_FILENAME = {s["filename"]: s for s in CATALOG}
 URL_VANILLA = os.environ.get("URL_VANILLA", "http://127.0.0.1:8085/v1/chat/completions")
 INFERENCE_PROMPT = "What is shown in this microscope image?"
 # ─────────────────────────────────────────────────────────────────────────────
 # ZeroGPU runtime: when running on HF Space we replace HTTP llama-server calls
-# with in-process transformers inference on H200 against the stock Gemma 4 E2B
-# base model. The fine-tuned LoRA adapters are currently in training and not
-# attached. Outside HF Space (local dev) the original HTTP path is preserved.
 # ─────────────────────────────────────────────────────────────────────────────
 IS_HF_SPACE = bool(os.environ.get("SPACE_ID"))
 _HF_BASE = "unsloth/gemma-4-E2B-it"
 _zerogpu_processor = None
 _zerogpu_model = None
@@ -105,23 +113,96 @@ if IS_HF_SPACE:
     import spaces
     import torch
     from transformers import AutoProcessor, AutoModelForImageTextToText
     print("[ZeroGPU] loading processor + base model on cuda…", flush=True)
     _zerogpu_processor = AutoProcessor.from_pretrained(_HF_BASE)
     _zerogpu_model = AutoModelForImageTextToText.from_pretrained(
         _HF_BASE, torch_dtype=torch.bfloat16, device_map="cuda",
     )
-    _zerogpu_model.eval()
-    print("[ZeroGPU] ready (base Gemma 4 E2B only — LoRA in training)", flush=True)
-    # ── Single inference path: stock base model, one GPU acquisition per click.
-    # duration=30: vanilla can ramble up to 20+s on long answers; 30s budget
-    # leaves headroom without overspending ZeroGPU quota.
-    @spaces.GPU(duration=30)
-    def _zerogpu_infer(image_data_uri: str, prompt: str) -> str:
         import time as _t
         t0 = _t.time()
-        print(f"[infer] cuda={torch.cuda.is_available()} "
               f"dev={torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'cpu'}",
               flush=True)
         b64 = _strip_data_uri(image_data_uri) if image_data_uri.startswith("data:") else image_data_uri
@@ -129,9 +210,17 @@ if IS_HF_SPACE:
         if max(img.size) > 768:
             img.thumbnail((768, 768))
         print(f"[infer] image {img.size}", flush=True)
         messages = [{"role": "user", "content": [
             {"type": "image", "image": img},
-            {"type": "text", "text": prompt},
         ]}]
         inputs = _zerogpu_processor.apply_chat_template(
             messages, add_generation_prompt=True, tokenize=True,
@@ -144,7 +233,7 @@ if IS_HF_SPACE:
         print(f"[infer] inputs ready, t+{_t.time()-t0:.2f}s, generating…", flush=True)
         with torch.inference_mode():
             out = _zerogpu_model.generate(
-                **inputs, max_new_tokens=256, do_sample=False,
             )
         prompt_len = inputs["input_ids"].shape[1]
         gen_ids = out[0][prompt_len:]
@@ -153,14 +242,14 @@ if IS_HF_SPACE:
               f"text_len={len(text)}, preview={text[:80]!r}", flush=True)
         return text.strip()
 # ─────────────────────────────────────────────────────────────────────────────
 # QR codes for the footer install card. Generated once at module load.
 # ─────────────────────────────────────────────────────────────────────────────
 GITHUB_URL = "https://github.com/SergheiBrinza/microlens"
-# APK fallback now points at the GitHub repo (the previous resolve/main URL
-# pointed at the deleted Laborator/microlens-gemma4-e2b LoRA repo).
-APK_URL = GITHUB_URL
 def _qr_data_uri(data: str, dark: str = "#FFFFFF", light: str = "#000000",
                   alpha: float = 1.0) -> str:
@@ -235,11 +324,12 @@ def llama_server_call(url: str, image_data_uri: str,
                        prompt: str = INFERENCE_PROMPT,
                        timeout: int = 180) -> Tuple[str, Optional[str]]:
     """Returns (text, error_or_None).
-    On HF Space: routes to in-process ZeroGPU inference (transformers).
     Locally: OpenAI-compatible call to llama-server (original behavior)."""
     if IS_HF_SPACE:
         try:
-            return _zerogpu_infer(image_data_uri, prompt), None
         except Exception as e:
             return "", f"{type(e).__name__}: {str(e)[:240]}"
     payload = {
@@ -1141,8 +1231,8 @@ async () => {
 PANEL_THEMES = {
     "vanilla": {
-        "title": "GEMMA 4 E2B · BASELINE",
-        "subtitle": "Stock Gemma 4 E2B · Google factory weights · LoRA fine-tune in training",
         "stripe": "linear-gradient(90deg, #C0C5CC 0%, #7A7E85 100%)",
         "title_grad": "linear-gradient(90deg, #E0E5EC 0%, #9A9EA5 100%)",
         "border": "rgba(180,185,195,0.35)",
@@ -1150,6 +1240,26 @@ PANEL_THEMES = {
         "glow_strong": "0 0 56px rgba(200,205,215,0.28)",
         "subtitle_color": "#9aa0a8",
     },
 }
@@ -1206,29 +1316,43 @@ def panel_html(kind: str, body: str, state: str = "ready", footer_text: Optional
     """
-def empty_panels(reason: str = "empty") -> str:
-    return panel_html("vanilla", "", state=reason)
 def analyse_curated(filename: str, shape: str, grid: int = 0, cross: int = 0):
     import time
     s = BY_FILENAME.get(filename)
     if not s:
-        yield viewport_html(None, shape, grid, cross), empty_panels()
         return
     vp = viewport_html(full_uri(filename), shape, grid, cross)
     vanilla_full = s.get("vanilla_answer", "—")
-    yield vp, panel_html("vanilla", "", state="typing")
     step = 8
     delay = 0.040
-    for i in range(step, len(vanilla_full) + step, step):
         yield (
             vp,
             panel_html("vanilla", vanilla_full[:min(i, len(vanilla_full))],
                        state="typing" if i < len(vanilla_full) else "ready"),
         )
         time.sleep(delay)
-    yield vp, panel_html("vanilla", vanilla_full)
 CSS = """
@@ -1560,7 +1684,7 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
                              font-weight:700; font-size:12px; letter-spacing:3px;
                              text-transform:uppercase;">Fine-tune</span>
                 <span style="font-family:'Fraunces',serif; font-weight:500;
-                             color:#fff; font-size:19px; letter-spacing:0.3px;">Unsloth QLoRA &middot; in training</span>
             </span>
             <span style="color:#3a3a3a; font-size:18px;">&middot;</span>
             <span style="display:inline-flex; align-items:baseline; gap:12px;
@@ -1584,8 +1708,8 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
     cross_state = gr.Textbox(value="0", elem_id="hidden-cross",
                              elem_classes=["ml-hidden"], show_label=False)
     viewport_uri = gr.State(value="")
-    # Most recent answer from the panel (any mode) — translate reads from here
-    last_answers = gr.State(value={"vanilla": ""})
     # Toolbar — full-width above both columns (no empty space in right column)
     mode_buttons = gr.HTML(value=mode_buttons_html(MODE_SAMPLES))
@@ -1650,11 +1774,10 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
     gr.HTML('<div style="height:28px;"></div>')
-    with gr.Row(elem_classes=["equal-panels"]):
-        gr.HTML('<div></div>')
-        with gr.Column(scale=8, min_width=520):
-            vanilla_panel = gr.HTML(value=panel_html("vanilla", "", state="empty"))
-        gr.HTML('<div></div>')
     gr.HTML(f"""
     <div style="margin-top: 32px; padding: 22px 28px;
@@ -1690,7 +1813,7 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
           <div style="color:#e4e4e4; font-size:13px; line-height:1.85; font-weight:500;">
             Gemma 4 E2B-it <span style="color:#666;font-weight:400;">&middot;</span> Google DeepMind<br>
             Unsloth FastVisionModel <span style="color:#666;font-weight:400;">&middot;</span> 4-bit QLoRA<br>
-            Stock Gemma 4 E2B baseline <span style="color:#666;font-weight:400;">&middot;</span> LoRA fine-tune in training<br>
             llama.cpp + mtmd vision extension
           </div>
         </div>
@@ -1706,7 +1829,7 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
                       font-size:12.5px; letter-spacing:0.3px;
                       border-bottom:1px solid rgba(127,232,227,.40);
                       display:inline-block; margin-bottom: 8px;">
-               &#x1F999; MicroLens models on Ollama Hub &nbsp;&#8599;</a>
             <br>
             <a href="https://github.com/SergheiBrinza/microlens"
                target="_blank" rel="noopener"
@@ -1773,6 +1896,8 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
     LIVE_BACKENDS = [
         ("vanilla", URL_VANILLA, "Gemma 4 E2B · base"),
     ]
     def render_tools(current_uri, shape, grid_str, cross_str, mode):
@@ -1810,14 +1935,14 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
                 gr.Group(visible=(mode == MODE_SAMPLES)),
                 gr.Group(visible=(mode == MODE_UPLOAD)),
                 gr.Group(visible=(mode == MODE_MICRO)),
-                vp, uri, empty_panels(),
                 gr.Button(visible=False))
     mode_state.change(on_mode_change,
         [mode_state, shape_state, grid_state, cross_state, picked_filename],
         [mode_buttons, samples_group, upload_group, micro_group,
          viewport, viewport_uri,
-         vanilla_panel, original_btn], api_name=False)
     def on_cat_change(cat_label, current_filename, shape, grid_str, cross_str):
         try: grid = int(grid_str or "0")
@@ -1828,14 +1953,14 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
                 folder_html(cat_label, None),
                 viewport_html(None, shape, grid, cross,
                               empty_text="PICK A SAMPLE FROM THE CATEGORY ABOVE"),
-                "", "", empty_panels(),
                 gr.Button(visible=False),
                 gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
     cat_state.change(on_cat_change,
         [cat_state, picked_filename, shape_state, grid_state, cross_state],
         [folder_pills, folder_grid, viewport, picked_filename, viewport_uri,
-         vanilla_panel, original_btn, lang_dropdown],
         api_name=False)
     def on_pick(filename, cat_label, shape, grid_str, cross_str):
@@ -1846,23 +1971,23 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
         # Reset live-answer state on every sample switch — without this the
         # previous image's live answer could leak into translate/restore for
         # the next sample and look like a real result.
-        cleared_state = {"vanilla": ""}
         if not filename:
             return (folder_html(cat_label, None),
-                    viewport_html(None, shape, grid, cross), "", empty_panels(),
                     gr.Button(visible=False),
                     gr.Dropdown(value=DEFAULT_LANG_DISPLAY),
                     cleared_state)
         uri = full_uri(filename)
         return (folder_html(cat_label, filename),
-                viewport_html(uri, shape, grid, cross), uri, empty_panels(),
                 gr.Button(visible=False),
                 gr.Dropdown(value=DEFAULT_LANG_DISPLAY),
                 cleared_state)
     picked_filename.change(on_pick,
         [picked_filename, cat_state, shape_state, grid_state, cross_state],
-        [folder_grid, viewport, viewport_uri, vanilla_panel,
          original_btn, lang_dropdown, last_answers], api_name=False)
     def on_file_upload(file_obj, shape, grid_str, cross_str):
@@ -1907,9 +2032,9 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
         return panel_html(kind, body, state="ready", footer_text=f"❌ {label}")
     def do_analyze(filename, shape, mode, grid_str, cross_str, current_uri):
-        """Unified live inference for ALL modes. Single panel hits the vanilla
-        Gemma 4 E2B backend on the GPU. Identical process for samples,
-        uploads, and webcam captures."""
         try: grid = int(grid_str or "0")
         except ValueError: grid = 0
         try: cross = int(cross_str or "0")
@@ -1933,8 +2058,10 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
                   "or capture from your camera, then press AI ANALYZE.")
             yield (viewport_html(None, shape, grid, cross, live_video=live),
                    panel_html("vanilla", msg, state="ready"),
                    gr.Button(visible=False),
-                   {"vanilla": ""})
             return
         source = ("webcam" if mode == MODE_MICRO else
@@ -1944,20 +2071,27 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
         running = f"⏳  Running on your {source}…"
         yield (vp,
                panel_html("vanilla", running, state="typing"),
                gr.Button(visible=False),
-               {"vanilla": ""})
-        answers = {"vanilla": ""}
-        # On HF Space: in-process ZeroGPU inference against the base model.
-        # Locally we keep the HTTP path (llama-server).
         if IS_HF_SPACE:
             try:
-                answers["vanilla"] = _zerogpu_infer(img_uri, INFERENCE_PROMPT)
             except Exception as e:
                 err = f"{type(e).__name__}: {str(e)[:280]}"
                 yield (vp,
                        _error_panel("vanilla", "Gemma 4 E2B · base", err),
                        gr.Button(visible=False),
                        answers)
                 return
@@ -1981,19 +2115,30 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
                         f'<span class="ml-word" style="animation-delay:{delay}ms;">{safe}</span>'
                     )
                 return "".join(spans)
-            footer = f"🛰 Live inference · <code>Gemma 4 E2B · base</code> · {source}"
             yield (vp,
                    panel_html("vanilla", _animated_words(answers["vanilla"]),
-                              state="ready", footer_text=footer),
                    gr.Button(visible=False),
                    answers)
         else:
-            # Local: HTTP call to llama-server, sequential typewriter
             for kind, url, label in LIVE_BACKENDS:
                 ans, err = llama_server_call(url, img_uri)
                 if err:
                     yield (vp,
-                           _error_panel(kind, label, err),
                            gr.Button(visible=False),
                            answers)
                 else:
@@ -2004,11 +2149,14 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
                     for i in range(step, len(ans) + step, step):
                         partial = ans[:min(i, len(ans))]
                         is_done = i >= len(ans)
                         yield (vp,
-                               panel_html(
-                                   kind, partial,
-                                   state="ready" if is_done else "typing",
-                                   footer_text=footer if is_done else None),
                                gr.Button(visible=False),
                                answers)
                         time.sleep(delay)
@@ -2037,7 +2185,7 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
     """
     analyze_btn.click(do_analyze,
         [picked_filename, shape_state, mode_state, grid_state, cross_state, viewport_uri],
-        [viewport, vanilla_panel, original_btn, last_answers],
         js=ANALYZE_PRE_JS, api_name=False)
     def do_translate(filename, lang_label, answers):
@@ -2051,12 +2199,16 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
         if not sources:
             msg = "Run AI ANALYZE first to get an answer to translate."
             return (panel_html("vanilla", msg, state="ready"),
                     gr.Button(visible=False))
         lang_code = LANG_BY_DISPLAY.get(lang_label, "en")
         lang_name = next((name for _, name, code in LANGUAGES if code == lang_code), "English")
         if lang_code == "en":
             return (panel_html("vanilla", sources.get("vanilla", "")),
                     gr.Button(visible=False))
         translated = {}
         engine = ""
@@ -2091,14 +2243,18 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
         if not translated or not any(translated.values()):
             placeholder = f"Translation to {lang_name} unavailable right now."
             return (panel_html("vanilla", placeholder, state="ready"),
                     gr.Button(visible=False))
         footer = f"🌍 {lang_name} · {engine}"
         return (panel_html("vanilla", translated.get("vanilla", ""), footer_text=footer),
                 gr.Button(visible=True))
     translate_btn.click(do_translate,
         [picked_filename, lang_dropdown, last_answers],
-        [vanilla_panel, original_btn], api_name=False)
     def restore_original(filename, answers):
         # Restore ONLY the live answer that produced this translation. Same
@@ -2107,16 +2263,18 @@ with gr.Blocks(css=CSS, theme=gr.themes.Base(primary_hue="red", neutral_hue="zin
         # a pre-baked answer for a different image.
         sources = answers if (answers and any(answers.values())) else None
         if not sources:
-            return (empty_panels(),
                     gr.Button(visible=False),
                     gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
         return (panel_html("vanilla", sources.get("vanilla", "")),
                 gr.Button(visible=False),
                 gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
     original_btn.click(restore_original,
         [picked_filename, last_answers],
-        [vanilla_panel, original_btn, lang_dropdown],
         api_name=False)
     demo.load(fn=None, inputs=None, outputs=None, js=CAMERA_JS)

   control panel (mode-dependent: 5 categories × 6 thumbs / upload zone /
   camera enumeration)
 - AI ANALYZE long oval cyan→red gradient button
+- 3 result panels: UNTRAINED BASELINE / MICROLENS V2 BRIEF / MICROLENS V3 RICH
 - Translate row with 28 languages (English default) + ORIGINAL button after
   translation
 - Footer with run-locally + APK + Legal links
+SAMPLES tab uses cached answers from catalog.json (vanilla + v2 + v3 for all 30).
+UPLOAD / MICROSCOPE tabs run LIVE inference against per-model backend URLs:
   URL_VANILLA  (default http://127.0.0.1:8085/v1/chat/completions)
+  URL_V2       (default http://127.0.0.1:8084/v1/chat/completions)
+  URL_V3       (default http://127.0.0.1:8083/v1/chat/completions)
+On HF Space deployment configure these as Variables to point at a public tunnel
 (e.g. Cloudflare → llama-server). When unreachable the panel shows a clean
 "backend unavailable" message instead of crashing.
 """
 BY_FILENAME = {s["filename"]: s for s in CATALOG}
 URL_VANILLA = os.environ.get("URL_VANILLA", "http://127.0.0.1:8085/v1/chat/completions")
+URL_V2      = os.environ.get("URL_V2",      "http://127.0.0.1:8084/v1/chat/completions")
+URL_V3      = os.environ.get("URL_V3",      "http://127.0.0.1:8083/v1/chat/completions")
 INFERENCE_PROMPT = "What is shown in this microscope image?"
+BRIEF_PROMPT = "Identify the genus in this microscopy image. Reply with just the genus name and a one-sentence description."
+RICH_PROMPT = "Identify the organism in this microscopy image. Describe the genus, morphology, habitat, and identification cues."
+_PROMPT_BY_VERSION = {"vanilla": INFERENCE_PROMPT, "v2": BRIEF_PROMPT, "v3": RICH_PROMPT}
 # ─────────────────────────────────────────────────────────────────────────────
 # ZeroGPU runtime: when running on HF Space we replace HTTP llama-server calls
+# with in-process transformers + PEFT multi-adapter inference on H200.
+# Outside HF Space (local dev) the original HTTP path is preserved.
 # ─────────────────────────────────────────────────────────────────────────────
 IS_HF_SPACE = bool(os.environ.get("SPACE_ID"))
 _HF_BASE = "unsloth/gemma-4-E2B-it"
+_HF_LORA_REPO = "Laborator/microlens-final"
 _zerogpu_processor = None
 _zerogpu_model = None
     import spaces
     import torch
     from transformers import AutoProcessor, AutoModelForImageTextToText
+    from peft import PeftModel
     print("[ZeroGPU] loading processor + base model on cuda…", flush=True)
     _zerogpu_processor = AutoProcessor.from_pretrained(_HF_BASE)
     _zerogpu_model = AutoModelForImageTextToText.from_pretrained(
         _HF_BASE, torch_dtype=torch.bfloat16, device_map="cuda",
     )
+    # PEFT 0.19 cannot hook transformers' Gemma4ClippableLinear (vision tower
+    # wrapper around nn.Linear with opt-in clamping). The clamp thresholds
+    # default to ±inf so the wrapper is a behavioral no-op — replace each
+    # occurrence with its inner .linear so PEFT sees a plain nn.Linear.
+    def _unwrap_clippable(module):
+        from torch import nn
+        for name, child in list(module.named_children()):
+            if type(child).__name__ == "Gemma4ClippableLinear" and isinstance(
+                getattr(child, "linear", None), nn.Linear
+            ):
+                if getattr(child, "use_clipped_linears", False):
+                    print(f"[ZeroGPU] WARN: clipped-linears active on {name}; "
+                          "unwrapping anyway (thresholds are ±inf = no-op)", flush=True)
+                setattr(module, name, child.linear)
+            else:
+                _unwrap_clippable(child)
+    _unwrap_clippable(_zerogpu_model)
+    print("[ZeroGPU] attaching microlens-final LoRA…", flush=True)
+    _zerogpu_model = PeftModel.from_pretrained(
+        _zerogpu_model, _HF_LORA_REPO, adapter_name="microlens",
+    )
+    _zerogpu_model.eval()
+    print("[ZeroGPU] ready (vanilla = base off / brief + rich = same LoRA, different prompts)", flush=True)
+    # ── Batch path: run vanilla + brief + rich in a SINGLE GPU acquisition.
+    # vanilla is the base Gemma 4 with adapter disabled; brief and rich share
+    # the same microlens-final LoRA but use different prompts (BRIEF_PROMPT /
+    # RICH_PROMPT). duration=60s gives headroom for all three to finish.
+    @spaces.GPU(duration=60)
+    def _zerogpu_infer_all(image_data_uri: str, prompt: str = None):
+        import time as _t
+        t_total = _t.time()
+        print(f"[infer-all] start cuda={torch.cuda.is_available()}", flush=True)
+        b64 = _strip_data_uri(image_data_uri) if image_data_uri.startswith("data:") else image_data_uri
+        img = Image.open(BytesIO(base64.b64decode(b64))).convert("RGB")
+        if max(img.size) > 768:
+            img.thumbnail((768, 768))
+        results = {}
+        for version in ("vanilla", "v2", "v3"):
+            t0 = _t.time()
+            version_prompt = _PROMPT_BY_VERSION[version]
+            if version == "vanilla":
+                _zerogpu_model.disable_adapter_layers()
+                # Stock Gemma can ramble up to 1400+ chars on a microscope image
+                # which blows the 60s ZeroGPU budget; cap it tighter.
+                _max_tok = 256
+            else:
+                _zerogpu_model.enable_adapter_layers()
+                _zerogpu_model.set_adapter("microlens")
+                # brief stays short, rich gets headroom for full schema answer.
+                _max_tok = 96 if version == "v2" else 512
+            messages = [{"role": "user", "content": [
+                {"type": "image", "image": img},
+                {"type": "text", "text": version_prompt},
+            ]}]
+            inputs = _zerogpu_processor.apply_chat_template(
+                messages, add_generation_prompt=True, tokenize=True,
+                return_dict=True, return_tensors="pt",
+            )
+            inputs = {k: (v.to(_zerogpu_model.device, dtype=torch.bfloat16) if v.is_floating_point()
+                           else v.to(_zerogpu_model.device))
+                      for k, v in inputs.items()}
+            prompt_len = inputs["input_ids"].shape[1]
+            with torch.inference_mode():
+                out = _zerogpu_model.generate(
+                    **inputs, max_new_tokens=_max_tok, do_sample=False,
+                )
+            gen_ids = out[0][prompt_len:]
+            text = _zerogpu_processor.decode(gen_ids, skip_special_tokens=True).strip()
+            results[version] = text
+            print(f"[infer-all] {version} t+{_t.time()-t0:.2f}s len={len(text)}", flush=True)
+        print(f"[infer-all] DONE total t+{_t.time()-t_total:.2f}s", flush=True)
+        return results
+    # ── Single-version path (legacy / local fallback). Still used when llama_server_call
+    # is called outside the do_analyze HF-Space short-circuit (e.g. potential future paths).
+    @spaces.GPU(duration=25)
+    def _zerogpu_infer(version: str, image_data_uri: str, prompt: str) -> str:
         import time as _t
         t0 = _t.time()
+        print(f"[infer] version={version} cuda={torch.cuda.is_available()} "
               f"dev={torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'cpu'}",
               flush=True)
         b64 = _strip_data_uri(image_data_uri) if image_data_uri.startswith("data:") else image_data_uri
         if max(img.size) > 768:
             img.thumbnail((768, 768))
         print(f"[infer] image {img.size}", flush=True)
+        if version == "vanilla":
+            _zerogpu_model.disable_adapter_layers()
+        else:
+            _zerogpu_model.enable_adapter_layers()
+            _zerogpu_model.set_adapter("microlens")
+        # If caller didn't override, use the per-version default prompt
+        # (vanilla=generic, v2=brief, v3=rich).
+        effective_prompt = prompt if prompt and prompt != INFERENCE_PROMPT else _PROMPT_BY_VERSION.get(version, INFERENCE_PROMPT)
         messages = [{"role": "user", "content": [
             {"type": "image", "image": img},
+            {"type": "text", "text": effective_prompt},
         ]}]
         inputs = _zerogpu_processor.apply_chat_template(
             messages, add_generation_prompt=True, tokenize=True,
         print(f"[infer] inputs ready, t+{_t.time()-t0:.2f}s, generating…", flush=True)
         with torch.inference_mode():
             out = _zerogpu_model.generate(
+                **inputs, max_new_tokens=512, do_sample=False,
             )
         prompt_len = inputs["input_ids"].shape[1]
         gen_ids = out[0][prompt_len:]
               f"text_len={len(text)}, preview={text[:80]!r}", flush=True)
         return text.strip()
+_URL_TO_KIND = {URL_VANILLA: "vanilla", URL_V2: "v2", URL_V3: "v3"}
 # ─────────────────────────────────────────────────────────────────────────────
 # QR codes for the footer install card. Generated once at module load.
 # ─────────────────────────────────────────────────────────────────────────────
+APK_URL = "https://github.com/SergheiBrinza/microlens"
 GITHUB_URL = "https://github.com/SergheiBrinza/microlens"
 def _qr_data_uri(data: str, dark: str = "#FFFFFF", light: str = "#000000",
                   alpha: float = 1.0) -> str:
                        prompt: str = INFERENCE_PROMPT,
                        timeout: int = 180) -> Tuple[str, Optional[str]]:
     """Returns (text, error_or_None).
+    On HF Space: routes to in-process ZeroGPU inference (transformers + PEFT).
     Locally: OpenAI-compatible call to llama-server (original behavior)."""
     if IS_HF_SPACE:
+        kind = _URL_TO_KIND.get(url, "vanilla")
         try:
+            return _zerogpu_infer(kind, image_data_uri, prompt), None
         except Exception as e:
             return "", f"{type(e).__name__}: {str(e)[:240]}"
     payload = {
 PANEL_THEMES = {
     "vanilla": {
+        "title": "UNTRAINED BASELINE",
+        "subtitle": "Stock Gemma 4 E2B · Google factory weights · no microscopy training",
         "stripe": "linear-gradient(90deg, #C0C5CC 0%, #7A7E85 100%)",
         "title_grad": "linear-gradient(90deg, #E0E5EC 0%, #9A9EA5 100%)",
         "border": "rgba(180,185,195,0.35)",
         "glow_strong": "0 0 56px rgba(200,205,215,0.28)",
         "subtitle_color": "#9aa0a8",
     },
+    "v2": {
+        "title": "MICROLENS · BRIEF",
+        "subtitle": "Gemma 4 E2B + microlens-final LoRA · 95 genera · single-sentence genus answer",
+        "stripe": "linear-gradient(90deg, #00DCE6 0%, #007680 100%)",
+        "title_grad": "linear-gradient(90deg, #00DCE6 0%, #66EAF0 100%)",
+        "border": "rgba(0,220,230,0.45)",
+        "glow": "0 0 36px rgba(0,220,230,0.18)",
+        "glow_strong": "0 0 64px rgba(0,220,230,0.42)",
+        "subtitle_color": "#7FBEC4",
+    },
+    "v3": {
+        "title": "MICROLENS · RICH",
+        "subtitle": "Same LoRA, detailed prompt · genus + morphology + habitat + ID cues",
+        "stripe": "linear-gradient(90deg, #FF1744 0%, #800020 100%)",
+        "title_grad": "linear-gradient(90deg, #FF5252 0%, #FF8888 100%)",
+        "border": "rgba(255,23,68,0.45)",
+        "glow": "0 0 36px rgba(255,23,68,0.18)",
+        "glow_strong": "0 0 64px rgba(255,23,68,0.42)",
+        "subtitle_color": "#C28A8A",
+    },
 }
     """
+def empty_panels(reason: str = "empty") -> Tuple[str, str, str]:
+    return (panel_html("vanilla", "", state=reason),
+            panel_html("v2", "", state=reason),
+            panel_html("v3", "", state=reason))
 def analyse_curated(filename: str, shape: str, grid: int = 0, cross: int = 0):
     import time
     s = BY_FILENAME.get(filename)
     if not s:
+        yield viewport_html(None, shape, grid, cross), *empty_panels()
         return
     vp = viewport_html(full_uri(filename), shape, grid, cross)
     vanilla_full = s.get("vanilla_answer", "—")
+    v2_full      = s.get("v2_answer", "—")
+    v3_full      = s.get("v3_answer", "—")
+    yield (vp,
+           panel_html("vanilla", "", state="typing"),
+           panel_html("v2", "", state="typing"),
+           panel_html("v3", "", state="typing"))
+    max_len = max(len(vanilla_full), len(v2_full), len(v3_full))
     step = 8
     delay = 0.040
+    for i in range(step, max_len + step, step):
         yield (
             vp,
             panel_html("vanilla", vanilla_full[:min(i, len(vanilla_full))],
                        state="typing" if i < len(vanilla_full) else "ready"),
+            panel_html("v2", v2_full[:min(i, len(v2_full))],
+                       state="typing" if i < len(v2_full) else "ready"),
+            panel_html("v3", v3_full[:min(i, len(v3_full))],
+                       state="typing" if i < len(v3_full) else "ready"),
         )
         time.sleep(delay)
+    yield (vp,
+           panel_html("vanilla", vanilla_full),
+           panel_html("v2", v2_full),
+           panel_html("v3", v3_full))
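
The typewriter loop in `analyse_curated` reveals all three answers in lock-step and flips each panel to `ready` once its own text is fully shown. A stripped-down sketch of that slicing logic, without Gradio or `panel_html`, might look like the following; `typewriter_frames` and the `Frame` alias are hypothetical names used only for illustration.

```python
from typing import Dict, Iterator, Tuple

Frame = Dict[str, Tuple[str, str]]  # kind -> (visible text, state)


def typewriter_frames(answers: Dict[str, str], step: int = 8) -> Iterator[Frame]:
    """Yield successive frames revealing `step` more characters per tick."""
    max_len = max((len(t) for t in answers.values()), default=0)
    # range(step, max_len + step, step) ends on the first i >= max_len,
    # so the last frame always shows every answer in full.
    for i in range(step, max_len + step, step):
        yield {
            kind: (text[:i], "typing" if i < len(text) else "ready")
            for kind, text in answers.items()
        }
```

Shorter answers finish early and stay in the `ready` state while the longest one is still typing, which matches the per-panel state expression in the diff.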
 CSS = """
                              font-weight:700; font-size:12px; letter-spacing:3px;
                              text-transform:uppercase;">Fine-tune</span>
                 <span style="font-family:'Fraunces',serif; font-weight:500;
+                             color:#fff; font-size:19px; letter-spacing:0.3px;">Unsloth 4-bit QLoRA &middot; 122k VQA</span>
             </span>
             <span style="color:#3a3a3a; font-size:18px;">&middot;</span>
             <span style="display:inline-flex; align-items:baseline; gap:12px;
     cross_state = gr.Textbox(value="0", elem_id="hidden-cross",
                              elem_classes=["ml-hidden"], show_label=False)
     viewport_uri = gr.State(value="")
+    # Most recent answers from the 3 panels (any mode) — translate reads from here
+    last_answers = gr.State(value={"vanilla": "", "v2": "", "v3": ""})
     # Toolbar — full-width above both columns (no empty space in right column)
     mode_buttons = gr.HTML(value=mode_buttons_html(MODE_SAMPLES))
     gr.HTML('<div style="height:28px;"></div>')
+    with gr.Row(equal_height=True, elem_classes=["equal-panels"]):
+        vanilla_panel = gr.HTML(value=panel_html("vanilla", "", state="empty"))
+        v2_panel = gr.HTML(value=panel_html("v2", "", state="empty"))
+        v3_panel = gr.HTML(value=panel_html("v3", "", state="empty"))
     gr.HTML(f"""
     <div style="margin-top: 32px; padding: 22px 28px;
           <div style="color:#e4e4e4; font-size:13px; line-height:1.85; font-weight:500;">
             Gemma 4 E2B-it <span style="color:#666;font-weight:400;">&middot;</span> Google DeepMind<br>
             Unsloth FastVisionModel <span style="color:#666;font-weight:400;">&middot;</span> 4-bit QLoRA<br>
+            PEFT single adapter <span style="color:#666;font-weight:400;">&middot;</span> vanilla / brief / rich<br>
             llama.cpp + mtmd vision extension
           </div>
         </div>
                       font-size:12.5px; letter-spacing:0.3px;
                       border-bottom:1px solid rgba(127,232,227,.40);
                       display:inline-block; margin-bottom: 8px;">
+               &#x1F999; All 3 versions on Ollama Hub &nbsp;&#8599;</a>
             <br>
             <a href="https://github.com/SergheiBrinza/microlens"
                target="_blank" rel="noopener"
     LIVE_BACKENDS = [
         ("vanilla", URL_VANILLA, "Gemma 4 E2B · base"),
+        ("v2",      URL_V2,      "MicroLens v2 · fine-tuned"),
+        ("v3",      URL_V3,      "MicroLens v3 · fine-tuned"),
     ]
     def render_tools(current_uri, shape, grid_str, cross_str, mode):
                 gr.Group(visible=(mode == MODE_SAMPLES)),
                 gr.Group(visible=(mode == MODE_UPLOAD)),
                 gr.Group(visible=(mode == MODE_MICRO)),
+                vp, uri, *empty_panels(),
                 gr.Button(visible=False))
     mode_state.change(on_mode_change,
         [mode_state, shape_state, grid_state, cross_state, picked_filename],
         [mode_buttons, samples_group, upload_group, micro_group,
          viewport, viewport_uri,
+         vanilla_panel, v2_panel, v3_panel, original_btn], api_name=False)
     def on_cat_change(cat_label, current_filename, shape, grid_str, cross_str):
         try: grid = int(grid_str or "0")
                 folder_html(cat_label, None),
                 viewport_html(None, shape, grid, cross,
                               empty_text="PICK A SAMPLE FROM THE CATEGORY ABOVE"),
+                "", "", *empty_panels(),
                 gr.Button(visible=False),
                 gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
     cat_state.change(on_cat_change,
         [cat_state, picked_filename, shape_state, grid_state, cross_state],
         [folder_pills, folder_grid, viewport, picked_filename, viewport_uri,
+         vanilla_panel, v2_panel, v3_panel, original_btn, lang_dropdown],
         api_name=False)
     def on_pick(filename, cat_label, shape, grid_str, cross_str):
         # Reset live-answer state on every sample switch — without this the
         # previous image's live answer could leak into translate/restore for
         # the next sample and look like a real result.
+        cleared_state = {"vanilla": "", "v2": "", "v3": ""}
         if not filename:
             return (folder_html(cat_label, None),
+                    viewport_html(None, shape, grid, cross), "", *empty_panels(),
                     gr.Button(visible=False),
                     gr.Dropdown(value=DEFAULT_LANG_DISPLAY),
                     cleared_state)
         uri = full_uri(filename)
         return (folder_html(cat_label, filename),
+                viewport_html(uri, shape, grid, cross), uri, *empty_panels(),
                 gr.Button(visible=False),
                 gr.Dropdown(value=DEFAULT_LANG_DISPLAY),
                 cleared_state)
     picked_filename.change(on_pick,
         [picked_filename, cat_state, shape_state, grid_state, cross_state],
+        [folder_grid, viewport, viewport_uri, vanilla_panel, v2_panel, v3_panel,
          original_btn, lang_dropdown, last_answers], api_name=False)
     def on_file_upload(file_obj, shape, grid_str, cross_str):
         return panel_html(kind, body, state="ready", footer_text=f"❌ {label}")
     def do_analyze(filename, shape, mode, grid_str, cross_str, current_uri):
+        """Unified live inference for all modes. On an HF Space, a single
+        ZeroGPU call serves all three panels; locally, each panel calls its
+        dedicated llama-server backend. Samples, uploads, and webcam captures
+        all go through the same path."""
         try: grid = int(grid_str or "0")
         except ValueError: grid = 0
         try: cross = int(cross_str or "0")
                   "or capture from your camera, then press AI ANALYZE.")
             yield (viewport_html(None, shape, grid, cross, live_video=live),
                    panel_html("vanilla", msg, state="ready"),
+                   panel_html("v2", msg, state="ready"),
+                   panel_html("v3", msg, state="ready"),
                    gr.Button(visible=False),
+                   {"vanilla": "", "v2": "", "v3": ""})
             return
         source = ("webcam" if mode == MODE_MICRO else
         running = f"⏳  Running on your {source}…"
         yield (vp,
                panel_html("vanilla", running, state="typing"),
+               panel_html("v2", running, state="typing"),
+               panel_html("v3", running, state="typing"),
                gr.Button(visible=False),
+               {"vanilla": "", "v2": "", "v3": ""})
+        results = {}
+        answers = {"vanilla": "", "v2": "", "v3": ""}
+        # On HF Space: ONE GPU acquisition for all 3 versions (saves ~3× quota
+        # vs the per-model loop). Locally we keep the 3 HTTP calls path.
         if IS_HF_SPACE:
             try:
+                all_answers = _zerogpu_infer_all(img_uri, INFERENCE_PROMPT)
+                for kind in ("vanilla", "v2", "v3"):
+                    answers[kind] = all_answers.get(kind, "")
             except Exception as e:
                 err = f"{type(e).__name__}: {str(e)[:280]}"
                 yield (vp,
                        _error_panel("vanilla", "Gemma 4 E2B · base", err),
+                       _error_panel("v2",      "MicroLens v2 · fine-tuned", err),
+                       _error_panel("v3",      "MicroLens v3 · fine-tuned", err),
                        gr.Button(visible=False),
                        answers)
                 return
                         f'<span class="ml-word" style="animation-delay:{delay}ms;">{safe}</span>'
                     )
                 return "".join(spans)
+            footers = {
+                "vanilla": f"🛰 Live inference · <code>Gemma 4 E2B · base</code> · {source}",
+                "v2":      f"🛰 Live inference · <code>MicroLens v2 · fine-tuned</code> · {source}",
+                "v3":      f"🛰 Live inference · <code>MicroLens v3 · fine-tuned</code> · {source}",
+            }
             yield (vp,
                    panel_html("vanilla", _animated_words(answers["vanilla"]),
+                              state="ready", footer_text=footers["vanilla"]),
+                   panel_html("v2",      _animated_words(answers["v2"]),
+                              state="ready", footer_text=footers["v2"]),
+                   panel_html("v3",      _animated_words(answers["v3"]),
+                              state="ready", footer_text=footers["v3"]),
                    gr.Button(visible=False),
                    answers)
         else:
+            # Local: 3 HTTP calls to llama-servers, sequential typewriter per model
             for kind, url, label in LIVE_BACKENDS:
                 ans, err = llama_server_call(url, img_uri)
                 if err:
+                    results[kind] = _error_panel(kind, label, err)
                     yield (vp,
+                           results.get("vanilla", panel_html("vanilla", running, state="typing")),
+                           results.get("v2", panel_html("v2", running, state="typing")),
+                           results.get("v3", panel_html("v3", running, state="typing")),
                            gr.Button(visible=False),
                            answers)
                 else:
                     for i in range(step, len(ans) + step, step):
                         partial = ans[:min(i, len(ans))]
                         is_done = i >= len(ans)
+                        results[kind] = panel_html(
+                            kind, partial,
+                            state="ready" if is_done else "typing",
+                            footer_text=footer if is_done else None)
                         yield (vp,
+                               results.get("vanilla", panel_html("vanilla", running, state="typing")),
+                               results.get("v2", panel_html("v2", running, state="typing")),
+                               results.get("v3", panel_html("v3", running, state="typing")),
                                gr.Button(visible=False),
                                answers)
                         time.sleep(delay)
     """
     analyze_btn.click(do_analyze,
         [picked_filename, shape_state, mode_state, grid_state, cross_state, viewport_uri],
+        [viewport, vanilla_panel, v2_panel, v3_panel, original_btn, last_answers],
         js=ANALYZE_PRE_JS, api_name=False)
     def do_translate(filename, lang_label, answers):
         if not sources:
             msg = "Run AI ANALYZE first to get an answer to translate."
             return (panel_html("vanilla", msg, state="ready"),
+                    panel_html("v2",      msg, state="ready"),
+                    panel_html("v3",      msg, state="ready"),
                     gr.Button(visible=False))
         lang_code = LANG_BY_DISPLAY.get(lang_label, "en")
         lang_name = next((name for _, name, code in LANGUAGES if code == lang_code), "English")
         if lang_code == "en":
             return (panel_html("vanilla", sources.get("vanilla", "")),
+                    panel_html("v2",      sources.get("v2", "")),
+                    panel_html("v3",      sources.get("v3", "")),
                     gr.Button(visible=False))
         translated = {}
         engine = ""
         if not translated or not any(translated.values()):
             placeholder = f"Translation to {lang_name} unavailable right now."
             return (panel_html("vanilla", placeholder, state="ready"),
+                    panel_html("v2", placeholder, state="ready"),
+                    panel_html("v3", placeholder, state="ready"),
                     gr.Button(visible=False))
         footer = f"🌍 {lang_name} · {engine}"
         return (panel_html("vanilla", translated.get("vanilla", ""), footer_text=footer),
+                panel_html("v2",      translated.get("v2", ""),      footer_text=footer),
+                panel_html("v3",      translated.get("v3", ""),      footer_text=footer),
                 gr.Button(visible=True))
     translate_btn.click(do_translate,
         [picked_filename, lang_dropdown, last_answers],
+        [vanilla_panel, v2_panel, v3_panel, original_btn], api_name=False)
     def restore_original(filename, answers):
         # Restore ONLY the live answer that produced this translation. Same
         # a pre-baked answer for a different image.
         sources = answers if (answers and any(answers.values())) else None
         if not sources:
+            return (*empty_panels(),
                     gr.Button(visible=False),
                     gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
         return (panel_html("vanilla", sources.get("vanilla", "")),
+                panel_html("v2",      sources.get("v2", "")),
+                panel_html("v3",      sources.get("v3", "")),
                 gr.Button(visible=False),
                 gr.Dropdown(value=DEFAULT_LANG_DISPLAY))
     original_btn.click(restore_original,
         [picked_filename, last_answers],
+        [vanilla_panel, v2_panel, v3_panel, original_btn, lang_dropdown],
         api_name=False)
     demo.load(fn=None, inputs=None, outputs=None, js=CAMERA_JS)
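
The `(text, error_or_None)` convention used by `llama_server_call` keeps exceptions out of the UI handlers: callers branch on the second element instead of wrapping every call in `try`. A minimal sketch of that shape follows, assuming a hypothetical helper `call_safely` that is not part of `app.py`.

```python
from typing import Callable, Optional, Tuple


def call_safely(fn: Callable[[], str], limit: int = 240) -> Tuple[str, Optional[str]]:
    """Run fn and return (text, None) on success or ("", error) on failure."""
    try:
        return fn(), None
    except Exception as e:
        # Same shape as the diff: exception class name plus a truncated message.
        return "", f"{type(e).__name__}: {str(e)[:limit]}"
```

Truncating the message keeps a long traceback-style string from overflowing the error panel while still naming the exception type.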