Spaces: huggingface-projects/diffusiongemma-codegen


merve (HF Staff) committed on Jun 10

Commit 7d99dfd (verified) · 1 parent: e9a192d

Add Gemma Diffusion website builder with google/diffusiongemma-26B-A4B-it


Files changed (6)

.gitattributes +1 -0
README.md +39 -7
app.py +365 -0
index.html +343 -0
requirements.txt +6 -0
transformers-5.8.0.dev0+2db78a1296-py3-none-any.whl +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+transformers-5.8.0.dev0+2db78a1296-py3-none-any.whl filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,14 +1,46 @@
 ---
-title: Diffusiongemma Codegen
-emoji: 🦀
-colorFrom: yellow
-colorTo: gray
+title: Gemma Diffusion Website Builder
+emoji: 🌐
+colorFrom: indigo
+colorTo: purple
 sdk: gradio
 sdk_version: 6.17.3
-python_version: '3.13'
+python_version: "3.12"
 app_file: app.py
 pinned: false
-license: apache-2.0
+short_description: Watch a diffusion LLM write a website live, then tweak it
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Gemma Diffusion · Live Website Builder
+A live, side-by-side visualization of **block-diffusion** code generation. Describe a
+website and the model writes a single self-contained HTML document by denoising a canvas
+of random tokens — every token position updates *at once* each step.
+The left pane shows the raw HTML *canvas* diffusing token-by-token (the signature look of
+text diffusion); the right pane renders the page live. Type a follow-up to **tweak in
+place**: the previous page seeds the diffusion's starting canvas (via the model's native
+`canvas_ids` API), so the model edits the existing page instead of regenerating from
+scratch. Changed lines are highlighted, and the preview keeps your scroll position across
+re-renders.
+## Stack
+- **Model**: `google/diffusiongemma-26B-A4B-it` (`DiffusionGemmaForBlockDiffusion`).
+- **Backend**: [`gradio.Server`](https://huggingface.co/blog/introducing-gradio-server)
+  — a FastAPI subclass that provides Gradio's queue + SSE streaming under a custom,
+  hand-written HTML/CSS/JS frontend (`index.html`). The single streaming endpoint
+  `/generate` yields one JSON frame per denoising step.
+- **Hardware**: ZeroGPU (`xlarge`) — the 26B checkpoint needs the full backing card.
+A custom `transformers` wheel providing the DiffusionGemma architecture is bundled in
+this repo and installed at runtime by `app.py` (Spaces installs `requirements.txt` before
+the repo files are copied in, so a local-path wheel can't be referenced there).
+## Configuration
+- `HF_TOKEN` (secret) — read access to the private model repo.
+- `GRADIO_SSR_MODE=false` (variable) — required so the custom `/` route serves
+  `index.html` instead of Gradio's SSR shell.
+- `GDIFF_MODEL_PATH` (optional) — override the model repo id.
+- `GDIFF_GPU_SIZE` (optional) — ZeroGPU slice, defaults to `xlarge`.
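
Review note on the streaming protocol: the README says `/generate` yields one JSON frame per denoising step. The sketch below shows roughly what that stream looks like without loading the model. It assumes the frame fields that `app.py` in this commit emits (`kind`, `source`, `block`, `step`, `max_iters`, plus a final `done` frame). `stream_frames` and `fake_steps` are hypothetical names used only for illustration and are not part of the Space.

```python
import json
from typing import Iterable, Iterator, Tuple

Step = Tuple[str, str, int, int]  # (kind, text, block, step)


def stream_frames(steps: Iterable[Step], max_iters: int = 64) -> Iterator[str]:
    """Yield one JSON frame per denoising step, then a final 'done' frame."""
    last_text = ""
    for kind, text, block, step in steps:
        last_text = text
        yield json.dumps(
            {
                "kind": "draft" if kind == "draft" else "commit",
                "source": text,
                "block": block,
                "step": step,
                "max_iters": max_iters,
            }
        )
    # The last frame carries the finished document.
    yield json.dumps({"kind": "done", "source": last_text})


# Hypothetical stand-in for two draft steps followed by a committed block.
fake_steps = [
    ("draft", "<!DOCTYPE html><h1>Hxllo", 1, 1),
    ("draft", "<!DOCTYPE html><h1>Hello", 1, 2),
    ("commit", "<!DOCTYPE html><h1>Hello</h1>", 1, 0),
]

frames = [json.loads(f) for f in stream_frames(fake_steps)]
```

A client would re-render both panes on every `draft` or `commit` frame and treat `done` as the final page, which is the role the `run()` loop in `index.html` plays.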

app.py ADDED

	@@ -0,0 +1,365 @@

+"""
+Gemma Diffusion — live website builder (gradio.Server backend + custom frontend).
+ZeroGPU port. `gradio.Server` (a FastAPI subclass) gives us Gradio's queue + SSE
+streaming while we serve our own hand-written HTML/CSS/JS frontend. The single
+streaming endpoint `/generate` runs the block-diffusion model and yields JSON frames
+(one per denoising step) that the frontend renders side-by-side: the raw HTML canvas
+diffusing on the left, the live rendered page on the right.
+ZeroGPU specifics:
+- `import spaces` happens before `torch`.
+- The model is loaded once at module scope with `.to("cuda")` (ZeroGPU registers it).
+- The actual `model.generate` call lives inside the `@spaces.GPU` function `_gpu_stream`;
+  the `gradio.Server` endpoint only marshals picklable CPU tensors in/out of it.
+Refs:
+- https://huggingface.co/blog/introducing-gradio-server
+- https://huggingface.co/docs/hub/spaces-zerogpu
+"""
+import glob
+import os
+import subprocess
+import sys
+# Set before torch is imported (transformers pulls torch in).
+os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
+import spaces  # must precede torch so ZeroGPU can patch it
+def _ensure_transformers():
+    """Install the bundled custom DiffusionGemma `transformers` wheel at runtime.
+    Spaces installs `requirements.txt` *before* copying the repo files into the image,
+    so the wheel can't be referenced by local path there. By the time this app runs the
+    file is present in the working directory, so we install it here (only if a stock /
+    no transformers is importable) before importing torch/transformers below.
+    """
+    try:
+        import transformers  # noqa: F401
+        if hasattr(transformers, "DiffusionGemmaForBlockDiffusion") or hasattr(
+            getattr(transformers, "models", object), "diffusion_gemma"
+        ):
+            return
+    except Exception:
+        pass
+    wheels = sorted(glob.glob(os.path.join(os.path.dirname(os.path.abspath(__file__)), "transformers-*.whl")))
+    if not wheels:
+        return
+    print(f"[gdiff] Installing bundled transformers wheel: {os.path.basename(wheels[0])}", flush=True)
+    subprocess.check_call([sys.executable, "-m", "pip", "install", "--no-cache-dir", wheels[0]])
+    import importlib
+    importlib.invalidate_caches()
+_ensure_transformers()
+import json
+import queue as queue_lib
+import re
+import threading
+import time as _time
+import torch
+from fastapi.responses import HTMLResponse
+from gradio import Server
+from transformers import AutoTokenizer, DiffusionGemmaForBlockDiffusion
+from transformers.generation.streamers import BaseStreamer
+HERE = os.path.dirname(os.path.abspath(__file__))
+MODEL_PATH = os.environ.get("GDIFF_MODEL_PATH", "google/diffusiongemma-26B-A4B-it")
+HF_TOKEN = os.environ.get("HF_TOKEN")
+MAX_ITERS_CAP = 120  # hard cap on denoising steps per block
+# ZeroGPU: the 26B checkpoint (~49 GB bf16) needs the full backing card.
+GPU_SIZE = os.environ.get("GDIFF_GPU_SIZE", "xlarge")
+SYSTEM_PROMPT = (
+    "You are an expert front-end web developer with great visual taste. When asked to "
+    "build or change a web page, respond with a SINGLE, complete, self-contained HTML5 "
+    "document. Put all CSS in a <style> tag and any JavaScript in a <script> tag inside "
+    "the document. Do not load external assets. When asked to modify an existing page, "
+    "return the FULL updated HTML document with the change applied. Do not include "
+    "explanations or markdown code fences — output only raw HTML, starting with "
+    "<!DOCTYPE html>."
+)
+_MARKER_RE = re.compile(
+    r"<\|?(?:channel|turn|think|image|audio|video|tool(?:_call|_response)?)\|?>"
+)
+_FENCE_RE = re.compile(r"```(?:html)?\s*(.*?)\s*```", re.DOTALL)
+# --------------------------------------------------------------------------- #
+# Model (loaded once at module scope; ZeroGPU registers .to("cuda") tensors)
+# --------------------------------------------------------------------------- #
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+print(f"[gdiff] Loading model from {MODEL_PATH} on {DEVICE} ...", flush=True)
+tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, token=HF_TOKEN)
+model = DiffusionGemmaForBlockDiffusion.from_pretrained(
+    MODEL_PATH,
+    dtype=torch.bfloat16,
+    low_cpu_mem_usage=True,
+    token=HF_TOKEN,
+).to(DEVICE)
+model.eval()
+CANVAS_LEN = model.config.canvas_length
+PAD_ID = tokenizer.pad_token_id or 0
+print(f"[gdiff] Model ready | canvas_length={CANVAS_LEN}", flush=True)
+# Cache of the last *cleaned* page so a follow-up tweak can warm-start in place.
+model._last_clean_html = None
+# --------------------------------------------------------------------------- #
+# Helpers (CPU-only; safe to run in the gradio.Server main process)
+# --------------------------------------------------------------------------- #
+def warm_canvas_from_cache():
+    """Starting canvas (first block) built from the previous *cleaned* page.
+    Returns a CPU tensor (it is pickled across the ZeroGPU process boundary and moved
+    to CUDA inside the GPU worker). We re-tokenize the cleaned HTML rather than reuse
+    raw output tokens so a mangled header can't compound across tweaks.
+    """
+    html = getattr(model, "_last_clean_html", None)
+    if not html:
+        return None
+    ids = tokenizer(html, add_special_tokens=False).input_ids[:CANVAS_LEN]
+    if not ids:
+        return None
+    if len(ids) < CANVAS_LEN:
+        ids = ids + [PAD_ID] * (CANVAS_LEN - len(ids))
+    return torch.tensor(ids, dtype=torch.long).unsqueeze(0)
+def last_assistant_html(history_json: str):
+    try:
+        history = json.loads(history_json) if history_json else []
+    except json.JSONDecodeError:
+        return None
+    for turn in reversed(history):
+        if turn.get("role") == "assistant" and turn.get("content"):
+            return turn["content"]
+    return None
+def clean_text(text: str) -> str:
+    return _MARKER_RE.sub("", text).lstrip()
+def extract_html(text: str) -> str:
+    """Pull a usable HTML document out of the (possibly mangled) model output.
+    Anchor on the first intact structural tag and rebuild whatever the diffused tweak ate
+    off the front, so the result is always a valid document (never quirks mode and never a
+    broken ``DOCTYPE>`` / ``html lang=`` header).
+    """
+    text = clean_text(text)
+    fenced = _FENCE_RE.search(text)
+    if fenced:
+        text = fenced.group(1)
+    lower = text.lower()
+    dt = lower.find("<!doctype")
+    if dt != -1:
+        return text[dt:].strip()
+    h = lower.find("<html")
+    if h != -1:
+        return "<!DOCTYPE html>\n" + text[h:].strip()
+    hd = lower.find("<head")
+    if hd != -1:
+        return '<!DOCTYPE html>\n<html lang="en">\n' + text[hd:].strip()
+    bd = lower.find("<body")
+    if bd != -1:
+        return (
+            '<!DOCTYPE html>\n<html lang="en">\n<head><meta charset="UTF-8">'
+            '<meta name="viewport" content="width=device-width, initial-scale=1.0"></head>\n'
+            + text[bd:].strip()
+        )
+    return text.strip()
+class QueueDiffusionStreamer(BaseStreamer):
+    def __init__(self, tok, q: "queue_lib.Queue"):
+        self.tok = tok
+        self.q = q
+        self.confirmed_ids: list[int] = []
+        self.prompt_skipped = False
+        self.block = 0
+        self.step = 0
+    def _decode(self, ids):
+        return self.tok.decode(ids, skip_special_tokens=True)
+    def put(self, value):
+        ids = value[0].tolist() if value.dim() > 1 else value.tolist()
+        if not self.prompt_skipped:
+            self.prompt_skipped = True
+            return
+        self.confirmed_ids.extend(ids)
+        self.block += 1
+        self.step = 0
+        self.q.put(("commit", self._decode(self.confirmed_ids), self.block, self.step))
+    def put_draft(self, value):
+        self.step += 1
+        ids = value[0].tolist() if value.dim() > 1 else value.tolist()
+        self.q.put(("draft", self._decode(self.confirmed_ids + ids), self.block + 1, self.step))
+    def end(self):
+        self.q.put(("end", self._decode(self.confirmed_ids), self.block, self.step))
+def build_messages(history_json: str, prompt: str):
+    try:
+        history = json.loads(history_json) if history_json else []
+    except json.JSONDecodeError:
+        history = []
+    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+    for turn in history:
+        role = turn.get("role")
+        content = turn.get("content", "")
+        if role in ("user", "assistant") and content:
+            messages.append({"role": role, "content": content})
+    messages.append({"role": "user", "content": prompt})
+    return messages
+# --------------------------------------------------------------------------- #
+# GPU work — runs in a forked ZeroGPU worker process.
+# Inputs/outputs cross the boundary via pickle, so only CPU tensors / plain
+# Python objects go in and out (no CUDA tensors are returned).
+# --------------------------------------------------------------------------- #
+def _estimate_duration(input_ids, max_new_tokens=2048, max_iters=64, full_denoise=False, canvas_ids=None):
+    blocks = max(1, int(max_new_tokens) // max(1, CANVAS_LEN))
+    secs = 30 + blocks * int(max_iters) * 0.3
+    return int(min(120, secs))  # xlarge internally doubles this for the quota check
+@spaces.GPU(duration=_estimate_duration, size=GPU_SIZE)
+def _gpu_stream(input_ids, max_new_tokens, max_iters, full_denoise, canvas_ids):
+    input_ids = input_ids.to(model.device)
+    gen_kwargs = dict(max_new_tokens=int(max_new_tokens), max_denoising_steps=int(max_iters))
+    if full_denoise:
+        gen_kwargs["confidence_threshold"] = 1e-9
+        gen_kwargs["stability_threshold"] = int(max_iters)
+    if canvas_ids is not None:
+        gen_kwargs["canvas_ids"] = canvas_ids.to(model.device)
+    q: "queue_lib.Queue" = queue_lib.Queue()
+    streamer = QueueDiffusionStreamer(tokenizer, q)
+    err = {}
+    def worker():
+        try:
+            with torch.inference_mode():
+                model.generate(input_ids, streamer=streamer, **gen_kwargs)
+        except Exception as exc:  # surface to the endpoint
+            err["msg"] = f"{type(exc).__name__}: {exc}"
+            q.put(("error", str(exc), 0, 0))
+        finally:
+            q.put(("end", "", 0, 0))  # always unblock the consumer
+    thread = threading.Thread(target=worker)
+    thread.start()
+    try:
+        while True:
+            kind, text, block, step = q.get()
+            if kind == "error":
+                yield ("error", err.get("msg", text), 0, 0)
+                return
+            if kind == "end":
+                return
+            yield (kind, text, block, step)
+    finally:
+        thread.join()
+# --------------------------------------------------------------------------- #
+# Server
+# --------------------------------------------------------------------------- #
+app = Server(title="Gemma Diffusion Website Builder")
+@app.api(name="generate", concurrency_limit=1, time_limit=600, stream_every=0.05)
+def generate(
+    prompt: str,
+    history_json: str = "[]",
+    max_new_tokens: int = 2048,
+    max_iters: int = 64,
+    full_denoise: bool = False,
+    anim_delay: float = 0.0,
+    warm_start: bool = True,
+) -> str:
+    """Stream the diffusion generation as JSON frames (one per denoising step).
+    The model writes a self-contained HTML document; the frontend renders it live.
+    """
+    prompt = (prompt or "").strip()
+    if not prompt:
+        yield json.dumps({"kind": "error", "message": "Empty prompt."})
+        return
+    messages = build_messages(history_json, prompt)
+    max_iters = max(1, min(int(max_iters), MAX_ITERS_CAP))
+    # Tweak warm-start: seed the diffusion's first canvas with the previous page's own
+    # tokens (native `canvas_ids` API) so the model edits the existing page in place.
+    is_tweak = bool(last_assistant_html(history_json))
+    canvas_ids = warm_canvas_from_cache() if (warm_start and is_tweak) else None
+    warming = canvas_ids is not None
+    input_ids = tokenizer.apply_chat_template(
+        messages,
+        tokenize=True,
+        add_generation_prompt=True,
+        return_tensors="pt",
+        return_dict=True,
+    )["input_ids"]
+    last_text = ""
+    for kind, text, block, step in _gpu_stream(
+        input_ids, int(max_new_tokens), max_iters, bool(full_denoise), canvas_ids
+    ):
+        if kind == "error":
+            yield json.dumps({"kind": "error", "message": text})
+            return
+        last_text = text
+        yield json.dumps(
+            {
+                "kind": "draft" if kind == "draft" else "commit",
+                "source": clean_text(text),
+                "block": block,
+                "step": step,
+                "canvas": CANVAS_LEN,
+                "max_iters": max_iters,
+                "warming": warming,
+            }
+        )
+        if anim_delay and kind == "draft":
+            _time.sleep(float(anim_delay))
+    final_source = extract_html(last_text)
+    # Cache the *cleaned* output so the next tweak warm-starts from a valid header.
+    if final_source.strip():
+        model._last_clean_html = final_source
+    yield json.dumps({"kind": "done", "source": final_source})
+@app.get("/", response_class=HTMLResponse)
+async def homepage():
+    with open(os.path.join(HERE, "index.html"), "r", encoding="utf-8") as f:
+        return f.read()
+# HF Spaces' gradio runtime looks for a top-level `demo` (or `app`) to launch.
+demo = app
+if __name__ == "__main__":
+    app.launch(
+        server_name=os.environ.get("GDIFF_HOST", "0.0.0.0"),
+        server_port=int(os.environ.get("GDIFF_PORT", "7860")),
+        show_error=True,
+    )
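
Review note on `_gpu_stream`: the function turns a blocking, callback-driven `model.generate` call into a generator by running it on a worker thread and draining a queue. Below is a stripped-down sketch of that pattern, not the Space's code. `bridge`, `blocking_producer`, and `failing_producer` are hypothetical names; the producers stand in for `model.generate` and its streamer callbacks.

```python
import queue
import threading
from typing import Callable, Iterator, Tuple

Event = Tuple[str, str]  # (kind, payload)


def bridge(producer: Callable[[Callable[[str], None]], None]) -> Iterator[Event]:
    """Run `producer` on a thread and yield what it emits, in order."""
    q: "queue.Queue[Event]" = queue.Queue()

    def worker() -> None:
        try:
            producer(lambda text: q.put(("draft", text)))
        except Exception as exc:
            q.put(("error", f"{type(exc).__name__}: {exc}"))
        finally:
            q.put(("end", ""))  # sentinel: always unblock the consumer

    thread = threading.Thread(target=worker)
    thread.start()
    try:
        while True:
            kind, payload = q.get()
            if kind == "end":
                return
            yield (kind, payload)
            if kind == "error":
                return
    finally:
        thread.join()  # the worker has already queued its sentinel by now


def blocking_producer(emit: Callable[[str], None]) -> None:
    for text in ("a", "ab", "abc"):
        emit(text)


def failing_producer(emit: Callable[[str], None]) -> None:
    emit("a")
    raise ValueError("boom")


ok_events = list(bridge(blocking_producer))
bad_events = list(bridge(failing_producer))
```

Putting the sentinel in a `finally` clause is what keeps the consumer from hanging when the producer raises, which appears to be the reason the commit's `worker()` does the same.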

index.html ADDED

	@@ -0,0 +1,343 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0" />
+<title>DiffusionGemma · Live Website Builder</title>
+<style>
+  :root {
+    --bg: #0b0e14;
+    --panel: #121722;
+    --panel-2: #0f141d;
+    --border: #222a38;
+    --text: #e6edf3;
+    --muted: #8b97a7;
+    --accent: #7c5cff;
+    --accent-2: #18c29c;
+    --amber: #f5c451;
+    --green: #2ea043;
+  }
+  * { box-sizing: border-box; }
+  html, body { margin: 0; height: 100%; }
+  body {
+    background: radial-gradient(1200px 600px at 70% -10%, #1a2030 0%, var(--bg) 55%);
+    color: var(--text);
+    font-family: ui-sans-serif, system-ui, -apple-system, "Segoe UI", Roboto, sans-serif;
+    display: flex; flex-direction: column; height: 100vh; overflow: hidden;
+  }
+  header {
+    padding: 14px 22px; border-bottom: 1px solid var(--border);
+    display: flex; align-items: center; gap: 14px; flex: 0 0 auto;
+  }
+  header .logo { font-size: 22px; }
+  header h1 { font-size: 16px; margin: 0; font-weight: 650; letter-spacing: .2px; }
+  header p { margin: 0; color: var(--muted); font-size: 12.5px; }
+  .pill {
+    margin-left: auto; font-size: 12px; color: var(--muted);
+    border: 1px solid var(--border); border-radius: 999px; padding: 5px 12px;
+    display: flex; align-items: center; gap: 8px; white-space: nowrap;
+  }
+  .dot { width: 8px; height: 8px; border-radius: 50%; background: var(--muted); }
+  .dot.live { background: var(--amber); box-shadow: 0 0 8px var(--amber); }
+  .dot.done { background: var(--green); box-shadow: 0 0 8px var(--green); }
+  .dot.err { background: #f85149; box-shadow: 0 0 8px #f85149; }
+  main { flex: 1 1 auto; display: grid; grid-template-columns: 1fr 1fr; gap: 14px; padding: 14px 18px; min-height: 0; }
+  .panel { background: var(--panel); border: 1px solid var(--border); border-radius: 12px; display: flex; flex-direction: column; min-height: 0; overflow: hidden; }
+  .panel .cap { padding: 9px 14px; border-bottom: 1px solid var(--border); font-size: 12.5px; font-weight: 600; color: var(--muted); display: flex; align-items: center; gap: 8px; }
+  .panel .cap .sub { font-weight: 400; color: #5d6b7a; }
+  /* Code / diffusion view */
+  #code { flex: 1 1 auto; overflow: auto; margin: 0; padding: 12px 0; font-family: "SF Mono", ui-monospace, "JetBrains Mono", Menlo, Consolas, monospace; font-size: 12px; line-height: 1.55; background: var(--panel-2); }
+  #code .ln { padding: 0 14px; white-space: pre-wrap; word-break: break-word; min-height: 1.55em; border-left: 3px solid transparent; }
+  #code .ln.live { animation: flash .6s ease-out; background: rgba(245,196,81,.05); }
+  #code .ln.diff { background: rgba(46,160,67,.13); border-left-color: var(--green); }
+  @keyframes flash {
+    0% { background: rgba(245,196,81,.42); }
+    100% { background: rgba(245,196,81,.05); }
+  }
+  #code::-webkit-scrollbar, .scroll::-webkit-scrollbar { width: 9px; height: 9px; }
+  #code::-webkit-scrollbar-thumb, .scroll::-webkit-scrollbar-thumb { background: #2a3344; border-radius: 8px; }
+  /* Website preview */
+  #preview { flex: 1 1 auto; border: 0; width: 100%; background: #fff; }
+  /* Bottom dock */
+  footer { flex: 0 0 auto; border-top: 1px solid var(--border); padding: 12px 18px; background: var(--panel-2); }
+  .row { display: flex; gap: 12px; align-items: flex-start; }
+  textarea#prompt {
+    flex: 1 1 auto; resize: none; height: 72px; background: var(--panel); color: var(--text);
+    border: 1px solid var(--border); border-radius: 10px; padding: 11px 13px; font-size: 14px; font-family: inherit;
+  }
+  textarea#prompt:focus { outline: none; border-color: var(--accent); }
+  .btns { display: flex; flex-direction: column; gap: 8px; width: 150px; }
+  button { font-family: inherit; font-size: 13.5px; font-weight: 600; border-radius: 10px; padding: 9px 12px; cursor: pointer; border: 1px solid var(--border); }
+  button.primary { background: linear-gradient(180deg, #8a6bff, #6b48f0); color: #fff; border: 0; }
+  button.primary:disabled { opacity: .5; cursor: not-allowed; }
+  button.ghost { background: transparent; color: var(--muted); }
+  button.ghost:hover:not(:disabled) { color: var(--text); border-color: #36405230; }
+  button.ghost:disabled { opacity: .4; cursor: not-allowed; }
+  .meta { display: flex; gap: 16px; align-items: center; flex-wrap: wrap; margin-top: 11px; color: var(--muted); font-size: 12px; }
+  .meta label { display: flex; align-items: center; gap: 7px; white-space: nowrap; }
+  .meta input[type="range"] { accent-color: var(--accent); width: 120px; }
+  .meta input[type="checkbox"] { accent-color: var(--accent); width: 15px; height: 15px; }
+  .meta .val { color: var(--text); font-variant-numeric: tabular-nums; min-width: 30px; }
+  .chips { display: flex; gap: 8px; flex-wrap: wrap; margin-top: 11px; }
+  .chip { font-size: 12px; color: var(--muted); border: 1px solid var(--border); border-radius: 999px; padding: 5px 11px; cursor: pointer; background: transparent; }
+  .chip:hover { color: var(--text); border-color: var(--accent); }
+  .history { margin-top: 10px; display: flex; gap: 8px; flex-wrap: wrap; max-height: 46px; overflow: auto; }
+  .turn { font-size: 11.5px; color: var(--muted); border: 1px solid var(--border); border-radius: 8px; padding: 3px 9px; }
+  .turn b { color: var(--accent-2); }
+</style>
+</head>
+<body>
+  <header>
+    <span class="logo">🌫️→🌐</span>
+    <div>
+      <h1>DiffusionGemma · Live Website Builder</h1>
+      <p>Describe a website and a block-diffusion LLM writes the HTML by denoising — every token updates at once. Watch the raw canvas take shape (left) while it renders into a live page (right), then send follow-up prompts to tweak it.</p>
+    </div>
+    <span class="pill"><span class="dot" id="statusDot"></span><span id="statusText">idle</span></span>
+  </header>
+  <main>
+    <section class="panel">
+      <div class="cap">🧠 Model's view — diffusion canvas <span class="sub" id="capInfo"></span></div>
+      <div id="code" class="scroll"></div>
+    </section>
+    <section class="panel">
+      <div class="cap">🌐 Live website</div>
+      <iframe id="preview" sandbox="allow-scripts"></iframe>
+    </section>
+  </main>
+  <footer>
+    <div class="row">
+      <textarea id="prompt" placeholder="Describe a website…  e.g. 'a landing page for a coffee shop with a hero, menu, and footer'   then tweak: 'make the header dark', 'add a contact form'"></textarea>
+      <div class="btns">
+        <button class="primary" id="buildBtn">Build / Tweak</button>
+        <button class="ghost" id="resetBtn">Reset</button>
+      </div>
+    </div>
+    <div class="chips" id="chips"></div>
+    <div class="meta">
+      <label>tokens <input type="range" id="maxTokens" min="2048" max="4096" step="256" value="2048"><span class="val" id="maxTokensV">2048</span></label>
+      <label>iterations/block <input type="range" id="maxIters" min="8" max="120" step="8" value="64"><span class="val" id="maxItersV">64</span></label>
+      <label>anim delay <input type="range" id="delay" min="0" max="0.3" step="0.02" value="0"><span class="val" id="delayV">0.0s</span></label>
+      <label><input type="checkbox" id="fullDenoise"> run all denoising steps (no early stop)</label>
+      <label><input type="checkbox" id="warmStart" checked> tweak in place (diffuse from current page, not noise)</label>
+      <span class="history" id="history"></span>
+    </div>
+  </footer>
+<script type="module">
+import { Client } from "https://cdn.jsdelivr.net/npm/@gradio/client/dist/index.min.js";
+const $ = (id) => document.getElementById(id);
+const codeEl = $("code"), preview = $("preview");
+const statusDot = $("statusDot"), statusText = $("statusText"), capInfo = $("capInfo");
+const buildBtn = $("buildBtn"), resetBtn = $("resetBtn"), promptEl = $("prompt");
+let client = null;
+let busy = false;
+let messages = [];          // [{role, content}] confirmed conversation
+let prevFrameLines = [];    // lines shown on the previous streaming frame (live churn diff)
+let lastFinalLines = [];    // lines of the previous round's final HTML (tweak diff)
+let previewScrollY = 0;     // remembered scroll position of the rendered page (see renderPreview)
+const EXAMPLES = [
+  "A bold landing page for a startup called 'Nimbus' that sells AI weather forecasting, with a hero section, three feature cards, and a call-to-action button.",
+  "A cozy personal blog homepage with a warm color palette, a header, an about section, and a list of three recent posts.",
+  "A neon synthwave portfolio page for a music producer with an animated gradient background.",
+  "A clean pricing page with three tiers (Free, Pro, Team), a feature comparison, and a FAQ.",
+  "A restaurant landing page with a hero image area, a menu grid, opening hours, and a reservation button.",
+  "A sleek product page for wireless headphones with specs, an image placeholder, and an add-to-cart bar.",
+  "A dark-mode dashboard mockup with a sidebar, four stat cards, and a placeholder chart.",
+  "A playful 'coming soon' page with a countdown vibe, an email signup box, and floating shapes.",
+];
+// The rendered page reports its scroll position back to us via postMessage so we can
+// restore it after each re-render (the iframe reloads every frame as the page diffuses,
+// which would otherwise snap the user back to the top mid-stream).
+window.addEventListener("message", (e) => {
+  if (e.data && typeof e.data.__gdiffScrollY === "number") previewScrollY = e.data.__gdiffScrollY;
+});
+// ---------- helpers ----------
+function esc(s) { return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;"); }
+function setStatus(kind, text) {
+  statusDot.className = "dot" + (kind ? " " + kind : "");
+  statusText.textContent = text;
+}
+// Longest-common-subsequence over lines -> set of indices in `b` that are unchanged.
+function unchangedSet(a, b) {
+  const n = a.length, m = b.length;
+  const dp = Array.from({ length: n + 1 }, () => new Int32Array(m + 1));
+  for (let i = n - 1; i >= 0; i--)
+    for (let j = m - 1; j >= 0; j--)
+      dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
+  const keep = new Set();
+  let i = 0, j = 0;
+  while (i < n && j < m) {
+    if (a[i] === b[j]) { keep.add(j); i++; j++; }
+    else if (dp[i + 1][j] >= dp[i][j + 1]) i++; else j++;
+  }
+  return keep;
+}
+// Render the source with per-line highlight classes.
+function renderCode(source, { liveAgainst = null, diffAgainst = null } = {}) {
+  const lines = source.split("\n");
+  const liveKeep = liveAgainst ? unchangedSet(liveAgainst, lines) : null;
+  const diffKeep = diffAgainst ? unchangedSet(diffAgainst, lines) : null;
+  const html = lines.map((ln, idx) => {
+    let cls = "ln";
+    if (diffKeep && !diffKeep.has(idx)) cls += " diff";       // persistent tweak diff
+    else if (liveKeep && !liveKeep.has(idx)) cls += " live";  // transient churn flash
+    return `<div class="${cls}">${esc(ln) || "&nbsp;"}</div>`;
+  }).join("");
+  codeEl.innerHTML = html;
+  return lines;
+}
+// Repair a (possibly front-mangled) document so the preview never breaks. Warm-start
+// diffusion often eats the leading chars of the header ("<!DOCTYPE html>" -> "DOCTYPE>",
+// "<html" -> "<head"); rebuild whatever was eaten by anchoring on the first intact
+// structural tag. Mirrors the server's extract_html.
+function normalizeHtml(src) {
+  const lower = src.toLowerCase();
+  const dt = lower.indexOf("<!doctype");
+  if (dt !== -1) return src.slice(dt);
+  const h = lower.indexOf("<html");
+  if (h !== -1) return "<!DOCTYPE html>\n" + src.slice(h);
+  const hd = lower.indexOf("<head");
+  if (hd !== -1) return '<!DOCTYPE html>\n<html lang="en">\n' + src.slice(hd);
+  const bd = lower.indexOf("<body");
+  if (bd !== -1) return '<!DOCTYPE html>\n<html lang="en">\n<head><meta charset="UTF-8"></head>\n' + src.slice(bd);
+  return src;
+}
+// Render the model's HTML into the live preview, repairing the header and preserving the
+// user's scroll position across re-renders (each frame reloads the iframe). A tiny script
+// is injected before </body>: it restores the remembered scroll and reports new positions.
+function renderPreview(source) {
+  if (!source || !source.trim()) { preview.srcdoc = ""; previewScrollY = 0; return; }
+  source = normalizeHtml(source);
+  const y = Math.round(previewScrollY);
+  const inject =
+    "<script>(function(){var y=" + y + ";" +
+    "function r(){window.scrollTo(0,y);}" +
+    "r();requestAnimationFrame(r);window.addEventListener('load',r);" +
+    "window.addEventListener('scroll',function(){" +
+    "parent.postMessage({__gdiffScrollY:window.scrollY||document.documentElement.scrollTop||0},'*');" +
+    "},{passive:true});})();<\/script>";
+  const idx = source.toLowerCase().lastIndexOf("</body>");
+  preview.srcdoc = idx !== -1 ? source.slice(0, idx) + inject + source.slice(idx) : source + inject;
+}
+// ---------- generation ----------
+async function ensureClient() {
+  if (!client) { setStatus("", "connecting…"); client = await Client.connect(window.location.origin); }
+  return client;
+}
+async function run() {
+  if (busy) return;
+  const prompt = promptEl.value.trim();
+  if (!prompt) { promptEl.focus(); return; }
+  busy = true; buildBtn.disabled = true;
+  const isTweak = messages.length > 0;
+  setStatus("live", isTweak ? "tweaking…" : "diffusing…");
+  prevFrameLines = [];
+  if (!isTweak) previewScrollY = 0;  // fresh build starts at the top
+  try {
+    const c = await ensureClient();
+    const payload = {
+      prompt,
+      history_json: JSON.stringify(messages),
+      max_new_tokens: parseInt($("maxTokens").value, 10),
+      max_iters: parseInt($("maxIters").value, 10),
+      full_denoise: $("fullDenoise").checked,
+      anim_delay: parseFloat($("delay").value),
+      warm_start: $("warmStart").checked,
+    };
+    let finalSource = "";
+    const sub = c.submit("/generate", payload);
+    for await (const ev of sub) {
+      if (ev.type === "data") {
+        const frame = JSON.parse(ev.data[0]);
+        if (frame.kind === "error") { setStatus("err", "error"); renderCode("/* " + frame.message + " */"); break; }
+        if (frame.kind === "done") {
+          finalSource = frame.source;
+          // Persistent green highlight of what changed vs the previous round.
+          prevFrameLines = renderCode(finalSource, { diffAgainst: isTweak ? lastFinalLines : null });
+          renderPreview(finalSource);
+          setStatus("done", "done");
+          continue;
+        }
+        // draft / commit frame: both the code panel and the live page churn every frame.
+        const src = frame.source || "";
+        prevFrameLines = renderCode(src, { liveAgainst: prevFrameLines });
+        renderPreview(src);
+        capInfo.textContent = `block ${frame.block} · step ${frame.step}/${frame.max_iters} · ${frame.canvas} tokens update simultaneously`;
+        setStatus("live", `${frame.kind === "draft" ? "diffusing" : "committed"} · block ${frame.block} · step ${frame.step}`);
+      } else if (ev.type === "status" && ev.stage === "error") {
+        setStatus("err", "error"); break;
+      }
+    }
+    if (finalSource) {
+      messages.push({ role: "user", content: prompt });
+      messages.push({ role: "assistant", content: finalSource });
+      lastFinalLines = finalSource.split("\n");
+      promptEl.value = "";
+      renderHistory();
+    }
+  } catch (e) {
+    setStatus("err", "error");
+    renderCode("/* connection error: " + (e && e.message ? e.message : e) + " */");
+  } finally {
+    busy = false; buildBtn.disabled = false;
+  }
+}
+function renderHistory() {
+  const turns = messages.filter((m) => m.role === "user");
+  $("history").innerHTML = turns.map((m, i) =>
+    `<span class="turn"><b>${i === 0 ? "build" : "tweak " + i}</b> · ${esc(m.content.slice(0, 40))}${m.content.length > 40 ? "…" : ""}</span>`
+  ).join("");
+}
+function reset() {
+  messages = []; prevFrameLines = []; lastFinalLines = []; previewScrollY = 0;
+  codeEl.innerHTML = ""; renderPreview(""); $("history").innerHTML = "";
+  capInfo.textContent = ""; setStatus("", "idle"); promptEl.value = "";
+}
+// ---------- wiring ----------
+buildBtn.addEventListener("click", run);
+resetBtn.addEventListener("click", reset);
+promptEl.addEventListener("keydown", (e) => {
+  if (e.key === "Enter" && (e.metaKey || e.ctrlKey)) { e.preventDefault(); run(); }
+});
+for (const [id, fmt] of [["maxTokens", (v) => v], ["maxIters", (v) => v], ["delay", (v) => (+v).toFixed(2) + "s"]]) {
+  const el = $(id), out = $(id + "V");
+  el.addEventListener("input", () => (out.textContent = fmt(el.value)));
+}
+$("chips").innerHTML = EXAMPLES.map((e, i) => `<button class="chip" data-i="${i}">${esc(e.slice(0, 42))}…</button>`).join("");
+$("chips").addEventListener("click", (e) => {
+  const i = e.target.getAttribute("data-i");
+  if (i !== null) { promptEl.value = EXAMPLES[+i]; promptEl.focus(); }
+});
+renderPreview("");
+setStatus("", "idle");
+</script>
+</body>
+</html>
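
Review note on header repair: `normalizeHtml` here and `extract_html` on the server both anchor on the first intact structural tag and rebuild whatever precedes it. The Python sketch below follows the client-side variant (the server version also adds a viewport meta tag and strips whitespace). `repair_header` is a hypothetical name, and the mangled sample is made up to resemble the breakage the code comments describe.

```python
def repair_header(src: str) -> str:
    """Anchor on the first intact structural tag and rebuild what precedes it."""
    lower = src.lower()
    doctype = "<!DOCTYPE html>\n"
    html_open = '<html lang="en">\n'
    head = '<head><meta charset="UTF-8"></head>\n'

    at = lower.find("<!doctype")
    if at != -1:
        return src[at:]
    at = lower.find("<html")
    if at != -1:
        return doctype + src[at:]
    at = lower.find("<head")
    if at != -1:
        return doctype + html_open + src[at:]
    at = lower.find("<body")
    if at != -1:
        return doctype + html_open + head + src[at:]
    return src  # nothing recognisable: leave the text untouched


# A made-up draft whose leading "<!" and "<" characters were eaten.
mangled = 'DOCTYPE>\nhtml lang="en">\n<head><title>t</title></head><body>hi</body></html>'
repaired = repair_header(mangled)
```

The checks run from the outermost tag inward, so an intact `<!DOCTYPE` always wins and the synthetic header is only as large as it needs to be.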

requirements.txt ADDED

	@@ -0,0 +1,6 @@

+# NOTE: the custom DiffusionGemma `transformers` wheel is bundled in this repo and
+# installed at runtime from app.py — Spaces installs requirements.txt *before* the repo
+# files are copied in, so a local-path wheel reference here can't be found at build time.
+accelerate
+sentencepiece
+hf_xet
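
Review note on the runtime install: because the wheel cannot be listed here, `app.py` globs for it next to the script and passes it to `pip`. The sketch below isolates the discovery step and only builds the command, so it has no side effects. `pip_command_for_bundled_wheel` and the placeholder file name are hypothetical.

```python
import glob
import os
import sys
import tempfile
from typing import List, Optional


def pip_command_for_bundled_wheel(repo_dir: str) -> Optional[List[str]]:
    """Return the pip command for the first bundled transformers wheel, if any."""
    wheels = sorted(glob.glob(os.path.join(repo_dir, "transformers-*.whl")))
    if not wheels:
        return None
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", wheels[0]]


# Demonstrate against a throwaway directory holding an empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = pip_command_for_bundled_wheel(tmp)
    wheel_name = "transformers-0.0.0-py3-none-any.whl"  # hypothetical file name
    open(os.path.join(tmp, wheel_name), "w").close()
    command = pip_command_for_bundled_wheel(tmp)
```

In the Space the resulting list is handed to `subprocess.check_call`, followed by `importlib.invalidate_caches()` so the freshly installed package can be imported in the same process.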

transformers-5.8.0.dev0+2db78a1296-py3-none-any.whl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd08fd82069ff1d3a1b49fbb103564295f20429d2ffb68a11cc4f760bb993f9b
+size 11875344
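
Review note on the wheel entry: the three added lines are a Git LFS pointer, not the wheel itself. The sketch below reads such a pointer, assuming the simple `key value` line layout shown in this diff. `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:dd08fd82069ff1d3a1b49fbb103564295f20429d2ffb68a11cc4f760bb993f9b
size 11875344
"""

info = parse_lfs_pointer(pointer)
size_mib = info["size"] / (1024 * 1024)
```

The `size` field says the real wheel is a little over 11 MiB, which is why the commit also adds a matching LFS rule to `.gitattributes`.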