import os
import gc
import re
import ast
import json
import base64
import random
import uuid
import time
from io import BytesIO
from threading import Thread

import gradio as gr
import spaces
import torch
import numpy as np
from PIL import Image, ImageOps
import cv2

from transformers import (
    Qwen2_5_VLForConditionalGeneration,
    Qwen2VLForConditionalGeneration,
    AutoProcessor,
    TextIteratorStreamer,
)

MAX_MAX_NEW_TOKENS = 4096
DEFAULT_MAX_NEW_TOKENS = 1024
MAX_INPUT_TOKEN_LENGTH = int(os.getenv("MAX_INPUT_TOKEN_LENGTH", "4096"))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Each model is loaded once at import time and kept resident on `device`.
MODEL_ID_N = "prithivMLmods/DeepCaption-VLA-7B"
processor_n = AutoProcessor.from_pretrained(MODEL_ID_N, trust_remote_code=True)
model_n = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID_N,
    attn_implementation="kernels-community/flash-attn2",
    trust_remote_code=True,
    torch_dtype=torch.float16
).to(device).eval()

MODEL_ID_M = "Skywork/SkyCaptioner-V1"
processor_m = AutoProcessor.from_pretrained(MODEL_ID_M, trust_remote_code=True)
model_m = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID_M,
    attn_implementation="kernels-community/flash-attn2",
    trust_remote_code=True,
    torch_dtype=torch.float16
).to(device).eval()

MODEL_ID_Z = "remyxai/SpaceThinker-Qwen2.5VL-3B"
processor_z = AutoProcessor.from_pretrained(MODEL_ID_Z, trust_remote_code=True)
model_z = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID_Z,
    attn_implementation="kernels-community/flash-attn2",
    trust_remote_code=True,
    torch_dtype=torch.float16
).to(device).eval()

# coreOCR is a Qwen2-VL checkpoint, so it uses the Qwen2-VL model class.
MODEL_ID_K = "prithivMLmods/coreOCR-7B-050325-preview"
processor_k = AutoProcessor.from_pretrained(MODEL_ID_K, trust_remote_code=True)
model_k = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID_K,
    attn_implementation="kernels-community/flash-attn2",
    trust_remote_code=True,
    torch_dtype=torch.float16
).to(device).eval()

MODEL_ID_Y = "remyxai/SpaceOm"
processor_y = AutoProcessor.from_pretrained(MODEL_ID_Y, trust_remote_code=True)
model_y = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID_Y,
    attn_implementation="kernels-community/flash-attn2",
    trust_remote_code=True,
    torch_dtype=torch.float16
).to(device).eval()

# Display name -> (processor, model) pair used by the generate functions.
MODEL_MAP = {
    "DeepCaption-VLA-7B": (processor_n, model_n),
    "SkyCaptioner-V1": (processor_m, model_m),
    "SpaceThinker-3B": (processor_z, model_z),
    "coreOCR-7B-050325-preview": (processor_k, model_k),
    "SpaceOm-3B": (processor_y, model_y),
}
MODEL_CHOICES = list(MODEL_MAP.keys())

image_examples = [
    {"query": "type out the messy hand-writing as accurately as you can.", "image": "images/1.jpg", "model": "coreOCR-7B-050325-preview"},
    {"query": "count the number of birds and explain the scene in detail.", "image": "images/2.jpeg", "model": "DeepCaption-VLA-7B"},
    {"query": "how far is the Goal from the penalty taker in this image?", "image": "images/3.png", "model": "SpaceThinker-3B"},
    {"query": "approximately how many meters apart are the chair and bookshelf?", "image": "images/4.png", "model": "SkyCaptioner-V1"},
    {"query": "how far is the man in the red hat from the pallet of boxes in feet?", "image": "images/5.jpg", "model": "SpaceOm-3B"},
]

video_examples = [
    {"query": "give the highlights of the movie scene video.", "video": "videos/1.mp4", "model": "DeepCaption-VLA-7B"},
    {"query": "explain the advertisement in detail.", "video": "videos/2.mp4", "model": "SkyCaptioner-V1"},
]


def pil_to_data_url(img: Image.Image, fmt="PNG"):
    # Encode a PIL image as a base64 data URL (PNG or JPEG).
    buf = BytesIO()
    img.save(buf, format=fmt)
    data = base64.b64encode(buf.getvalue()).decode()
    mime = "image/png" if fmt.upper() == "PNG" else "image/jpeg"
    return f"data:{mime};base64,{data}"


def file_to_data_url(path):
    # Read a file from disk and return it as a data URL; "" if it is missing.
    if not os.path.exists(path):
        return ""
    ext = path.rsplit(".", 1)[-1].lower()
    mime = {
        "jpg": "image/jpeg",
        "jpeg": "image/jpeg",
        "png": "image/png",
        "webp": "image/webp",
        "mp4": "video/mp4",
        "mov": "video/quicktime",
        "webm": "video/webm",
        "mkv": "video/x-matroska",
    }.get(ext, "application/octet-stream")
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{data}"


def make_thumb_b64(path, max_dim=240):
    # Build a small JPEG thumbnail for an example image; "" on failure.
    try:
        img = Image.open(path).convert("RGB")
        img.thumbnail((max_dim, max_dim))
        return pil_to_data_url(img, "JPEG")
    except Exception as e:
        print("Thumbnail error:", e)
        return ""


def make_video_thumb_b64(path, max_dim=240):
    # Use the middle frame of the video as its thumbnail; "" on failure.
    try:
        cap = cv2.VideoCapture(path)
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        target = max(0, total_frames // 2)
        cap.set(cv2.CAP_PROP_POS_FRAMES, target)
        success, frame = cap.read()
        cap.release()
        if not success:
            return ""
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        img = Image.fromarray(frame).convert("RGB")
        img.thumbnail((max_dim, max_dim))
        return pil_to_data_url(img, "JPEG")
    except Exception as e:
        print("Video thumbnail error:", e)
        return ""


def build_example_cards_html():
    cards = ""
    for i, ex in enumerate(image_examples):
        thumb = make_thumb_b64(ex["image"])
        prompt_short = ex["query"][:72] + ("..." if len(ex["query"]) > 72 else "")
        cards += f"""
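        <!-- Markup reconstructed from the stylesheet's class names; the data-kind/data-idx attribute names are assumptions and must match the card click handler. -->
        <div class="example-card" data-kind="image" data-idx="{i}">
            <div class="example-thumb-wrap">
                {f'<img src="{thumb}" alt="">' if thumb else '<div class="example-thumb-placeholder">Preview</div>'}
            </div>
            <div class="example-meta-row">
                <span class="example-badge">{ex["model"]}</span>
                <span class="example-badge kind">IMAGE</span>
            </div>
            <div class="example-prompt-text">{prompt_short}</div>
        </div>
        """
    for i, ex in enumerate(video_examples):
        thumb = make_video_thumb_b64(ex["video"])
        prompt_short = ex["query"][:72] + ("..." if len(ex["query"]) > 72 else "")
        cards += f"""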
        <!-- Markup reconstructed from the stylesheet's class names; the data-kind/data-idx attribute names are assumptions and must match the card click handler. -->
        <div class="example-card" data-kind="video" data-idx="{i}">
            <div class="example-thumb-wrap">
                {f'<img src="{thumb}" alt="">' if thumb else '<div class="example-thumb-placeholder">Video</div>'}
            </div>
            <div class="example-meta-row">
                <span class="example-badge">{ex["model"]}</span>
                <span class="example-badge kind video">VIDEO</span>
            </div>
            <div class="example-prompt-text">{prompt_short}</div>
        </div>
        """
    return cards


EXAMPLE_CARDS_HTML = build_example_cards_html()


def load_example_data(kind, idx_str):
    # Return the selected example as a JSON string for the front end.
    try:
        idx = int(float(idx_str))
    except Exception:
        return json.dumps({"status": "error", "message": "Invalid example index"})

    if kind == "image":
        if idx < 0 or idx >= len(image_examples):
            return json.dumps({"status": "error", "message": "Example index out of range"})
        ex = image_examples[idx]
        img_b64 = file_to_data_url(ex["image"])
        if not img_b64:
            return json.dumps({"status": "error", "message": "Could not load example image"})
        return json.dumps({
            "status": "ok",
            "kind": "image",
            "query": ex["query"],
            "file": img_b64,
            "model": ex["model"],
            "name": os.path.basename(ex["image"]),
        })

    if kind == "video":
        if idx < 0 or idx >= len(video_examples):
            return json.dumps({"status": "error", "message": "Example index out of range"})
        ex = video_examples[idx]
        vid_b64 = file_to_data_url(ex["video"])
        if not vid_b64:
            return json.dumps({"status": "error", "message": "Could not load example video"})
        return json.dumps({
            "status": "ok",
            "kind": "video",
            "query": ex["query"],
            "file": vid_b64,
            "model": ex["model"],
            "name": os.path.basename(ex["video"]),
        })

    return json.dumps({"status": "error", "message": "Invalid example kind"})


def downsample_video(video_path):
    # Sample up to 10 evenly spaced frames; returns [(PIL image, timestamp in seconds)].
    vidcap = cv2.VideoCapture(video_path)
    total_frames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    frames = []
    if total_frames <= 0 or fps <= 0:
        vidcap.release()
        return frames
    frame_indices = np.linspace(0, total_frames - 1, min(total_frames, 10), dtype=int)
    for i in frame_indices:
        vidcap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        success, image = vidcap.read()
        if success:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            pil_image = Image.fromarray(image)
            timestamp = round(i / fps, 2)
            frames.append((pil_image, timestamp))
    vidcap.release()
    return frames


# The duration callbacks take the same arguments as the decorated functions
# and return the requested GPU time in seconds (60 if the value is invalid).
def calc_timeout_image(model_name, text, image, max_new_tokens, temperature, top_p, top_k, repetition_penalty, gpu_timeout):
    try:
        return int(gpu_timeout)
    except Exception:
        return 60


def calc_timeout_video(model_name, text, video_path, max_new_tokens, temperature, top_p, top_k, repetition_penalty, gpu_timeout):
    try:
        return int(gpu_timeout)
    except Exception:
        return 60


@spaces.GPU(duration=calc_timeout_image)
def generate_image(model_name, text, image, max_new_tokens=1024, temperature=0.6, top_p=0.9, top_k=50, repetition_penalty=1.2, gpu_timeout=60):
    if not model_name or model_name not in MODEL_MAP:
        raise gr.Error("Please select a valid model.")
    if image is None:
        raise gr.Error("Please upload an image.")
    if not text or not str(text).strip():
        raise gr.Error("Please enter your vision/query instruction.")
    # Cheap character-count guard applied before tokenization.
    if len(str(text)) > MAX_INPUT_TOKEN_LENGTH * 8:
        raise gr.Error("Query is too long. Please shorten your input.")

    processor, model = MODEL_MAP[model_name]

    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": text},
        ]
    }]
    prompt_full = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    inputs = processor(
        text=[prompt_full],
        images=[image],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=MAX_INPUT_TOKEN_LENGTH
    ).to(device)

    streamer = TextIteratorStreamer(processor, skip_prompt=True, skip_special_tokens=True)
    generation_kwargs = {
        **inputs,
        "streamer": streamer,
        "max_new_tokens": int(max_new_tokens),
        "temperature": float(temperature),
        "top_p": float(top_p),
        "top_k": int(top_k),
        "repetition_penalty": float(repetition_penalty),
        "do_sample": True,
    }
    # Generate in a background thread so the streamer can be consumed here.
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    buffer = ""
    for new_text in streamer:
        buffer += new_text.replace("<|im_end|>", "")
        time.sleep(0.01)
        yield buffer

    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


@spaces.GPU(duration=calc_timeout_video)
def generate_video(model_name, text, video_path, max_new_tokens=1024, temperature=0.6, top_p=0.9, top_k=50, repetition_penalty=1.2, gpu_timeout=90):
    if not model_name or model_name not in MODEL_MAP:
        raise gr.Error("Please select a valid model.")
    if not video_path:
        raise gr.Error("Please upload a video.")
    if not text or not str(text).strip():
        raise gr.Error("Please enter your vision/query instruction.")
    if len(str(text)) > MAX_INPUT_TOKEN_LENGTH * 8:
        raise gr.Error("Query is too long. Please shorten your input.")

    processor, model = MODEL_MAP[model_name]
    frames = downsample_video(video_path)
    if not frames:
        raise gr.Error("Failed to read video frames.")

    messages = [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
        {"role": "user", "content": [{"type": "text", "text": text}]}
    ]
    # Append every sampled frame to the user turn, labelled with its timestamp.
    for frame in frames:
        image, timestamp = frame
        messages[1]["content"].append({"type": "text", "text": f"Frame {timestamp}:"})
        messages[1]["content"].append({"type": "image", "image": image})

    inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_TOKEN_LENGTH
    ).to(device)

    streamer = TextIteratorStreamer(processor, skip_prompt=True, skip_special_tokens=True)
    generation_kwargs = {
        **inputs,
        "streamer": streamer,
        "max_new_tokens": int(max_new_tokens),
        "do_sample": True,
        "temperature": float(temperature),
        "top_p": float(top_p),
        "top_k": int(top_k),
        "repetition_penalty": float(repetition_penalty),
    }
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    buffer = ""
    for new_text in streamer:
        buffer += new_text.replace("<|im_end|>", "")
        time.sleep(0.01)
        yield buffer

    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def noop():
    return None


css = r"""
@import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700;800&family=JetBrains+Mono:wght@400;500;600&display=swap');
*{box-sizing:border-box;margin:0;padding:0}
html,body{height:100%;overflow-x:hidden}
body,.gradio-container{
  background:#0b1020!important;
  font-family:'Inter',system-ui,-apple-system,sans-serif!important;
font-size:14px!important;color:#e4e4e7!important;min-height:100vh;overflow-x:hidden; } .dark body,.dark .gradio-container{background:#0b1020!important;color:#e4e4e7!important} footer{display:none!important} .hidden-input{display:none!important;height:0!important;overflow:hidden!important;margin:0!important;padding:0!important} #gradio-run-btn,#example-load-btn{ position:absolute!important;left:-9999px!important;top:-9999px!important; width:1px!important;height:1px!important;opacity:0.01!important; pointer-events:none!important;overflow:hidden!important; } .app-shell{ background:#11182d;border:1px solid #1e2b52;border-radius:16px; margin:12px auto;max-width:1400px;overflow:hidden; box-shadow:0 25px 50px -12px rgba(0,0,0,.6),0 0 0 1px rgba(255,255,255,.03); } .app-header{ background:linear-gradient(135deg,#11182d,#152042);border-bottom:1px solid #1e2b52; padding:14px 24px;display:flex;align-items:center;justify-content:space-between;flex-wrap:wrap;gap:12px; } .app-header-left{display:flex;align-items:center;gap:12px} .app-logo{ width:38px;height:38px;background:linear-gradient(135deg,#0000FF,#2e5bff,#6d8dff); border-radius:10px;display:flex;align-items:center;justify-content:center; box-shadow:0 4px 12px rgba(0,0,255,.35); } .app-logo svg{width:22px;height:22px;fill:#fff;flex-shrink:0} .app-title{ font-size:18px;font-weight:700;background:linear-gradient(135deg,#f5f5f5,#b8c5ff); -webkit-background-clip:text;-webkit-text-fill-color:transparent;letter-spacing:-.3px; } .app-badge{ font-size:11px;font-weight:600;padding:3px 10px;border-radius:20px; background:rgba(0,0,255,.12);color:#8ea2ff;border:1px solid rgba(0,0,255,.25);letter-spacing:.3px; } .app-badge.fast{background:rgba(46,91,255,.10);color:#93a7ff;border:1px solid rgba(46,91,255,.22)} .mode-tabs-bar,.model-tabs-bar{ background:#11182d;border-bottom:1px solid #1e2b52;padding:10px 16px; display:flex;gap:8px;align-items:center;flex-wrap:wrap; } .model-tab,.mode-tab{ 
display:inline-flex;align-items:center;justify-content:center;gap:6px; min-width:32px;height:34px;background:transparent;border:1px solid #243669; border-radius:999px;cursor:pointer;font-size:12px;font-weight:600;padding:0 12px; color:#ffffff!important;transition:all .15s ease; } .model-tab:hover,.mode-tab:hover{background:rgba(0,0,255,.12);border-color:rgba(0,0,255,.35)} .model-tab.active,.mode-tab.active{background:rgba(0,0,255,.22);border-color:#0000FF;color:#fff!important;box-shadow:0 0 0 2px rgba(0,0,255,.10)} .model-tab-label,.mode-tab-label{font-size:12px;color:#ffffff!important;font-weight:600} .app-main-row{display:flex;gap:0;flex:1;overflow:hidden} .app-main-left{flex:1;display:flex;flex-direction:column;min-width:0;border-right:1px solid #1e2b52} .app-main-right{width:470px;display:flex;flex-direction:column;flex-shrink:0;background:#11182d} #media-drop-zone{ position:relative;background:#08101d;height:440px;min-height:440px;max-height:440px; overflow:hidden; } #media-drop-zone.drag-over{outline:2px solid #0000FF;outline-offset:-2px;background:rgba(0,0,255,.04)} .upload-prompt-modern{ position:absolute;inset:0;display:flex;align-items:center;justify-content:center; padding:20px;z-index:20;overflow:hidden; } .upload-click-area{ display:flex;flex-direction:column;align-items:center;justify-content:center; cursor:pointer;padding:28px 36px;max-width:92%;max-height:92%; border:2px dashed #32446d;border-radius:16px; background:rgba(0,0,255,.03);transition:all .2s ease;gap:8px;text-align:center; overflow:hidden; } .upload-click-area:hover{background:rgba(0,0,255,.08);border-color:#0000FF;transform:scale(1.02)} .upload-click-area:active{background:rgba(0,0,255,.12);transform:scale(.99)} .upload-click-area svg{width:86px;height:86px;max-width:100%;flex-shrink:0} .upload-main-text{color:#a1a1aa;font-size:14px;font-weight:600;margin-top:4px} .upload-sub-text{color:#71717a;font-size:12px} .single-preview-wrap{ 
width:100%;height:100%;display:none;align-items:center;justify-content:center;padding:16px; overflow:hidden; } .single-preview-card{ width:100%;height:100%;max-width:100%;max-height:100%;border-radius:14px; overflow:hidden;border:1px solid #1e2b52;background:#0d1425; display:flex;align-items:center;justify-content:center;position:relative; } .single-preview-card img,.single-preview-card video{ width:100%;height:100%;max-width:100%;max-height:100%; object-fit:contain;display:block;background:#000; } .preview-overlay-actions{ position:absolute;top:12px;right:12px;display:flex;gap:8px;z-index:5; } .preview-action-btn{ display:inline-flex;align-items:center;justify-content:center; min-width:34px;height:34px;padding:0 12px;background:rgba(0,0,0,.65); border:1px solid rgba(255,255,255,.14);border-radius:10px;cursor:pointer; color:#fff!important;font-size:12px;font-weight:600;transition:all .15s ease; } .preview-action-btn:hover{background:#0000FF;border-color:#0000FF} .hint-bar{ background:rgba(0,0,255,.06);border-top:1px solid #1e2b52;border-bottom:1px solid #1e2b52; padding:10px 20px;font-size:13px;color:#a1a1aa;line-height:1.7; } .hint-bar b{color:#8ea2ff;font-weight:600} .hint-bar kbd{ display:inline-block;padding:1px 6px;background:#1b2646;border:1px solid #2d3b6d; border-radius:4px;font-family:'JetBrains Mono',monospace;font-size:11px;color:#a1a1aa; } .examples-section{border-top:1px solid #1e2b52;padding:12px 16px} .examples-title{ font-size:12px;font-weight:600;color:#71717a;text-transform:uppercase; letter-spacing:.8px;margin-bottom:10px; } .examples-scroll{display:flex;gap:10px;overflow-x:auto;padding-bottom:8px} .examples-scroll::-webkit-scrollbar{height:6px} .examples-scroll::-webkit-scrollbar-track{background:#08101d;border-radius:3px} .examples-scroll::-webkit-scrollbar-thumb{background:#243669;border-radius:3px} .examples-scroll::-webkit-scrollbar-thumb:hover{background:#38529a} .example-card{ flex-shrink:0;width:220px;background:#08101d;border:1px solid 
#1e2b52; border-radius:10px;overflow:hidden;cursor:pointer;transition:all .2s ease; } .example-card:hover{border-color:#0000FF;transform:translateY(-2px);box-shadow:0 4px 12px rgba(0,0,255,.15)} .example-card.loading{opacity:.5;pointer-events:none} .example-thumb-wrap{height:120px;overflow:hidden;background:#11182d} .example-thumb-wrap img{width:100%;height:100%;object-fit:cover} .example-thumb-placeholder{ width:100%;height:100%;display:flex;align-items:center;justify-content:center; background:#11182d;color:#3f4e78;font-size:11px; } .example-meta-row{padding:6px 10px;display:flex;align-items:center;gap:6px;flex-wrap:wrap} .example-badge{ display:inline-flex;padding:2px 7px;background:rgba(0,0,255,.12);border-radius:4px; font-size:10px;font-weight:600;color:#93a7ff;font-family:'JetBrains Mono',monospace;white-space:nowrap; } .example-badge.kind{background:rgba(100,130,255,.12);color:#bfd0ff} .example-badge.kind.video{background:rgba(0,90,255,.12);color:#a7c4ff} .example-prompt-text{ padding:0 10px 8px;font-size:11px;color:#a1a1aa;line-height:1.4; display:-webkit-box;-webkit-line-clamp:2;-webkit-box-orient:vertical;overflow:hidden; } .panel-card{border-bottom:1px solid #1e2b52} .panel-card-title{ padding:12px 20px;font-size:12px;font-weight:600;color:#71717a; text-transform:uppercase;letter-spacing:.8px;border-bottom:1px solid rgba(30,43,82,.6); } .panel-card-body{padding:16px 20px;display:flex;flex-direction:column;gap:8px} .modern-label{font-size:13px;font-weight:500;color:#a1a1aa;margin-bottom:4px;display:block} .modern-textarea{ width:100%;background:#08101d;border:1px solid #1e2b52;border-radius:8px; padding:10px 14px;font-family:'Inter',sans-serif;font-size:14px;color:#e4e4e7; resize:none;outline:none;min-height:100px;transition:border-color .2s; } .modern-textarea:focus{border-color:#0000FF;box-shadow:0 0 0 3px rgba(0,0,255,.15)} .modern-textarea::placeholder{color:#4e5d89} .modern-textarea.error-flash{ border-color:#ef4444!important;box-shadow:0 0 0 3px 
rgba(239,68,68,.2)!important;animation:shake .4s ease; } @keyframes shake{0%,100%{transform:translateX(0)}20%,60%{transform:translateX(-4px)}40%,80%{transform:translateX(4px)}} .toast-notification{ position:fixed;top:24px;left:50%;transform:translateX(-50%) translateY(-120%); z-index:9999;padding:10px 24px;border-radius:10px;font-family:'Inter',sans-serif; font-size:14px;font-weight:600;display:flex;align-items:center;gap:8px; box-shadow:0 8px 24px rgba(0,0,0,.5); transition:transform .35s cubic-bezier(.34,1.56,.64,1),opacity .35s ease;opacity:0;pointer-events:none; } .toast-notification.visible{transform:translateX(-50%) translateY(0);opacity:1;pointer-events:auto} .toast-notification.error{background:linear-gradient(135deg,#dc2626,#b91c1c);color:#fff;border:1px solid rgba(255,255,255,.15)} .toast-notification.warning{background:linear-gradient(135deg,#1d4ed8,#1e40af);color:#fff;border:1px solid rgba(255,255,255,.15)} .toast-notification.info{background:linear-gradient(135deg,#0000FF,#1d4ed8);color:#fff;border:1px solid rgba(255,255,255,.15)} .toast-notification .toast-icon{font-size:16px;line-height:1} .toast-notification .toast-text{line-height:1.3} .btn-run{ display:flex;align-items:center;justify-content:center;gap:8px;width:100%; background:linear-gradient(135deg,#0000FF,#1d4ed8);border:none;border-radius:10px; padding:12px 24px;cursor:pointer;font-size:15px;font-weight:600;font-family:'Inter',sans-serif; color:#ffffff!important;-webkit-text-fill-color:#ffffff!important; transition:all .2s ease;letter-spacing:-.2px; box-shadow:0 4px 16px rgba(0,0,255,.3),inset 0 1px 0 rgba(255,255,255,.1); } .btn-run:hover{ background:linear-gradient(135deg,#315cff,#0000FF);transform:translateY(-1px); box-shadow:0 6px 24px rgba(0,0,255,.45),inset 0 1px 0 rgba(255,255,255,.15); } .btn-run:active{transform:translateY(0);box-shadow:0 2px 8px rgba(0,0,255,.3)} #custom-run-btn,#custom-run-btn *,#run-btn-label,.btn-run,.btn-run *{ 
color:#ffffff!important;-webkit-text-fill-color:#ffffff!important;fill:#ffffff!important; } .output-frame{border-bottom:1px solid #1e2b52;display:flex;flex-direction:column;position:relative} .output-frame .out-title, .output-frame .out-title *, #output-title-label{ color:#ffffff!important; -webkit-text-fill-color:#ffffff!important; } .output-frame .out-title{ padding:10px 20px;font-size:13px;font-weight:700; text-transform:uppercase;letter-spacing:.8px;border-bottom:1px solid rgba(30,43,82,.6); display:flex;align-items:center;justify-content:space-between;gap:8px;flex-wrap:wrap; } .out-title-right{display:flex;gap:8px;align-items:center} .out-action-btn{ display:inline-flex;align-items:center;justify-content:center;background:rgba(0,0,255,.1); border:1px solid rgba(0,0,255,.2);border-radius:6px;cursor:pointer;padding:3px 10px; font-size:11px;font-weight:500;color:#93a7ff!important;gap:4px;height:24px;transition:all .15s; } .out-action-btn:hover{background:rgba(0,0,255,.2);border-color:rgba(0,0,255,.35);color:#ffffff!important} .out-action-btn svg{width:12px;height:12px;fill:#93a7ff} .output-frame .out-body{ flex:1;background:#08101d;display:flex;align-items:stretch;justify-content:stretch; overflow:hidden;min-height:320px;position:relative; } .output-scroll-wrap{ width:100%;height:100%;padding:0;overflow:hidden; } .output-textarea{ width:100%;height:320px;min-height:320px;max-height:320px;background:#08101d;color:#e4e4e7; border:none;outline:none;padding:16px 18px;font-size:13px;line-height:1.6; font-family:'JetBrains Mono',monospace;overflow:auto;resize:none;white-space:pre-wrap; } .output-textarea::placeholder{color:#5f6d96} .output-textarea.error-flash{ box-shadow:inset 0 0 0 2px rgba(239,68,68,.6); } .modern-loader{ display:none;position:absolute;top:0;left:0;right:0;bottom:0;background:rgba(8,16,29,.92); z-index:15;flex-direction:column;align-items:center;justify-content:center;gap:16px;backdrop-filter:blur(4px); } .modern-loader.active{display:flex} 
.modern-loader .loader-spinner{ width:36px;height:36px;border:3px solid #243669;border-top-color:#0000FF; border-radius:50%;animation:spin .8s linear infinite; } @keyframes spin{to{transform:rotate(360deg)}} .modern-loader .loader-text{font-size:13px;color:#a1a1aa;font-weight:500} .loader-bar-track{width:200px;height:4px;background:#243669;border-radius:2px;overflow:hidden} .loader-bar-fill{ height:100%;background:linear-gradient(90deg,#0000FF,#4b74ff,#0000FF); background-size:200% 100%;animation:shimmer 1.5s ease-in-out infinite;border-radius:2px; } @keyframes shimmer{0%{background-position:200% 0}100%{background-position:-200% 0}} .settings-group{border:1px solid #1e2b52;border-radius:10px;margin:12px 16px;padding:0;overflow:hidden} .settings-group-title{ font-size:12px;font-weight:600;color:#71717a;text-transform:uppercase;letter-spacing:.8px; padding:10px 16px;border-bottom:1px solid #1e2b52;background:rgba(17,24,45,.5); } .settings-group-body{padding:14px 16px;display:flex;flex-direction:column;gap:12px} .slider-row{display:flex;align-items:center;gap:10px;min-height:28px} .slider-row label{font-size:13px;font-weight:500;color:#a1a1aa;min-width:118px;flex-shrink:0} .slider-row input[type="range"]{ flex:1;-webkit-appearance:none;appearance:none;height:6px;background:#243669; border-radius:3px;outline:none;min-width:0; } .slider-row input[type="range"]::-webkit-slider-thumb{ -webkit-appearance:none;width:16px;height:16px;background:linear-gradient(135deg,#0000FF,#1d4ed8); border-radius:50%;cursor:pointer;box-shadow:0 2px 6px rgba(0,0,255,.4);transition:transform .15s; } .slider-row input[type="range"]::-webkit-slider-thumb:hover{transform:scale(1.2)} .slider-row input[type="range"]::-moz-range-thumb{ width:16px;height:16px;background:linear-gradient(135deg,#0000FF,#1d4ed8); border-radius:50%;cursor:pointer;border:none;box-shadow:0 2px 6px rgba(0,0,255,.4); } .slider-row .slider-val{ min-width:58px;text-align:right;font-family:'JetBrains 
Mono',monospace;font-size:12px; font-weight:500;padding:3px 8px;background:#08101d;border:1px solid #1e2b52; border-radius:6px;color:#a1a1aa;flex-shrink:0; } .app-statusbar{ background:#11182d;border-top:1px solid #1e2b52;padding:6px 20px; display:flex;gap:12px;height:34px;align-items:center;font-size:12px; } .app-statusbar .sb-section{ padding:0 12px;flex:1;display:flex;align-items:center;font-family:'JetBrains Mono',monospace; font-size:12px;color:#6b7cae;overflow:hidden;white-space:nowrap; } .app-statusbar .sb-section.sb-fixed{ flex:0 0 auto;min-width:110px;text-align:center;justify-content:center; padding:3px 12px;background:rgba(0,0,255,.08);border-radius:6px;color:#93a7ff;font-weight:500; } .exp-note{padding:10px 20px;font-size:12px;color:#6b7cae;border-top:1px solid #1e2b52;text-align:center} .exp-note a{color:#93a7ff;text-decoration:none} .exp-note a:hover{text-decoration:underline} ::-webkit-scrollbar{width:8px;height:8px} ::-webkit-scrollbar-track{background:#08101d} ::-webkit-scrollbar-thumb{background:#243669;border-radius:4px} ::-webkit-scrollbar-thumb:hover{background:#38529a} @media(max-width:980px){ .app-main-row{flex-direction:column} .app-main-right{width:100%} .app-main-left{border-right:none;border-bottom:1px solid #1e2b52} } """ gallery_js = r""" () => { function init() { if (window.__visionScopeInitDone) return; const dropZone = document.getElementById('media-drop-zone'); const uploadPrompt = document.getElementById('upload-prompt'); const uploadClick = document.getElementById('upload-click-area'); const fileInput = document.getElementById('custom-file-input'); const previewWrap = document.getElementById('single-preview-wrap'); const previewImg = document.getElementById('single-preview-img'); const previewVideo = document.getElementById('single-preview-video'); const btnUpload = document.getElementById('preview-upload-btn'); const btnClear = document.getElementById('preview-clear-btn'); const promptInput = 
document.getElementById('custom-query-input'); const runBtnEl = document.getElementById('custom-run-btn'); const outputArea = document.getElementById('custom-output-textarea'); const mediaStatus = document.getElementById('sb-media-status'); const exampleResultContainer = document.getElementById('example-result-data'); if (!dropZone || !fileInput || !promptInput || !previewWrap || !previewImg || !previewVideo) { setTimeout(init, 250); return; } window.__visionScopeInitDone = true; let fileState = null; let toastTimer = null; function showToast(message, type) { let toast = document.getElementById('app-toast'); if (!toast) { toast = document.createElement('div'); toast.id = 'app-toast'; toast.className = 'toast-notification'; toast.innerHTML = '<span class="toast-icon"></span><span class="toast-text"></span>'; document.body.appendChild(toast); } const icon = toast.querySelector('.toast-icon'); const text = toast.querySelector('.toast-text'); toast.className = 'toast-notification ' + (type || 'error'); if (type === 'warning') icon.textContent = '\u26A0'; else if (type === 'info') icon.textContent = '\u2139'; else icon.textContent = '\u2717'; text.textContent = message; if (toastTimer) clearTimeout(toastTimer); void toast.offsetWidth; toast.classList.add('visible'); toastTimer = setTimeout(() => toast.classList.remove('visible'), 3500); } window.__showToast = showToast; function showLoader() { const l = document.getElementById('output-loader'); if (l) l.classList.add('active'); const sb = document.getElementById('sb-run-state'); if (sb) sb.textContent = 'Processing...'; } function hideLoader() { const l = document.getElementById('output-loader'); if (l) l.classList.remove('active'); const sb = document.getElementById('sb-run-state'); if (sb) sb.textContent = 'Done'; } window.__showLoader = showLoader; window.__hideLoader = hideLoader; function flashPromptError() { promptInput.classList.add('error-flash'); promptInput.focus(); setTimeout(() => promptInput.classList.remove('error-flash'), 800); } function flashOutputError() { if 
(!outputArea) return; outputArea.classList.add('error-flash'); setTimeout(() => outputArea.classList.remove('error-flash'), 800); } function setGradioValue(containerId, value) { const container = document.getElementById(containerId); if (!container) return; container.querySelectorAll('input, textarea').forEach(el => { if (el.type === 'file' || el.type === 'range' || el.type === 'checkbox') return; const proto = el.tagName === 'TEXTAREA' ? HTMLTextAreaElement.prototype : HTMLInputElement.prototype; const ns = Object.getOwnPropertyDescriptor(proto, 'value'); if (ns && ns.set) { ns.set.call(el, value); el.dispatchEvent(new Event('input', {bubbles:true, composed:true})); el.dispatchEvent(new Event('change', {bubbles:true, composed:true})); } }); } function syncFileToGradio() { setGradioValue('hidden-file-b64', fileState ? fileState.b64 : ''); setGradioValue('hidden-input-kind', fileState ? fileState.kind : getActiveMode()); const txt = fileState ? ('1 ' + fileState.kind + ' uploaded') : ('No ' + getActiveMode() + ' uploaded'); if (mediaStatus) mediaStatus.textContent = txt; } function syncPromptToGradio() { setGradioValue('prompt-gradio-input', promptInput.value); } function syncModelToGradio(name) { setGradioValue('hidden-model-name', name); } function syncModeToGradio(mode) { setGradioValue('hidden-mode-name', mode); setGradioValue('hidden-input-kind', fileState ? 
fileState.kind : mode); const sub = document.getElementById('upload-sub-text'); const main = document.getElementById('upload-main-text'); if (mode === 'video') { if (main) main.textContent = 'Click or drag a video here'; if (sub) sub.textContent = 'Upload one short video clip for multimodal video understanding'; } else { if (main) main.textContent = 'Click or drag an image here'; if (sub) sub.textContent = 'Upload one document, page, receipt, screenshot, or scene image for OCR and vision tasks'; } if (!fileState && mediaStatus) mediaStatus.textContent = 'No ' + mode + ' uploaded'; } function getActiveMode() { const active = document.querySelector('.mode-tab.active'); return active ? active.getAttribute('data-mode') : 'image'; } function activateModeTab(name) { document.querySelectorAll('.mode-tab[data-mode]').forEach(btn => { btn.classList.toggle('active', btn.getAttribute('data-mode') === name); }); syncModeToGradio(name); if (fileState && fileState.kind !== name) { clearPreview(); } } window.__activateModeTab = activateModeTab; function activateModelTab(name) { document.querySelectorAll('.model-tab[data-model]').forEach(btn => { btn.classList.toggle('active', btn.getAttribute('data-model') === name); }); syncModelToGradio(name); } window.__activateModelTab = activateModelTab; function setPreview(kind, b64, name) { fileState = {kind, b64, name: name || kind}; if (kind === 'video') { previewVideo.src = b64; previewVideo.style.display = 'block'; previewImg.style.display = 'none'; previewVideo.load(); } else { previewImg.src = b64; previewImg.style.display = 'block'; previewVideo.style.display = 'none'; previewVideo.removeAttribute('src'); } previewWrap.style.display = 'flex'; if (uploadPrompt) uploadPrompt.style.display = 'none'; activateModeTab(kind); syncFileToGradio(); } window.__setPreview = setPreview; function clearPreview() { fileState = null; previewImg.src = ''; previewVideo.pause(); previewVideo.removeAttribute('src'); previewVideo.load(); 
previewWrap.style.display = 'none'; if (uploadPrompt) uploadPrompt.style.display = 'flex'; syncFileToGradio(); } window.__clearPreview = clearPreview; function processFile(file) { if (!file) return; const mode = getActiveMode(); if (mode === 'image' && !file.type.startsWith('image/')) { showToast('Only image files are supported in Image mode', 'error'); return; } if (mode === 'video' && !file.type.startsWith('video/')) { showToast('Only video files are supported in Video mode', 'error'); return; } const reader = new FileReader(); reader.onload = (e) => setPreview(mode, e.target.result, file.name); reader.readAsDataURL(file); } fileInput.addEventListener('change', (e) => { const file = e.target.files && e.target.files[0] ? e.target.files[0] : null; if (file) processFile(file); e.target.value = ''; }); if (uploadClick) uploadClick.addEventListener('click', () => fileInput.click()); if (btnUpload) btnUpload.addEventListener('click', () => fileInput.click()); if (btnClear) btnClear.addEventListener('click', clearPreview); dropZone.addEventListener('dragover', (e) => { e.preventDefault(); dropZone.classList.add('drag-over'); }); dropZone.addEventListener('dragleave', (e) => { e.preventDefault(); dropZone.classList.remove('drag-over'); }); dropZone.addEventListener('drop', (e) => { e.preventDefault(); dropZone.classList.remove('drag-over'); if (e.dataTransfer.files && e.dataTransfer.files.length) processFile(e.dataTransfer.files[0]); }); promptInput.addEventListener('input', syncPromptToGradio); document.querySelectorAll('.model-tab[data-model]').forEach(btn => { btn.addEventListener('click', () => { const model = btn.getAttribute('data-model'); activateModelTab(model); }); }); document.querySelectorAll('.mode-tab[data-mode]').forEach(btn => { btn.addEventListener('click', () => { const mode = btn.getAttribute('data-mode'); activateModeTab(mode); }); }); activateModelTab('DeepCaption-VLA-7B'); activateModeTab('image'); function syncSlider(customId, gradioId) { const 
slider = document.getElementById(customId); const valSpan = document.getElementById(customId + '-val'); if (!slider) return; slider.addEventListener('input', () => { if (valSpan) valSpan.textContent = slider.value; const container = document.getElementById(gradioId); if (!container) return; container.querySelectorAll('input[type="range"],input[type="number"]').forEach(el => { const ns = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, 'value'); if (ns && ns.set) { ns.set.call(el, slider.value); el.dispatchEvent(new Event('input', {bubbles:true, composed:true})); el.dispatchEvent(new Event('change', {bubbles:true, composed:true})); } }); }); } syncSlider('custom-max-new-tokens', 'gradio-max-new-tokens'); syncSlider('custom-temperature', 'gradio-temperature'); syncSlider('custom-top-p', 'gradio-top-p'); syncSlider('custom-top-k', 'gradio-top-k'); syncSlider('custom-repetition-penalty', 'gradio-repetition-penalty'); syncSlider('custom-gpu-duration', 'gradio-gpu-duration'); function validateBeforeRun() { const promptVal = promptInput.value.trim(); const currentMode = getActiveMode(); if (!fileState && !promptVal) { showToast('Please upload a file and enter your instruction', 'error'); flashPromptError(); return false; } if (!fileState) { showToast('Please upload a ' + currentMode, 'error'); return false; } if (!promptVal) { showToast('Please enter your vision/query instruction', 'warning'); flashPromptError(); return false; } if (fileState.kind !== currentMode) { showToast('Uploaded file type does not match active mode', 'error'); return false; } const currentModel = (document.querySelector('.model-tab.active') || {}).dataset?.model; if (!currentModel) { showToast('Please select a model', 'error'); return false; } return true; } window.__clickGradioRunBtn = function() { if (!validateBeforeRun()) return; syncPromptToGradio(); syncFileToGradio(); const activeModel = document.querySelector('.model-tab.active'); if (activeModel) 
syncModelToGradio(activeModel.getAttribute('data-model')); syncModeToGradio(getActiveMode()); if (outputArea) outputArea.value = ''; showLoader(); setTimeout(() => { const gradioBtn = document.getElementById('gradio-run-btn'); if (!gradioBtn) return; const btn = gradioBtn.querySelector('button'); if (btn) btn.click(); else gradioBtn.click(); }, 180); }; if (runBtnEl) runBtnEl.addEventListener('click', () => window.__clickGradioRunBtn()); const copyBtn = document.getElementById('copy-output-btn'); if (copyBtn) { copyBtn.addEventListener('click', async () => { try { const text = outputArea ? outputArea.value : ''; if (!text.trim()) { showToast('No output to copy', 'warning'); return; } await navigator.clipboard.writeText(text); showToast('Output copied to clipboard', 'info'); } catch(e) { showToast('Copy failed', 'error'); } }); } const saveBtn = document.getElementById('save-output-btn'); if (saveBtn) { saveBtn.addEventListener('click', () => { const text = outputArea ? outputArea.value : ''; if (!text.trim()) { showToast('No output to save', 'warning'); flashOutputError(); return; } const blob = new Blob([text], {type: 'text/plain;charset=utf-8'}); const a = document.createElement('a'); a.href = URL.createObjectURL(blob); a.download = 'visionscope_r2_output.txt'; document.body.appendChild(a); a.click(); setTimeout(() => { URL.revokeObjectURL(a.href); document.body.removeChild(a); }, 200); showToast('Output saved', 'info'); }); } document.querySelectorAll('.example-card[data-idx]').forEach(card => { card.addEventListener('click', () => { const idx = card.getAttribute('data-idx'); const kind = card.getAttribute('data-kind'); document.querySelectorAll('.example-card.loading').forEach(c => c.classList.remove('loading')); card.classList.add('loading'); showToast('Loading example...', 'info'); setGradioValue('example-result-data', ''); setGradioValue('example-kind-input', kind); setGradioValue('example-idx-input', idx); setTimeout(() => { const btn = 
document.getElementById('example-load-btn'); if (btn) { const b = btn.querySelector('button'); if (b) b.click(); else btn.click(); } }, 150); setTimeout(() => card.classList.remove('loading'), 12000); }); }); function checkExampleResult() { if (!exampleResultContainer) return; const el = exampleResultContainer.querySelector('textarea') || exampleResultContainer.querySelector('input'); if (!el || !el.value) return; if (window.__lastExampleVal === el.value) return; try { const data = JSON.parse(el.value); if (data.status === 'ok') { window.__lastExampleVal = el.value; if (data.file && data.kind) setPreview(data.kind, data.file, data.name || 'example'); if (data.query) { promptInput.value = data.query; syncPromptToGradio(); } if (data.model) activateModelTab(data.model); if (data.kind) activateModeTab(data.kind); document.querySelectorAll('.example-card.loading').forEach(c => c.classList.remove('loading')); showToast('Example loaded', 'info'); } else if (data.status === 'error') { document.querySelectorAll('.example-card.loading').forEach(c => c.classList.remove('loading')); showToast(data.message || 'Failed to load example', 'error'); } } catch(e) {} } const obsExample = new MutationObserver(checkExampleResult); if (exampleResultContainer) { obsExample.observe(exampleResultContainer, {childList:true, subtree:true, characterData:true, attributes:true}); } setInterval(checkExampleResult, 500); if (outputArea) outputArea.value = ''; const sb = document.getElementById('sb-run-state'); if (sb) sb.textContent = 'Ready'; if (mediaStatus) mediaStatus.textContent = 'No image uploaded'; } init(); } """ wire_outputs_js = r""" () => { function watchOutputs() { const resultContainer = document.getElementById('gradio-result'); const outArea = document.getElementById('custom-output-textarea'); if (!resultContainer || !outArea) { setTimeout(watchOutputs, 500); return; } let lastText = ''; function syncOutput() { const el = resultContainer.querySelector('textarea') || 
                resultContainer.querySelector('input');
            if (!el) return;
            const val = el.value || '';
            if (val !== lastText) {
                lastText = val;
                outArea.value = val;
                outArea.scrollTop = outArea.scrollHeight;
                if (window.__hideLoader && val.trim()) window.__hideLoader();
            }
        }
        const observer = new MutationObserver(syncOutput);
        observer.observe(resultContainer, {childList:true, subtree:true, characterData:true, attributes:true});
        setInterval(syncOutput, 500);
    }
    watchOutputs();
}
"""

APP_LOGO_SVG = """ """
UPLOAD_PREVIEW_SVG = """ """
COPY_SVG = """"""
SAVE_SVG = """"""

MODEL_TABS_HTML = "".join([
    f'' for m in MODEL_CHOICES
])

MODE_TABS_HTML = """ """

with gr.Blocks() as demo:
    hidden_file_b64 = gr.Textbox(
        value="", elem_id="hidden-file-b64",
        elem_classes="hidden-input", container=False)
    hidden_mode_name = gr.Textbox(
        value="image", elem_id="hidden-mode-name",
        elem_classes="hidden-input", container=False)
    hidden_input_kind = gr.Textbox(
        value="image", elem_id="hidden-input-kind",
        elem_classes="hidden-input", container=False)
    prompt = gr.Textbox(
        value="", elem_id="prompt-gradio-input",
        elem_classes="hidden-input", container=False)
    hidden_model_name = gr.Textbox(
        value="DeepCaption-VLA-7B", elem_id="hidden-model-name",
        elem_classes="hidden-input", container=False)

    max_new_tokens = gr.Slider(
        minimum=1, maximum=MAX_MAX_NEW_TOKENS, step=1, value=DEFAULT_MAX_NEW_TOKENS,
        elem_id="gradio-max-new-tokens", elem_classes="hidden-input", container=False)
    temperature = gr.Slider(
        minimum=0.1, maximum=4.0, step=0.1, value=0.6,
        elem_id="gradio-temperature", elem_classes="hidden-input", container=False)
    top_p = gr.Slider(
        minimum=0.05, maximum=1.0, step=0.05, value=0.9,
        elem_id="gradio-top-p", elem_classes="hidden-input", container=False)
    top_k = gr.Slider(
        minimum=1, maximum=1000, step=1, value=50,
        elem_id="gradio-top-k", elem_classes="hidden-input", container=False)
    repetition_penalty = gr.Slider(
        minimum=1.0, maximum=2.0, step=0.05, value=1.2,
        elem_id="gradio-repetition-penalty", elem_classes="hidden-input", container=False)
    gpu_duration_state = gr.Number(
        value=60, elem_id="gradio-gpu-duration",
        elem_classes="hidden-input", container=False)

    result = gr.Textbox(
        value="", elem_id="gradio-result",
        elem_classes="hidden-input", container=False)

    example_kind = gr.Textbox(
        value="", elem_id="example-kind-input",
        elem_classes="hidden-input", container=False)
    example_idx = gr.Textbox(
        value="", elem_id="example-idx-input",
        elem_classes="hidden-input", container=False)
    example_result = gr.Textbox(
        value="", elem_id="example-result-data",
        elem_classes="hidden-input", container=False)
    example_load_btn = gr.Button("Load Example", elem_id="example-load-btn")

    gr.HTML(f"""
VisionScope R2 vision enabled Blue Suite
{MODE_TABS_HTML}
{MODEL_TABS_HTML}
{UPLOAD_PREVIEW_SVG} Click or drag an image here Upload one document, page, receipt, screenshot, or scene image for OCR and vision tasks
Upload: Click or drag to add an image or video  ·  Mode: Switch between image and video inference  ·  Model: Choose model tabs from the header
Quick Examples
{EXAMPLE_CARDS_HTML}
Vision Instruction
Raw Output Stream
Running vision inference...
Advanced Settings
{DEFAULT_MAX_NEW_TOKENS}
0.6
0.9
50
1.2
60
Experimental Vision Suite · Open on GitHub
No image uploaded
Ready
""")

    run_btn = gr.Button("Run", elem_id="gradio-run-btn")

    def b64_to_pil(b64_str):
        # Decode a base64 string or image data URL into an RGB PIL image.
        # Returns None if the input is empty or cannot be decoded.
        if not b64_str:
            return None
        try:
            if b64_str.startswith("data:image"):
                _, data = b64_str.split(",", 1)
            else:
                data = b64_str
            image_data = base64.b64decode(data)
            return Image.open(BytesIO(image_data)).convert("RGB")
        except Exception:
            return None

    def b64_to_temp_video(b64_str):
        # Write a base64 video (optionally a data URL) to a temp file and return its path.
        # The extension follows the MIME type in the data URL header and defaults to .mp4.
        if not b64_str:
            return None
        try:
            header, data = b64_str.split(",", 1) if "," in b64_str else ("", b64_str)
            os.makedirs("/tmp/visionscope_r2", exist_ok=True)
            ext = ".mp4"
            if "video/webm" in header:
                ext = ".webm"
            elif "video/quicktime" in header:
                ext = ".mov"
            elif "video/x-matroska" in header:
                ext = ".mkv"
            path = f"/tmp/visionscope_r2/{uuid.uuid4().hex}{ext}"
            with open(path, "wb") as f:
                f.write(base64.b64decode(data))
            return path
        except Exception:
            return None

    def run_vision(mode_name, model_name, text, file_b64, max_new_tokens_v,
                   temperature_v, top_p_v, top_k_v, repetition_penalty_v, gpu_timeout_v):
        # Dispatch to the video or image generator and stream its output.
        if mode_name == "video":
            temp_video_path = b64_to_temp_video(file_b64)
            if not temp_video_path:
                raise gr.Error("Failed to decode uploaded video.")
            try:
                yield from generate_video(
                    model_name=model_name,
                    text=text,
                    video_path=temp_video_path,
                    max_new_tokens=max_new_tokens_v,
                    temperature=temperature_v,
                    top_p=top_p_v,
                    top_k=top_k_v,
                    repetition_penalty=repetition_penalty_v,
                    gpu_timeout=gpu_timeout_v,
                )
            finally:
                # Remove the temporary video file once streaming ends or fails.
                try:
                    if os.path.exists(temp_video_path):
                        os.remove(temp_video_path)
                except Exception:
                    pass
        else:
            image = b64_to_pil(file_b64)
            yield from generate_image(
                model_name=model_name,
                text=text,
                image=image,
                max_new_tokens=max_new_tokens_v,
                temperature=temperature_v,
                top_p=top_p_v,
                top_k=top_k_v,
                repetition_penalty=repetition_penalty_v,
                gpu_timeout=gpu_timeout_v,
            )

    demo.load(fn=noop, inputs=None, outputs=None, js=gallery_js)
    demo.load(fn=noop, inputs=None, outputs=None, js=wire_outputs_js)

    run_btn.click(
        fn=run_vision,
        inputs=[
            hidden_mode_name,
            hidden_model_name,
            prompt,
            hidden_file_b64,
            max_new_tokens,
            temperature,
            top_p,
            top_k,
            repetition_penalty,
            gpu_duration_state,
        ],
        outputs=[result],
        # Prefer the live values of the custom DOM widgets over the hidden Gradio inputs,
        # falling back to the Gradio values when a widget is missing.
        js=r"""(mode, model, p, filev, mnt, t, tp, tk, rp, gd) => {
            const modelEl = document.querySelector('.model-tab.active');
            const modeEl = document.querySelector('.mode-tab.active');
            const chosenModel = modelEl ? modelEl.getAttribute('data-model') : model;
            const chosenMode = modeEl ? modeEl.getAttribute('data-mode') : mode;
            const promptEl = document.getElementById('custom-query-input');
            const promptVal = promptEl ? promptEl.value : p;
            const fileContainer = document.getElementById('hidden-file-b64');
            let fileVal = filev;
            if (fileContainer) {
                const inner = fileContainer.querySelector('textarea, input');
                if (inner) fileVal = inner.value;
            }
            return [chosenMode, chosenModel, promptVal, fileVal, mnt, t, tp, tk, rp, gd];
        }""",
    )

    example_load_btn.click(
        fn=load_example_data,
        inputs=[example_kind, example_idx],
        outputs=[example_result],
        queue=False,
    )

if __name__ == "__main__":
    demo.queue(max_size=50).launch(
        css=css,
        mcp_server=True,
        ssr_mode=False,
        show_error=True,
        allowed_paths=["images", "videos"],
    )