Space: ricklon/DeepSeek-OCR-2-Math

ricklon committed 12 days ago

Commit: 152c5bd

Parent: c0f56fe

Improve faculty-facing region selection guidance in UI

Files changed (1)

app.py +39 -6

@@ -915,19 +915,29 @@ with gr.Blocks(title="DeepSeek-OCR-2") as demo:
             input_img = gr.Image(label="Input Image", type="pil", height=300)
             page_selector = gr.Number(label="Select Page", value=1, minimum=1, step=1, visible=False)
             task = gr.Dropdown(list(TASK_PROMPTS.keys()), value="📋 Markdown", label="Task")
+            input_scope = gr.Radio(["Entire Page", "Selected Region"], value="Entire Page", label="Input Scope")
             equation_zoom = gr.Checkbox(label="Equation Zoom (multipass)", value=False)
+            gr.Markdown(
+                """
+                **Quick use**
+                1. `Entire Page`: click **Extract**.
+                2. `Selected Region`: open **Region Selector**, draw a box around the target (no painting), crop, then click **Extract**.
+                3. Check **Cropped Images** to confirm the selected region used for OCR.
+                """
+            )
             prompt = gr.Textbox(label="Prompt", lines=2, visible=False)
             btn = gr.Button("Extract", variant="primary", size="lg")
-            with gr.Accordion("Region OCR (Draw/Crop)", open=False):
+            with gr.Accordion("Region Selector (Draw/Crop)", open=False):
                 if HAS_IMAGE_EDITOR:
                     region_editor = gr.ImageEditor(
-                        label="Draw a box and crop to the target area, then click OCR Region",
+                        label="Draw a rectangle around what you want (do not paint/fill), crop, then run Extract with Input Scope=Selected Region.",
                         type="pil",
                         height=300,
                     )
                     region_btn = gr.Button("OCR Region", variant="secondary")
                 else:
                     gr.Markdown("Region drawing requires a newer Gradio version with `ImageEditor` support.")
+                    region_editor = gr.State(None)
         with gr.Column(scale=2):
             with gr.Tabs() as tabs:
@@ -977,13 +987,22 @@ with gr.Blocks(title="DeepSeek-OCR-2") as demo:
         ### Configuration
         1024 base + 768 patches with dynamic cropping (2-6 patches). 144 tokens per patch + 256 base tokens.
+        ### Faculty Quick Workflow
+        1. Choose a task (`Markdown`, `Free OCR`, or `Locate`).
+        2. Choose **Input Scope**:
+           - `Entire Page` for the full page.
+           - `Selected Region` for a specific area.
+        3. For `Selected Region`, open **Region Selector (Draw/Crop)**, draw a box around the target (no painting/fill), crop, then click **Extract**.
+        4. Review **Cropped Images** to confirm the selected region used for OCR.
         ### Tasks
         - **Markdown**: Convert document to structured markdown with layout detection (grounding ✅)
         - **Free OCR**: Read all visible text from the full page/image (no boxes, no targeting)
         - **Locate**: Find and highlight where specific text appears (grounding ✅)
         - **Describe**: General image description
         - **Custom**: Your own prompt
-        - **Region OCR (new)**: In the left panel, open **Region OCR (Draw/Crop)**, draw/crop a target area, then click **OCR Region**
+        - **Region OCR (new)**: In the left panel, open **Region Selector (Draw/Crop)**, draw/crop a target area, then click **OCR Region** (or set Input Scope to Selected Region and click Extract)
+        - **Input Scope**: `Entire Page` or `Selected Region` (Selected Region uses the Region Selector crop as main input)
         - **Equation Zoom (multipass)**: Optional nested equation refinement for Markdown. Off by default for speed/stability.
         ### Free OCR vs Locate (important)
@@ -1014,8 +1033,22 @@ with gr.Blocks(title="DeepSeek-OCR-2") as demo:
             [text_out, md_out, html_out, html_source_out, spatial_out, spatial_source_out, raw_out, img_out, gallery, download_btn, region_text_out, region_html_out]
         )
-    def run(image, file_path, task, custom_prompt, page_num, enable_equation_zoom):
-        if file_path:
+    def run(image, file_path, task, custom_prompt, page_num, enable_equation_zoom, scope, region_value):
+        selected_region = None
+        if scope == "Selected Region":
+            selected_region = _extract_editor_image(region_value)
+            if selected_region is None:
+                msg = "Select Input Scope=Selected Region, then draw/crop in Region Selector first."
+                return (msg, "", "", "", "", "", "", None, [], gr.DownloadButton(visible=False), msg, "")
+            cleaned, markdown, raw, img_out, crops = process_image(
+                selected_region,
+                task,
+                custom_prompt,
+                enable_equation_zoom=enable_equation_zoom,
+                infer_crop_mode=False,
+            )
+            crops = [selected_region] + (crops or [])
+        elif file_path:
             cleaned, markdown, raw, img_out, crops = process_file(
                 file_path,
                 task,
@@ -1038,7 +1071,7 @@ with gr.Blocks(title="DeepSeek-OCR-2") as demo:
     submit_event = btn.click(
         run,
-        [input_img, file_in, task, prompt, page_selector, equation_zoom],
+        [input_img, file_in, task, prompt, page_selector, equation_zoom, input_scope, region_editor],
         [text_out, md_out, html_out, html_source_out, spatial_out, spatial_source_out, raw_out, img_out, gallery, download_btn, region_text_out, region_html_out]
     )
     submit_event.then(select_boxes, [task], [tabs])
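
The scope handling this commit adds to `run` can be sketched in isolation. This is a minimal illustration, not the Space's code: `extract_editor_image` and `choose_input` are hypothetical names, the dict keys `composite` and `background` assume the value shape that recent Gradio `ImageEditor` versions return, and the final `image` fallback is an assumption because the diff does not show that branch.

```python
from typing import Any, Optional, Tuple


def extract_editor_image(value: Any) -> Optional[Any]:
    """Hypothetical stand-in for the diff's `_extract_editor_image`.

    Assumes the editor value is None, a bare image object, or a dict with
    "composite" / "background" entries. Returns None when no image is found.
    """
    if value is None:
        return None
    if isinstance(value, dict):
        # Prefer the edited/cropped composite, then fall back to the background.
        for key in ("composite", "background"):
            if value.get(key) is not None:
                return value[key]
        return None
    # Anything else is treated as an image object already (e.g. gr.State payload).
    return value


def choose_input(scope: str, region_value: Any,
                 file_path: Optional[str], image: Any) -> Tuple[str, Any]:
    """Return (source, payload), checking the region scope before the file path."""
    if scope == "Selected Region":
        region = extract_editor_image(region_value)
        if region is None:
            # Mirrors the early return in `run`: no crop means no OCR call.
            return ("error", "Draw/crop a region in Region Selector first.")
        return ("region", region)
    if file_path:
        return ("file", file_path)
    if image is not None:  # assumed fallback, not shown in the diff
        return ("image", image)
    return ("error", "No input provided.")
```

Because the region check comes first, an uploaded file is ignored whenever Input Scope is `Selected Region`, which matches the `if` / `elif` order in the new `run`.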