sharifIslam committed on
Commit
ed72314
·
1 Parent(s): e3d6629

Add core functionality and project structure for MV3DR application


- Implement main application logic in main.py
- Set up configuration management in config.py
- Create preprocessing, inference, postprocessing, and visualization modules
- Add Gradio UI for user interaction
- Include license information and README documentation
- Add certificate and update requirements for dependencies

.gradio/certificate.pem ADDED
@@ -0,0 +1,31 @@
+ -----BEGIN CERTIFICATE-----
+ MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
+ TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
+ cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
+ WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
+ ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
+ MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
+ h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
+ 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
+ A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
+ T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
+ B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
+ B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
+ KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
+ OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
+ jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
+ qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
+ rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
+ HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
+ hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
+ ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
+ 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
+ NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
+ ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
+ TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
+ jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
+ oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
+ 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
+ mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
+ emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
+ -----END CERTIFICATE-----
LICENSE ADDED
@@ -0,0 +1,23 @@
+ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
+
+ This work is licensed under CC BY-NC-SA 4.0
+ (Attribution-NonCommercial-ShareAlike 4.0 International)
+
+ You are free to:
+ - Share: copy and redistribute the material in any medium or format
+ - Adapt: remix, transform, and build upon the material
+
+ Under the following terms:
+ - Attribution: You must give appropriate credit, provide a link to the license,
+   and indicate if changes were made.
+ - NonCommercial: You may not use the material for commercial purposes.
+ - ShareAlike: If you remix, transform, or build upon the material, you must
+   distribute your contributions under the same license as the original.
+
+ Full license: https://creativecommons.org/licenses/by-nc-sa/4.0/
+
+ ---
+
+ This project is built upon DUSt3R by Naver Corporation.
+ DUSt3R is licensed under CC BY-NC-SA 4.0.
+ Copyright (C) 2024-present Naver Corporation.
README.md ADDED
@@ -0,0 +1,78 @@
+ # Multi-View 3D Reconstruction (MV3DR)
+
+ A web application for multi-view 3D object reconstruction using DUSt3R (Dense and Unconstrained Stereo 3D Reconstruction).
+
+ ## Overview
+
+ MV3DR is a production-ready system that generates 3D models from multiple 2D images through dense stereo reconstruction. The application features a dark-themed web interface built with Gradio, providing real-time visualization of depth maps, confidence heatmaps, and interactive 3D outputs.
+
+ ## Features
+
+ - Multi-view stereo reconstruction from 2+ images
+ - Dual export modes: point cloud or textured mesh
+ - Real-time depth map and confidence visualization
+ - Advanced post-processing with background filtering
+ - Point cloud cleaning and alignment optimization
+ - Interactive 3D viewer with fullscreen support
+ - Modular architecture with separated core modules
+
+ ## Technical Stack
+
+ - **Model**: DUSt3R (Dense and Unconstrained Stereo 3D Reconstruction)
+ - **Framework**: PyTorch with mixed-precision inference
+ - **Interface**: Gradio with custom dark theme
+ - **3D Processing**: Trimesh for mesh/point cloud generation
+ - **Visualization**: Matplotlib for heatmaps
+
+ ## Architecture
+
+ ```
+ MV3DR/
+ ├── main.py               # Application entry point
+ ├── config.py             # Centralized configuration
+ ├── model.py              # DUSt3R model initialization
+ ├── pipeline.py           # Main reconstruction pipeline
+ ├── gradio_ui.py          # Web interface
+ ├── core/
+ │   ├── inference.py      # Model inference with AMP
+ │   ├── preprocessing.py  # Image enhancement
+ │   ├── postprocessing.py # 3D output generation
+ │   └── visualization.py  # Artifact generation
+ └── assets/
+     ├── style.css         # Dark theme styling
+     └── fullscreen.js     # 3D viewer controls
+ ```
+
+ ## Pipeline
+
+ 1. **Preprocessing**: Contrast and sharpness enhancement
+ 2. **Pair Generation**: Create symmetrized image pairs
+ 3. **Inference**: DUSt3R stereo reconstruction with mixed precision
+ 4. **Alignment**: Multi-view point cloud optimization
+ 5. **Post-processing**: Depth cleaning and background filtering
+ 6. **Export**: GLB file generation (mesh or point cloud)
+ 7. **Visualization**: Depth maps and confidence heatmaps
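Step 1 of the pipeline relies on Pillow's `ImageEnhance` with the factors defined in `config.py` (contrast 1.2, sharpness 1.5). A minimal stand-alone sketch of that enhancement, using a synthetic gradient image so it runs without any input data (`enhance_image` is an illustrative helper, not a function in this repo):

```python
import os
import tempfile
from PIL import Image, ImageEnhance

def enhance_image(path: str, contrast: float = 1.2, sharpness: float = 1.5) -> str:
    # Same two-step boost as core/preprocessing.py: contrast first, then sharpness.
    img = Image.open(path).convert("RGB")
    img = ImageEnhance.Contrast(img).enhance(contrast)
    img = ImageEnhance.Sharpness(img).enhance(sharpness)
    out_path = os.path.join(tempfile.gettempdir(), "enhanced_" + os.path.basename(path))
    img.save(out_path)
    return out_path

# Smoke test with a synthetic 256x256 gradient image.
src = os.path.join(tempfile.gettempdir(), "mv3dr_demo_input.png")
Image.linear_gradient("L").convert("RGB").save(src)
dst = enhance_image(src)
print(Image.open(dst).size)  # (256, 256)
```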
+
+ ## Performance
+
+ - RTX 4090: ~15-20 seconds (3 images)
+ - RTX 3080: ~25-30 seconds (3 images)
+ - T4 GPU: ~40-50 seconds (3 images)
+ - CPU: ~5-10 minutes (3 images)
+
+ ## Requirements
+
+ - Python 3.10+
+ - CUDA-capable GPU (16GB+ VRAM recommended)
+ - 16GB+ RAM
+
+ ## License
+
+ Licensed under CC BY-NC-SA 4.0 (non-commercial use only).
+ Built on DUSt3R by Naver Corporation.
+
+ ## Credits
+
+ - DUSt3R: Naver Corporation
+ - Gradio: Hugging Face
+ - Trimesh: Michael Dawson-Haggerty
assets/fullscreen.js CHANGED
@@ -1,12 +1,19 @@
  () => {
    const el = document.getElementById("model-container");
-   if (!el) return;
+   if (!el) {
+     console.error("Model container not found");
+     return;
+   }
 
-   if (el.requestFullscreen) {
-     el.requestFullscreen();
-   } else if (el.webkitRequestFullscreen) {
-     el.webkitRequestFullscreen();
-   } else if (el.msRequestFullscreen) {
-     el.msRequestFullscreen();
+   if (document.fullscreenElement) {
+     document.exitFullscreen();
+   } else {
+     if (el.requestFullscreen) {
+       el.requestFullscreen();
+     } else if (el.webkitRequestFullscreen) {
+       el.webkitRequestFullscreen();
+     } else if (el.msRequestFullscreen) {
+       el.msRequestFullscreen();
+     }
    }
  }
assets/style.css CHANGED
@@ -3,19 +3,48 @@ footer {display: none !important;}
 
  :root {
    --primary-500: #FFFFFF !important;
-   --body-background-fill: #000000 !important;
-   --block-background-fill: #000000 !important;
-   --input-background-fill: #000000 !important;
-   --border-color-primary: #333333 !important;
-   --background-fill-secondary: #000000 !important;
+   --body-background-fill: #0a0a0a !important;
+   --block-background-fill: #0a0a0a !important;
+   --input-background-fill: #0a0a0a !important;
+   --border-color-primary: #1a1a1a !important;
+   --background-fill-secondary: #0a0a0a !important;
+   --block-border-width: 1px !important;
  }
 
  .gradio-container {
-   background-color: #000000 !important;
+   background-color: #0a0a0a !important;
    color: #FFFFFF !important;
    font-family: 'Inter', system-ui, sans-serif !important;
  }
 
+ h1 {
+   text-align: center !important;
+   font-weight: 700 !important;
+   margin: 2rem 0 !important;
+   font-size: 2.5rem !important;
+ }
+
+ .block {
+   background-color: #0a0a0a !important;
+   border: 1px solid #1a1a1a !important;
+ }
+
+ .block:hover {
+   border-color: #CCCCCC !important;
+   transform: translateY(-2px);
+ }
+
+ button:not(.primary) {
+   background-color: #1a1a1a !important;
+   color: #FFFFFF !important;
+   border: 1px solid #333333 !important;
+ }
+
+ button:not(.primary):hover {
+   background-color: #2a2a2a !important;
+ }
+
+ input {
+   background-color: #141414 !important;
+   border: 1px solid #1a1a1a !important;
+   color: #FFFFFF !important;
+ }
+
  button.primary {
    background-color: #FFFFFF !important;
    color: #000000 !important;
@@ -23,25 +52,44 @@ button.primary {
    font-weight: 600 !important;
    text-transform: uppercase;
    letter-spacing: 1px;
+   transition: all 0.3s ease;
  }
 
  button.primary:hover {
    background-color: #B2B2B2 !important;
+   transform: translateY(-2px);
+ }
+
+ #model-container {
+   border: 1px solid #333333;
+   border-radius: 4px;
+   background-color: #0a0a0a !important;
  }
 
- .label {
-   color: #808080 !important;
-   text-transform: uppercase;
-   font-size: 11px !important;
- }
+ .gallery {
+   background-color: #0a0a0a !important;
+ }
+
+ .gallery-item {
+   background-color: #141414 !important;
+   border: 1px solid #1a1a1a !important;
+ }
 
  #model-container:fullscreen {
-   background-color: black;
+   background-color: #000000;
    width: 100vw;
    height: 100vh;
  }
 
  .generating::after {
+   content: "PROCESSING...";
    color: #808080;
    font-size: 10px;
    letter-spacing: 2px;
@@ -51,3 +99,7 @@ button.primary:hover {
  @keyframes blink {
    50% { opacity: 0; }
  }
+
+ @keyframes blink {
+   50% { opacity: 0; }
+ }
config.py ADDED
@@ -0,0 +1,21 @@
+ import os
+
+ ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
+ DUST3R_DIR = os.path.join(ROOT_DIR, "dust3r")
+ ASSETS_DIR = os.path.join(ROOT_DIR, "assets")
+ OUTPUT_DIR = os.path.join(ROOT_DIR, "results")
+ WEIGHTS_PATH = os.path.join(DUST3R_DIR, "checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth")
+
+ BATCH_SIZE = 1
+ IMAGE_SIZE = 512
+
+ DEFAULT_ITERATIONS = 300
+ DEFAULT_CONF_THRESHOLD = 0.001
+
+ CONTRAST_ENHANCEMENT = 1.2
+ SHARPNESS_ENHANCEMENT = 1.5
+
+ SERVER_NAME = "0.0.0.0"
+ SERVER_PORT = 7860
+ SHARE = True
+ SHOW_ERROR = True
core/__init__.py ADDED
@@ -0,0 +1,12 @@
+ from .inference import run_inference
+ from .preprocessing import preprocess_images
+ from .postprocessing import filter_background_points, create_3d_output
+ from .visualization import generate_artifacts
+
+ __all__ = [
+     'run_inference',
+     'preprocess_images',
+     'filter_background_points',
+     'create_3d_output',
+     'generate_artifacts',
+ ]
core/inference.py ADDED
@@ -0,0 +1,26 @@
+ import torch
+ from typing import List, Dict, Any
+
+ from dust3r.utils.device import to_cpu, collate_with_cat as collate
+ from config import BATCH_SIZE
+
+
+ def run_inference(pairs: List, model: torch.nn.Module, device: str, batch_size: int = BATCH_SIZE) -> Dict[str, Any]:
+     result = []
+
+     for i in range(0, len(pairs), batch_size):
+         batch = collate(pairs[i:i + batch_size])
+
+         # Move each view's tensors to the target device before the forward pass.
+         for view in batch:
+             for k in ["img", "pts3d", "valid_mask", "camera_pose", "camera_intrinsics"]:
+                 if k in view:
+                     view[k] = view[k].to(device)
+
+         v1, v2 = batch
+
+         with torch.cuda.amp.autocast():
+             p1, p2 = model(v1, v2)
+
+         result.append(to_cpu(dict(view1=v1, view2=v2, pred1=p1, pred2=p2)))
+
+     return collate(result, lists=True)
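The loop in `run_inference` walks the pair list in fixed-size chunks; the chunking itself is a standard pattern that can be shown without torch (`chunks` is an illustrative helper, not part of the module):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def chunks(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    # Mirrors `for i in range(0, len(pairs), batch_size)` in run_inference.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

print([list(b) for b in chunks([1, 2, 3, 4, 5], 2)])  # [[1, 2], [3, 4], [5]]
```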
core/postprocessing.py ADDED
@@ -0,0 +1,48 @@
+ import os
+ import numpy as np
+ import trimesh
+ from typing import List, Tuple
+
+ from dust3r.utils.device import to_numpy
+ from dust3r.viz import pts3d_to_trimesh, cat_meshes
+ from config import DEFAULT_CONF_THRESHOLD
+
+
+ def filter_background_points(scene, conf_threshold: float = DEFAULT_CONF_THRESHOLD):
+     masks = scene.get_masks()
+     confs = [c for c in scene.im_conf]
+
+     # Keep only points whose confidence clears the threshold.
+     for i, (mask, conf) in enumerate(zip(masks, confs)):
+         conf_mask = conf > conf_threshold
+         masks[i] = mask & conf_mask
+
+     return scene
+
+
+ def create_3d_output(outdir: str, imgs: List, pts3d: List, mask: List,
+                      focals: List, cams2world: List, as_pointcloud: bool = False) -> str:
+     pts3d, imgs, focals, cams2world = map(to_numpy, [pts3d, imgs, focals, cams2world])
+     scene = trimesh.Scene()
+
+     if as_pointcloud:
+         pts = np.concatenate([p[m] for p, m in zip(pts3d, mask)])
+         col = np.concatenate([imgs[i][mask[i]] for i in range(len(imgs))])
+         geometry = trimesh.PointCloud(pts.reshape(-1, 3), colors=col.reshape(-1, 3))
+     else:
+         meshes = [pts3d_to_trimesh(imgs[i], pts3d[i], mask[i]) for i in range(len(imgs))]
+         geometry = trimesh.Trimesh(**cat_meshes(meshes))
+
+     # Center the geometry at the origin, then flip Y and Z for viewer conventions.
+     centroid = geometry.centroid
+     geometry.apply_translation(-centroid)
+     scene.add_geometry(geometry)
+
+     flip_correction = np.eye(4)
+     flip_correction[1, 1] = -1
+     flip_correction[2, 2] = -1
+     scene.apply_transform(flip_correction)
+
+     outfile = os.path.join(outdir, 'object.glb')
+     scene.export(outfile)
+
+     return outfile
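The `flip_correction` matrix negates the Y and Z axes, which is commonly needed to move a camera-convention reconstruction into the orientation GLB viewers expect. The same transform applied directly to an N×3 point array, as a stand-alone NumPy sketch (`apply_flip_correction` is illustrative, not part of the module):

```python
import numpy as np

def apply_flip_correction(points: np.ndarray) -> np.ndarray:
    # Build the same 4x4 homogeneous transform used in create_3d_output.
    flip = np.eye(4)
    flip[1, 1] = -1
    flip[2, 2] = -1
    # Promote N x 3 points to homogeneous coordinates, transform, drop w.
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ flip.T)[:, :3]

pts = np.array([[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]])
print(apply_flip_correction(pts))
```

X is left untouched; only the Y and Z coordinates change sign, so the transform is its own inverse.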
core/preprocessing.py ADDED
@@ -0,0 +1,25 @@
+ import os
+ import tempfile
+ from PIL import Image, ImageEnhance
+ from typing import List
+
+ from config import CONTRAST_ENHANCEMENT, SHARPNESS_ENHANCEMENT
+
+
+ def preprocess_images(image_paths: List[str]) -> List[str]:
+     cleaned_paths = []
+
+     for i, path in enumerate(image_paths):
+         img = Image.open(path).convert("RGB")
+
+         enhancer = ImageEnhance.Contrast(img)
+         img = enhancer.enhance(CONTRAST_ENHANCEMENT)
+
+         enhancer = ImageEnhance.Sharpness(img)
+         img = enhancer.enhance(SHARPNESS_ENHANCEMENT)
+
+         save_path = os.path.join(tempfile.gettempdir(), f"input_refined_{i}.png")
+         img.save(save_path)
+         cleaned_paths.append(save_path)
+
+     return cleaned_paths
core/visualization.py ADDED
@@ -0,0 +1,24 @@
+ import matplotlib.pyplot as plt
+ from typing import List, Tuple
+
+ from dust3r.utils.image import rgb
+ from dust3r.utils.device import to_numpy
+
+
+ def generate_artifacts(scene) -> List[Tuple]:
+     artifacts = []
+     cmap = plt.get_cmap('jet')
+
+     depths = to_numpy(scene.get_depthmaps())
+     confs = to_numpy([c for c in scene.im_conf])
+
+     for i in range(len(scene.imgs)):
+         artifacts.append((scene.imgs[i], f"View {i+1}: RGB"))
+
+         d_norm = depths[i] / (depths[i].max() + 1e-8)
+         artifacts.append((rgb(d_norm), f"View {i+1}: Depth Map"))
+
+         c_norm = cmap(confs[i] / (confs[i].max() + 1e-8))
+         artifacts.append((rgb(c_norm), f"View {i+1}: Confidence Heatmap"))
+
+     return artifacts
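The `depths[i] / (depths[i].max() + 1e-8)` step rescales each map into [0, 1] before colorizing; the epsilon keeps an all-zero map from dividing by zero. Isolated as a tiny NumPy sketch (`normalize_for_display` is an illustrative name, not a function in this repo):

```python
import numpy as np

def normalize_for_display(m: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Scale to [0, 1]; eps guards against an all-zero map.
    return m / (m.max() + eps)

depth = np.array([[0.0, 2.0], [4.0, 8.0]])
d_norm = normalize_for_display(depth)
print(d_norm.min(), round(float(d_norm.max()), 6))  # 0.0 1.0
```

Note this is a divide-by-max, not a min-max normalization: maps whose minimum is above zero keep their offset, which preserves relative depth within a view.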
dust3r ADDED
@@ -0,0 +1 @@
+ Subproject commit 78e55fd11ef6d838fc3d8c5c5a52b32eac426e09
gradio_ui.py CHANGED
@@ -1,57 +1,55 @@
  import gradio as gr
  import functools
- from pathlib import Path
- from pipeline import run_pipeline
+ import os
 
- ASSETS = Path(__file__).parent / "assets"
- CSS = (ASSETS / "style.css").read_text()
- FULLSCREEN_JS = (ASSETS / "fullscreen.js").read_text()
+ from pipeline import run_pipeline
+ from config import ASSETS_DIR, IMAGE_SIZE
 
 
- def build_ui(outdir, model, device):
+ def load_assets():
+     with open(os.path.join(ASSETS_DIR, 'style.css'), 'r') as f:
+         css = f.read()
 
-     pipeline = functools.partial(run_pipeline, outdir, model, device, 512)
+     with open(os.path.join(ASSETS_DIR, 'fullscreen.js'), 'r') as f:
+         fullscreen_js = f.read()
 
-     with gr.Blocks(
-         title="3D Object Reconstruction",
-         css=CSS,
-         theme=gr.themes.Base(),
-         fill_width=True,
-     ) as app:
+     return css, fullscreen_js
 
-         gr.Markdown("# 3D Object Reconstruction")
+
+ def build_ui(out_dir: str, model, device: str) -> gr.Blocks:
+     css, fullscreen_js = load_assets()
+     pipeline = functools.partial(run_pipeline, out_dir, model, device, IMAGE_SIZE)
+
+     with gr.Blocks(title="Multi-View 3D Reconstruction (MV3DR)", css=css, theme=gr.themes.Base(), fill_width=True) as app:
+         gr.Markdown("# Multi-View 3D Reconstruction (MV3DR)")
 
          with gr.Row():
-
              with gr.Column(scale=1):
-                 files = gr.File(file_count="multiple", label="Images")
-                 run_btn = gr.Button("Run Inference", variant="primary")
-
+                 input_files = gr.File(file_count="multiple", label="Images")
+                 run_btn = gr.Button("Run Inference")
                  with gr.Accordion("Settings", open=False):
-                     iters = gr.Slider(100, 1000, 300, step=50, label="Alignment Iteration")
-                     as_pc = gr.Checkbox(True, label="Render as Point Cloud")
-                     refine = gr.Checkbox(True, label="Filter Background Points")
-                     clean = gr.Checkbox(True, label="Clean-up depthmaps")
+                     n_iterations = gr.Slider(100, 1000, 300, label="Alignment Iterations")
+                     render_mode = gr.Checkbox(True, label="Render as Point Cloud")
+                     post_proc = gr.Checkbox(True, label="Filter Background Points")
+                     clean_depth = gr.Checkbox(True, label="Clean Point Cloud")
 
              with gr.Column(scale=2):
-                 model3d = gr.Model3D(
-                     label="3D Output",
-                     height=600,
-                     elem_id="model-container",
-                 )
-                 fs_btn = gr.Button("Toggle Full Screen ⛶", size="sm")
+                 output_model = gr.Model3D(label="3D Output", height=600, elem_id="model-container")
+                 full_screen_btn = gr.Button("Toggle Full Screen ⛶", size="sm")
 
          gr.Markdown("---")
-         gallery = gr.Gallery(columns=3, label="RGB | DEPTH | CONFIDENCE")
 
-         fs_btn.click(None, None, None, js=FULLSCREEN_JS)
+         with gr.Row():
+             with gr.Column():
+                 gr.Markdown("## RGB | DEPTH | CONFIDENCE")
+                 artifact_gallery = gr.Gallery(columns=3, height="auto", label="Logs")
+
+         full_screen_btn.click(None, None, None, js=fullscreen_js)
 
-         state = gr.State()
-         run_btn.click(
-             fn=pipeline,
-             inputs=[files, iters, as_pc, refine, clean],
-             outputs=[state, model3d, gallery],
-             show_progress="minimal",
-         )
+         saved_state = gr.State()
+         run_btn.click(fn=pipeline,
+                       inputs=[input_files, n_iterations, render_mode, post_proc, clean_depth],
+                       outputs=[saved_state, output_model, artifact_gallery],
+                       show_progress="minimal")
 
      return app
main.py ADDED
@@ -0,0 +1,32 @@
+ import os
+ import sys
+ import torch
+
+ sys.path.append(os.path.join(os.path.dirname(__file__), 'dust3r'))
+
+ from model import initialize
+ from gradio_ui import build_ui
+ from config import WEIGHTS_PATH, OUTPUT_DIR, SERVER_NAME, SERVER_PORT, SHARE, SHOW_ERROR
+
+
+ def main():
+     os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+     device = "cuda" if torch.cuda.is_available() else "cpu"
+     print(f"Using device: {device}")
+
+     print("Loading DUSt3R model...")
+     model = initialize(WEIGHTS_PATH, device)
+
+     print("Starting Multi-View 3D Reconstruction (MV3DR)...")
+     app = build_ui(OUTPUT_DIR, model, device)
+     app.launch(
+         server_name=SERVER_NAME,
+         server_port=SERVER_PORT,
+         share=SHARE,
+         show_error=SHOW_ERROR
+     )
+
+
+ if __name__ == "__main__":
+     main()
model.py CHANGED
@@ -1,18 +1,20 @@
  import torch
+ from dust3r.model import AsymmetricCroCo3DStereo, inf
 
- def initialize(model_path, device):
-     ckpt = torch.load(model_path, map_location="cpu", weights_only=False)
-     args = ckpt["args"].model.replace(
-         "ManyAR_PatchEmbed", "PatchEmbedDust3R"
-     )
 
-     if "landscape_only" not in args:
-         args = args[:-1] + ", landscape_only=False)"
+ def initialize(model_path: str, device: str) -> torch.nn.Module:
+     print(f"Loading model from: {model_path}")
+     ckpt = torch.load(model_path, map_location='cpu', weights_only=False)
+
+     args = ckpt['args'].model.replace("ManyAR_PatchEmbed", "PatchEmbedDust3R")
+
+     if 'landscape_only' not in args:
+         args = args[:-1] + ', landscape_only=False)'
      else:
-         args = args.replace(" ", "").replace(
-             "landscape_only=True", "landscape_only=False"
-         )
+         args = args.replace(" ", "").replace('landscape_only=True', 'landscape_only=False')
 
      net = eval(args)
-     net.load_state_dict(ckpt["model"], strict=False)
+     net.load_state_dict(ckpt['model'], strict=False)
+
+     print(f"Model loaded successfully on {device}")
      return net.to(device)
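The `landscape_only` surgery above rewrites the constructor expression stored in the checkpoint before it is `eval`'d. Isolated as a pure function so it can be checked without a checkpoint (the function name and the example strings are illustrative, not from the repo):

```python
def patch_model_args(args: str) -> str:
    # Swap the patch-embed class and force landscape_only=False,
    # mirroring the string edits in model.initialize.
    args = args.replace("ManyAR_PatchEmbed", "PatchEmbedDust3R")
    if "landscape_only" not in args:
        # No flag present: splice it in before the closing parenthesis.
        args = args[:-1] + ", landscape_only=False)"
    else:
        args = args.replace(" ", "").replace("landscape_only=True", "landscape_only=False")
    return args

# Hypothetical checkpoint strings, for illustration only.
print(patch_model_args("Model(patch_embed_cls='ManyAR_PatchEmbed')"))
# Model(patch_embed_cls='PatchEmbedDust3R', landscape_only=False)
print(patch_model_args("Model(landscape_only=True)"))
# Model(landscape_only=False)
```

Keeping this as a separate pure function makes the `eval(args)` step easier to audit, since the exact expression being evaluated can be unit-tested.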
pipeline.py CHANGED
@@ -1,49 +1,55 @@
- import torch, copy, numpy as np, trimesh, matplotlib.pyplot as plt
+ import torch
+ import copy
+ from typing import List, Tuple, Optional
+
  from dust3r.image_pairs import make_pairs
- from dust3r.utils.image import load_images, rgb
- from dust3r.utils.device import to_numpy, to_cpu, collate_with_cat as collate
- from dust3r.viz import pts3d_to_trimesh, cat_meshes
+ from dust3r.utils.image import load_images
+ from dust3r.utils.device import to_numpy
  from dust3r.cloud_opt import global_aligner, GlobalAlignerMode
- from utils import preprocess
-
- BATCH_SIZE = 1
-
- def interleave(img1, img2):
-     out = {}
-     for k, v1 in img1.items():
-         v2 = img2[k]
-         if isinstance(v1, torch.Tensor):
-             out[k] = torch.stack((v1, v2), dim=1).flatten(0, 1)
-         else:
-             out[k] = [x for p in zip(v1, v2) for x in p]
-     return out
-
- def inference(pairs, model, device):
-     results = []
-     for i in range(0, len(pairs), BATCH_SIZE):
-         batch = collate(pairs[i:i+BATCH_SIZE])
-         for view in batch:
-             for k in ["img", "pts3d", "valid_mask", "camera_pose", "camera_intrinsics"]:
-                 if k in view:
-                     view[k] = view[k].to(device)
-         v1, v2 = batch
-         v1, v2 = interleave(v1, v2), interleave(v2, v1)
-
-         with torch.cuda.amp.autocast():
-             p1, p2 = model(v1, v2)
-         results.append(to_cpu(dict(view1=v1, view2=v2, pred1=p1, pred2=p2)))
-
-     return collate(results, lists=True)
-
- def create_glb(outdir, scene):
-     meshes = [
-         pts3d_to_trimesh(scene.imgs[i], scene.get_pts3d()[i], scene.get_masks()[i])
-         for i in range(len(scene.imgs))
-     ]
-     mesh = trimesh.Trimesh(**cat_meshes(meshes))
-     mesh.apply_translation(-mesh.centroid)
-
-     scene_out = trimesh.Scene(mesh)
-     out = f"{outdir}/object.glb"
-     scene_out.export(out)
-     return out
+
+ from core.preprocessing import preprocess_images
+ from core.inference import run_inference
+ from core.postprocessing import filter_background_points, create_3d_output
+ from core.visualization import generate_artifacts
+ from config import IMAGE_SIZE, DEFAULT_ITERATIONS
+
+
+ def run_pipeline(outdir: str, model: torch.nn.Module, device: str, img_size: int,
+                  filelist: List[str], niter: int, as_pc: bool, refinement: bool,
+                  clean_depth: bool = True) -> Tuple[Optional[object], Optional[str], List]:
+
+     if not filelist:
+         return None, None, []
+
+     processed_list = preprocess_images(filelist)
+     imgs = load_images(processed_list, size=img_size)
+
+     # A single image is duplicated so pair-based inference still works.
+     if len(imgs) == 1:
+         imgs = [imgs[0], copy.deepcopy(imgs[0])]
+         imgs[1]['idx'] = 1
+
+     pairs = make_pairs(imgs, scene_graph="complete", prefilter=None, symmetrize=True)
+
+     output = run_inference(pairs, model, device)
+
+     mode = GlobalAlignerMode.PointCloudOptimizer if len(imgs) > 2 else GlobalAlignerMode.PairViewer
+     scene = global_aligner(output, device=device, mode=mode)
+
+     if mode == GlobalAlignerMode.PointCloudOptimizer:
+         scene.compute_global_alignment(init='mst', niter=niter, schedule='linear', lr=0.01)
+
+     if clean_depth:
+         scene = scene.clean_pointcloud()
+
+     if refinement:
+         scene = filter_background_points(scene)
+
+     glb_path = create_3d_output(
+         outdir, scene.imgs, scene.get_pts3d(),
+         to_numpy(scene.get_masks()), scene.get_focals().cpu(),
+         scene.get_im_poses().cpu(), as_pointcloud=as_pc
+     )
+
+     artifacts = generate_artifacts(scene)
+
+     return scene, glb_path, artifacts
requirement.txt → requirements.txt RENAMED
@@ -9,3 +9,4 @@ matplotlib
  pillow
  tqdm
  numpy
+ opencv-python
setup.sh CHANGED
@@ -8,9 +8,11 @@ if [ ! -d "dust3r" ]; then
  git clone -b dev --recursive https://github.com/camenduru/dust3r
  fi
 
- pip install -r requirements.txt
+ python -m pip install -r requirements.txt
 
- pip install https://github.com/camenduru/wheels/releases/download/colab/curope-0.0.0-cp310-cp310-linux_x86_64.whl
+ if [[ "$(uname)" == "Linux" ]]; then
+   python -m pip install https://github.com/camenduru/wheels/releases/download/colab/curope-0.0.0-cp310-cp310-linux_x86_64.whl
+ fi
 
  mkdir -p dust3r/checkpoints