Full-res AnyCalib GPU demo: ZeroGPU, full FP32, no resolution limits

Files changed:

- README.md (+44, −7)
- __pycache__/app.cpython-312.pyc (binary)
- app.py (+329)
- requirements.txt (+7)
README.md
CHANGED

```diff
@@ -1,12 +1,49 @@
 ---
-title:
-emoji:
-colorFrom:
-colorTo:
+title: AnyCalib GPU
+emoji: "\U0001F4F7"
+colorFrom: indigo
+colorTo: blue
 sdk: gradio
-sdk_version:
+sdk_version: 5.12.0
 app_file: app.py
-pinned:
+pinned: true
+license: apache-2.0
+tags:
+  - camera-calibration
+  - anycalib
+  - computer-vision
+  - lens-correction
+  - dinov2
+  - gpu
+  - zerogpu
 ---
 
-
+# AnyCalib – Full-Resolution GPU Camera Calibration
+
+Single-image camera calibration and lens distortion correction running on **ZeroGPU**.
+
+No quantization, no resolution limits – full FP32 inference with the complete AnyCalib pipeline.
+
+## What it does
+
+1. Upload any image (phone photo, action cam, drone, dashcam, etc.)
+2. A DINOv2 ViT-L/14 backbone predicts per-pixel ray directions
+3. A RANSAC + Gauss-Newton calibrator fits the camera intrinsics
+4. The image is undistorted at **original resolution** using the fitted parameters
+
+## Output
+
+- **Corrected image** at full input resolution
+- **Camera intrinsics**: focal length, principal point, distortion k1
+- **FOV** (horizontal and vertical)
+- **Distortion type** (barrel, pincushion, or negligible)
+- **Raw JSON** with all parameters, timing, and metadata
+
+## Model
+
+- **Architecture**: DINOv2 ViT-L/14 (304M) + LightDPT (15.2M) + ConvexTangentDecoder (0.6M)
+- **Total**: ~320M parameters, full FP32
+- **Weights**: [SebRincon/anycalib](https://huggingface.co/SebRincon/anycalib)
+- **ONNX**: [SebRincon/anycalib-onnx](https://huggingface.co/SebRincon/anycalib-onnx)
+- **WASM demo**: [SebRincon/anycalib-wasm](https://huggingface.co/spaces/SebRincon/anycalib-wasm)
+- **Source**: [github.com/javrtg/AnyCalib](https://github.com/javrtg/AnyCalib)
```
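The FOV values listed under "Output" follow the standard pinhole relation FOV = 2·atan(size / 2f); a quick standalone check of that arithmetic (illustrative, not part of the commit):

```python
import math

def fov_deg(size_px: float, focal_px: float) -> float:
    """Pinhole field of view in degrees: FOV = 2 * atan(size / (2 * f))."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))

# A 4000x3000 photo whose predicted focal length is 3000 px:
print(round(fov_deg(4000, 3000), 1))  # horizontal: 67.4
print(round(fov_deg(3000, 3000), 1))  # vertical: 53.1
```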
__pycache__/app.cpython-312.pyc
ADDED

Binary file (15.5 kB).
app.py
ADDED

```python
"""
AnyCalib – Full-Resolution GPU Camera Calibration & Lens Correction

Gradio Space running the full AnyCalib pipeline on ZeroGPU:
1. DINOv2 ViT-L/14 backbone → LightDPT decoder → ConvexTangentDecoder head
2. RANSAC + Gauss-Newton calibrator → camera intrinsics [f, cx, cy, k1, ...]
3. Full-resolution undistortion via grid_sample

No resolution limits. No quantization. Full FP32 on a real GPU.
"""
from __future__ import annotations

import json
import time

import gradio as gr
import numpy as np
import spaces
import torch

# ── Load model at startup (on CPU – ZeroGPU moves it to GPU per-call) ──

from anycalib.model.anycalib_pretrained import AnyCalib
from anycalib.cameras.factory import CameraFactory

print("[anycalib] Loading model...")
t0 = time.time()
MODEL = AnyCalib(model_id="anycalib_gen")
MODEL.eval()
print(f"[anycalib] Model loaded in {time.time() - t0:.1f}s "
      f"({sum(p.numel() for p in MODEL.parameters()):,} params)")


# ── Undistortion grid builder ──

def _build_undistort_grid(camera, params, h, w, scale=1.0, target_proj="perspective"):
    """Build undistortion sampling grid (mirrors AnyCalibRunner._undistort_grid)."""
    params_b = params[None, ...] if params.ndim == 1 else params
    num_f = int(camera.NUM_F)
    f = params_b[..., None, :num_f]
    c = params_b[..., None, num_f:num_f + 2]

    im_coords = camera.pixel_grid_coords(h, w, params_b, 0.0).reshape(-1, 2)
    im_n = (im_coords - c) / f
    r = torch.linalg.norm(im_n, dim=-1) / scale
    theta = camera.ideal_unprojection(r, target_proj)
    phi = torch.atan2(im_n[..., 1], im_n[..., 0])
    R = torch.sin(theta)
    rays = torch.stack((R * torch.cos(phi), R * torch.sin(phi), torch.cos(theta)), dim=-1)

    params_proj = params_b
    if num_f == 2:
        params_proj = params_b.clone()
        params_proj[..., :2] = f.amax(dim=-1, keepdim=True)

    map_xy, valid = camera.project(params_proj, rays)
    if valid is not None:
        valid = valid.reshape(1, h, w)[0]

    grid = 2.0 * map_xy.reshape(1, h, w, 2) / map_xy.new_tensor((w, h)) - 1.0
    return grid, valid


# ── Main inference function (runs on GPU via ZeroGPU) ──

@spaces.GPU(duration=60)
@torch.no_grad()
def run_calibration(
    input_image: np.ndarray,
    cam_id: str,
    scale: float,
    target_proj: str,
    padding_mode: str,
    interp_mode: str,
    k1_threshold: float,
):
    """Full pipeline: predict → fit → undistort at original resolution."""

    if input_image is None:
        raise gr.Error("Please upload an image.")

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    MODEL.to(device)

    h, w = input_image.shape[:2]
    t_total = time.time()

    # ── Preprocess ──
    x = input_image.astype("float32") / 255.0
    x = np.transpose(x, (2, 0, 1))  # HWC → CHW
    x_t = torch.from_numpy(x).to(device)

    # ── Neural network inference ──
    t0 = time.time()
    out = MODEL.predict(x_t, cam_id=cam_id)
    intrinsics = out["intrinsics"]
    pred_size = out.get("pred_size")
    t_infer = time.time() - t0

    # ── Parse intrinsics ──
    camera = CameraFactory.create_from_id(cam_id)
    num_f = int(camera.NUM_F)
    intr_list = intrinsics.detach().cpu().numpy().astype(np.float64).tolist()

    focal = intr_list[:num_f]
    cx_val, cy_val = intr_list[num_f], intr_list[num_f + 1]
    k1_val = intr_list[num_f + 2] if len(intr_list) > num_f + 2 else 0.0

    # FOV
    f_px = focal[0]
    fov_h = float(2 * np.degrees(np.arctan(w / (2 * f_px)))) if f_px > 0 else 0
    fov_v = float(2 * np.degrees(np.arctan(h / (2 * f_px)))) if f_px > 0 else 0

    # Distortion type
    if k1_val < -0.001:
        dist_type = "Barrel (k1 < 0)"
    elif k1_val > 0.001:
        dist_type = "Pincushion (k1 > 0)"
    else:
        dist_type = "Negligible"

    # ── k1 gating ──
    skip_undistort = k1_threshold > 0 and abs(k1_val) < k1_threshold

    if skip_undistort:
        corrected = input_image.copy()
        valid_frac = 1.0
        t_undistort = 0.0
    else:
        # ── Undistortion at full resolution ──
        t0 = time.time()
        grid, valid = _build_undistort_grid(
            camera, intrinsics, h, w,
            scale=scale, target_proj=target_proj,
        )
        y_t = torch.nn.functional.grid_sample(
            x_t[None, ...], grid,
            mode=interp_mode,
            padding_mode=padding_mode,
            align_corners=False,
        )
        t_undistort = time.time() - t0

        valid_frac = float(valid.float().mean().item()) if valid is not None else 1.0

        y = y_t[0].clamp(0, 1).detach().cpu().numpy()
        y = np.transpose(y, (1, 2, 0))
        corrected = (y * 255.0 + 0.5).astype("uint8")

    t_total_elapsed = time.time() - t_total

    # ── Build params table ──
    params_md = f"""
### Camera Intrinsics

| Parameter | Value |
|-----------|-------|
| **Focal length** | `{f_px:.2f}` px |
| **Principal point** | `({cx_val:.2f}, {cy_val:.2f})` px |
| **Distortion k1** | `{k1_val:.6f}` |
| **Distortion type** | {dist_type} |
| **FOV (horizontal)** | `{fov_h:.1f}` deg |
| **FOV (vertical)** | `{fov_v:.1f}` deg |
| **Valid pixel fraction** | `{valid_frac:.3f}` |
| **k1 gated (skipped)** | `{skip_undistort}` |

### Image Info

| Property | Value |
|----------|-------|
| **Input resolution** | `{w} x {h}` ({w*h:,} px) |
| **Model working size** | `{pred_size}` |
| **Camera model** | `{cam_id}` |
| **Scale** | `{scale}` |
| **Target projection** | `{target_proj}` |

### Timing

| Stage | Time |
|-------|------|
| Neural net inference | `{t_infer*1000:.0f}` ms |
| Undistortion (grid_sample) | `{t_undistort*1000:.0f}` ms |
| **Total** | **`{t_total_elapsed*1000:.0f}` ms** |
| Device | `{device}` |
"""

    # ── Raw JSON ──
    raw_json = json.dumps({
        "intrinsics": {
            "focal_length_px": focal,
            "principal_point": [cx_val, cy_val],
            "k1": k1_val,
        },
        "fov": {"horizontal_deg": fov_h, "vertical_deg": fov_v},
        "distortion": {"type": dist_type, "k1_gated": skip_undistort},
        "image": {
            "input_resolution": [w, h],
            "total_pixels": w * h,
            "model_working_size": pred_size,
        },
        "camera": {
            "model": cam_id,
            "scale": scale,
            "target_projection": target_proj,
            "padding_mode": padding_mode,
            "interpolation": interp_mode,
        },
        "quality": {
            "valid_pixel_fraction": valid_frac,
        },
        "timing_ms": {
            "neural_net": round(t_infer * 1000, 1),
            "undistortion": round(t_undistort * 1000, 1),
            "total": round(t_total_elapsed * 1000, 1),
        },
        "device": str(device),
        "all_intrinsics_raw": intr_list,
    }, indent=2)

    return corrected, params_md, raw_json


# ── Gradio UI ──

with gr.Blocks() as demo:

    gr.Markdown("""
# AnyCalib – Full-Resolution GPU Camera Calibration

Single-image lens calibration & distortion correction powered by
[AnyCalib](https://github.com/javrtg/AnyCalib) (DINOv2 ViT-L/14 + LightDPT + ConvexTangentDecoder, ~320M params).

Running on **GPU via ZeroGPU** – no quantization, no resolution limits, full FP32 inference.

Upload any image and get the **corrected (undistorted) image** at original resolution,
plus camera intrinsics, FOV, distortion parameters, and timing.
""")

    with gr.Row():
        with gr.Column(scale=1):
            input_image = gr.Image(
                label="Input Image",
                type="numpy",
                sources=["upload", "clipboard"],
            )

            with gr.Accordion("Advanced Settings", open=False):
                cam_id = gr.Dropdown(
                    label="Camera Model",
                    choices=[
                        "simple_division:1",
                        "division:1",
                        "simple_radial:1",
                        "simple_kb:1",
                        "simple_pinhole",
                        "pinhole",
                    ],
                    value="simple_division:1",
                )
                scale = gr.Slider(
                    label="Focal Length Scale (< 1 = wider FOV, less crop)",
                    minimum=0.5, maximum=1.5, step=0.05, value=1.0,
                )
                target_proj = gr.Dropdown(
                    label="Target Projection",
                    choices=["perspective", "stereographic", "equidistant", "equisolid", "orthographic"],
                    value="perspective",
                )
                padding_mode = gr.Dropdown(
                    label="Padding Mode",
                    choices=["border", "zeros", "reflection"],
                    value="border",
                )
                interp_mode = gr.Dropdown(
                    label="Interpolation",
                    choices=["bilinear", "bicubic", "nearest"],
                    value="bilinear",
                )
                k1_threshold = gr.Slider(
                    label="k1 Threshold (skip undistortion if |k1| below this)",
                    minimum=0.0, maximum=0.1, step=0.005, value=0.0,
                )

            run_btn = gr.Button("Run Calibration", variant="primary", size="lg")

        with gr.Column(scale=1):
            output_image = gr.Image(label="Corrected (Undistorted) Image", type="numpy")

    with gr.Row():
        with gr.Column():
            params_output = gr.Markdown(label="Camera Parameters")
        with gr.Column():
            json_output = gr.Code(label="Raw JSON Output", language="json")

    gr.Markdown("""
---
### How it works

1. **Upload** any image (phone photo, action cam, drone, dashcam, etc.)
2. The model predicts per-pixel **ray directions** using a DINOv2 ViT-L/14 backbone
3. A **RANSAC + Gauss-Newton** calibrator fits camera intrinsics `[f, cx, cy, k1]` from the rays
4. The image is **undistorted at full resolution** via differentiable grid_sample
5. All parameters and raw JSON output are displayed

Runs in ~100-500 ms on GPU depending on image size.

### Links

- Raw weights: [SebRincon/anycalib](https://huggingface.co/SebRincon/anycalib) (safetensors)
- ONNX models: [SebRincon/anycalib-onnx](https://huggingface.co/SebRincon/anycalib-onnx) (FP32/FP16/INT8)
- WASM demo: [SebRincon/anycalib-wasm](https://huggingface.co/spaces/SebRincon/anycalib-wasm) (browser-only)
- Source: [github.com/javrtg/AnyCalib](https://github.com/javrtg/AnyCalib)
""")

    run_btn.click(
        fn=run_calibration,
        inputs=[input_image, cam_id, scale, target_proj, padding_mode, interp_mode, k1_threshold],
        outputs=[output_image, params_output, json_output],
    )

    input_image.change(
        fn=run_calibration,
        inputs=[input_image, cam_id, scale, target_proj, padding_mode, interp_mode, k1_threshold],
        outputs=[output_image, params_output, json_output],
    )


if __name__ == "__main__":
    demo.launch()
```
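In `_build_undistort_grid`, the remapped pixel coordinates are normalized as `2 * xy / (w, h) - 1` before being passed to `grid_sample`, which expects sampling locations in `[-1, 1]`. A stdlib-only sketch of that coordinate mapping (illustrative, not part of app.py):

```python
def to_normalized(x: float, y: float, w: int, h: int) -> tuple[float, float]:
    """Map pixel coordinates to grid_sample's [-1, 1] range."""
    return (2.0 * x / w - 1.0, 2.0 * y / h - 1.0)

# The center of a 640x480 frame maps to (0, 0); the bottom-right
# edge maps to (1, 1), so remapped pixels outside that range fall
# back to the chosen padding_mode ("border", "zeros", ...).
print(to_normalized(320.0, 240.0, 640, 480))  # (0.0, 0.0)
print(to_normalized(640.0, 480.0, 640, 480))  # (1.0, 1.0)
```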
requirements.txt
ADDED

```
torch>=2.3.0
torchvision>=0.18.0
numpy>=1.26.0
opencv-python-headless>=4.9.0
anycalib @ git+https://github.com/javrtg/AnyCalib.git@3cf2e5dda92faf80f3548adaa0a8515f807848aa
safetensors>=0.4.0
gradio>=4.0.0
```