SebRincon committed
Commit b599b20 · verified · 1 Parent(s): 3f62a95

Full-res AnyCalib GPU demo: ZeroGPU, full FP32, no resolution limits

Files changed (4)
  1. README.md +44 -7
  2. __pycache__/app.cpython-312.pyc +0 -0
  3. app.py +329 -0
  4. requirements.txt +7 -0
README.md CHANGED
@@ -1,12 +1,49 @@
 ---
-title: Anycalib Gpu
-emoji: 📚
-colorFrom: pink
-colorTo: indigo
+title: AnyCalib GPU
+emoji: "\U0001F4F7"
+colorFrom: indigo
+colorTo: blue
 sdk: gradio
-sdk_version: 6.7.0
+sdk_version: 5.12.0
 app_file: app.py
-pinned: false
+pinned: true
+license: apache-2.0
+tags:
+- camera-calibration
+- anycalib
+- computer-vision
+- lens-correction
+- dinov2
+- gpu
+- zerogpu
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# AnyCalib — Full-Resolution GPU Camera Calibration
+
+Single-image camera calibration and lens distortion correction running on **ZeroGPU**.
+
+No quantization, no resolution limits — full FP32 inference with the complete AnyCalib pipeline.
+
+## What it does
+
+1. Upload any image (phone photo, action cam, drone, dashcam, etc.)
+2. DINOv2 ViT-L/14 backbone predicts per-pixel ray directions
+3. RANSAC + Gauss-Newton calibrator fits camera intrinsics
+4. Image is undistorted at **original resolution** using the fitted parameters
+
+## Output
+
+- **Corrected image** at full input resolution
+- **Camera intrinsics**: focal length, principal point, distortion k1
+- **FOV** (horizontal and vertical)
+- **Distortion type** (barrel, pincushion, or negligible)
+- **Raw JSON** with all parameters, timing, and metadata
+
+## Model
+
+- **Architecture**: DINOv2 ViT-L/14 (304M) + LightDPT (15.2M) + ConvexTangentDecoder (0.6M)
+- **Total**: ~320M parameters, full FP32
+- **Weights**: [SebRincon/anycalib](https://huggingface.co/SebRincon/anycalib)
+- **ONNX**: [SebRincon/anycalib-onnx](https://huggingface.co/SebRincon/anycalib-onnx)
+- **WASM demo**: [SebRincon/anycalib-wasm](https://huggingface.co/spaces/SebRincon/anycalib-wasm)
+- **Source**: [github.com/javrtg/AnyCalib](https://github.com/javrtg/AnyCalib)
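
The FOV values reported by the demo follow from the standard pinhole relation `fov = 2 * atan(size / (2f))`. A minimal standalone sketch (a hypothetical helper, not part of the Space code):

```python
import math

def fov_deg(size_px: float, focal_px: float) -> float:
    """Field of view in degrees across size_px pixels for a focal length of focal_px."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))

# A 640-pixel-wide image with f = 320 px spans 90 degrees horizontally.
print(fov_deg(640, 320))  # -> 90.0
```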
__pycache__/app.cpython-312.pyc ADDED
Binary file (15.5 kB).
 
app.py ADDED
@@ -0,0 +1,329 @@
+"""
+AnyCalib — Full-Resolution GPU Camera Calibration & Lens Correction
+
+Gradio Space running the full AnyCalib pipeline on ZeroGPU:
+1. DINOv2 ViT-L/14 backbone → LightDPT decoder → ConvexTangentDecoder head
+2. RANSAC + Gauss-Newton calibrator → camera intrinsics [f, cx, cy, k1, ...]
+3. Full-resolution undistortion via grid_sample
+
+No resolution limits. No quantization. Full FP32 on a real GPU.
+"""
+from __future__ import annotations
+
+import json
+import time
+
+import gradio as gr
+import numpy as np
+import spaces
+import torch
+
+# ── Load model at startup (on CPU — ZeroGPU moves it to GPU per-call) ──
+
+from anycalib.model.anycalib_pretrained import AnyCalib
+from anycalib.cameras.factory import CameraFactory
+
+print("[anycalib] Loading model...")
+t0 = time.time()
+MODEL = AnyCalib(model_id="anycalib_gen")
+MODEL.eval()
+print(f"[anycalib] Model loaded in {time.time() - t0:.1f}s "
+      f"({sum(p.numel() for p in MODEL.parameters()):,} params)")
+
+
+# ── Undistortion grid builder ──
+
+def _build_undistort_grid(camera, params, h, w, scale=1.0, target_proj="perspective"):
+    """Build undistortion sampling grid (mirrors AnyCalibRunner._undistort_grid)."""
+    params_b = params[None, ...] if params.ndim == 1 else params
+    num_f = int(camera.NUM_F)
+    f = params_b[..., None, :num_f]
+    c = params_b[..., None, num_f:num_f + 2]
+
+    im_coords = camera.pixel_grid_coords(h, w, params_b, 0.0).reshape(-1, 2)
+    im_n = (im_coords - c) / f
+    r = torch.linalg.norm(im_n, dim=-1) / scale
+    theta = camera.ideal_unprojection(r, target_proj)
+    phi = torch.atan2(im_n[..., 1], im_n[..., 0])
+    R = torch.sin(theta)
+    rays = torch.stack((R * torch.cos(phi), R * torch.sin(phi), torch.cos(theta)), dim=-1)
+
+    params_proj = params_b
+    if num_f == 2:
+        params_proj = params_b.clone()
+        params_proj[..., :2] = f.amax(dim=-1, keepdim=True)
+
+    map_xy, valid = camera.project(params_proj, rays)
+    if valid is not None:
+        valid = valid.reshape(1, h, w)[0]
+
+    grid = 2.0 * map_xy.reshape(1, h, w, 2) / map_xy.new_tensor((w, h)) - 1.0
+    return grid, valid
+
+
+# ── Main inference function (runs on GPU via ZeroGPU) ──
+
+@spaces.GPU(duration=60)
+@torch.no_grad()
+def run_calibration(
+    input_image: np.ndarray,
+    cam_id: str,
+    scale: float,
+    target_proj: str,
+    padding_mode: str,
+    interp_mode: str,
+    k1_threshold: float,
+):
+    """Full pipeline: predict → fit → undistort at original resolution."""
+
+    if input_image is None:
+        raise gr.Error("Please upload an image.")
+
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    MODEL.to(device)
+
+    h, w = input_image.shape[:2]
+    t_total = time.time()
+
+    # ── Preprocess ──
+    x = input_image.astype("float32") / 255.0
+    x = np.transpose(x, (2, 0, 1))  # HWC → CHW
+    x_t = torch.from_numpy(x).to(device)
+
+    # ── Neural network inference ──
+    t0 = time.time()
+    out = MODEL.predict(x_t, cam_id=cam_id)
+    intrinsics = out["intrinsics"]
+    pred_size = out.get("pred_size")
+    t_infer = time.time() - t0
+
+    # ── Parse intrinsics ──
+    camera = CameraFactory.create_from_id(cam_id)
+    num_f = int(camera.NUM_F)
+    intr_list = intrinsics.detach().cpu().numpy().astype(np.float64).tolist()
+
+    focal = intr_list[:num_f]
+    cx_val, cy_val = intr_list[num_f], intr_list[num_f + 1]
+    k1_val = intr_list[num_f + 2] if len(intr_list) > num_f + 2 else 0.0
+
+    # FOV
+    f_px = focal[0]
+    fov_h = float(2 * np.degrees(np.arctan(w / (2 * f_px)))) if f_px > 0 else 0
+    fov_v = float(2 * np.degrees(np.arctan(h / (2 * f_px)))) if f_px > 0 else 0
+
+    # Distortion type
+    if k1_val < -0.001:
+        dist_type = "Barrel (k1 < 0)"
+    elif k1_val > 0.001:
+        dist_type = "Pincushion (k1 > 0)"
+    else:
+        dist_type = "Negligible"
+
+    # ── k1 gating ──
+    skip_undistort = k1_threshold > 0 and abs(k1_val) < k1_threshold
+
+    if skip_undistort:
+        corrected = input_image.copy()
+        valid_frac = 1.0
+        t_undistort = 0.0
+    else:
+        # ── Undistortion at full resolution ──
+        t0 = time.time()
+        grid, valid = _build_undistort_grid(
+            camera, intrinsics, h, w,
+            scale=scale, target_proj=target_proj,
+        )
+        y_t = torch.nn.functional.grid_sample(
+            x_t[None, ...], grid,
+            mode=interp_mode,
+            padding_mode=padding_mode,
+            align_corners=False,
+        )
+        t_undistort = time.time() - t0
+
+        valid_frac = float(valid.float().mean().item()) if valid is not None else 1.0
+
+        y = y_t[0].clamp(0, 1).detach().cpu().numpy()
+        y = np.transpose(y, (1, 2, 0))
+        corrected = (y * 255.0 + 0.5).astype("uint8")
+
+    t_total_elapsed = time.time() - t_total
+
+    # ── Build params table ──
+    params_md = f"""
+### Camera Intrinsics
+
+| Parameter | Value |
+|-----------|-------|
+| **Focal length** | `{f_px:.2f}` px |
+| **Principal point** | `({cx_val:.2f}, {cy_val:.2f})` px |
+| **Distortion k1** | `{k1_val:.6f}` |
+| **Distortion type** | {dist_type} |
+| **FOV (horizontal)** | `{fov_h:.1f}` deg |
+| **FOV (vertical)** | `{fov_v:.1f}` deg |
+| **Valid pixel fraction** | `{valid_frac:.3f}` |
+| **k1 gated (skipped)** | `{skip_undistort}` |
+
+### Image Info
+
+| Property | Value |
+|----------|-------|
+| **Input resolution** | `{w} x {h}` ({w*h:,} px) |
+| **Model working size** | `{pred_size}` |
+| **Camera model** | `{cam_id}` |
+| **Scale** | `{scale}` |
+| **Target projection** | `{target_proj}` |
+
+### Timing
+
+| Stage | Time |
+|-------|------|
+| Neural net inference | `{t_infer*1000:.0f}` ms |
+| Undistortion (grid_sample) | `{t_undistort*1000:.0f}` ms |
+| **Total** | **`{t_total_elapsed*1000:.0f}` ms** |
+| Device | `{device}` |
+"""
+
+    # ── Raw JSON ──
+    raw_json = json.dumps({
+        "intrinsics": {
+            "focal_length_px": focal,
+            "principal_point": [cx_val, cy_val],
+            "k1": k1_val,
+        },
+        "fov": {"horizontal_deg": fov_h, "vertical_deg": fov_v},
+        "distortion": {"type": dist_type, "k1_gated": skip_undistort},
+        "image": {
+            "input_resolution": [w, h],
+            "total_pixels": w * h,
+            "model_working_size": pred_size,
+        },
+        "camera": {
+            "model": cam_id,
+            "scale": scale,
+            "target_projection": target_proj,
+            "padding_mode": padding_mode,
+            "interpolation": interp_mode,
+        },
+        "quality": {
+            "valid_pixel_fraction": valid_frac,
+        },
+        "timing_ms": {
+            "neural_net": round(t_infer * 1000, 1),
+            "undistortion": round(t_undistort * 1000, 1),
+            "total": round(t_total_elapsed * 1000, 1),
+        },
+        "device": str(device),
+        "all_intrinsics_raw": intr_list,
+    }, indent=2)
+
+    return corrected, params_md, raw_json
+
+
+# ── Gradio UI ──
+
+with gr.Blocks() as demo:
+
+    gr.Markdown("""
+# AnyCalib — Full-Resolution GPU Camera Calibration
+
+Single-image lens calibration & distortion correction powered by
+[AnyCalib](https://github.com/javrtg/AnyCalib) (DINOv2 ViT-L/14 + LightDPT + ConvexTangentDecoder, ~320M params).
+
+Running on **GPU via ZeroGPU** — no quantization, no resolution limits, full FP32 inference.
+
+Upload any image and get the **corrected (undistorted) image** at original resolution,
+plus camera intrinsics, FOV, distortion parameters, and timing.
+""")
+
+    with gr.Row():
+        with gr.Column(scale=1):
+            input_image = gr.Image(
+                label="Input Image",
+                type="numpy",
+                sources=["upload", "clipboard"],
+            )
+
+            with gr.Accordion("Advanced Settings", open=False):
+                cam_id = gr.Dropdown(
+                    label="Camera Model",
+                    choices=[
+                        "simple_division:1",
+                        "division:1",
+                        "simple_radial:1",
+                        "simple_kb:1",
+                        "simple_pinhole",
+                        "pinhole",
+                    ],
+                    value="simple_division:1",
+                )
+                scale = gr.Slider(
+                    label="Focal Length Scale (< 1 = wider FOV, less crop)",
+                    minimum=0.5, maximum=1.5, step=0.05, value=1.0,
+                )
+                target_proj = gr.Dropdown(
+                    label="Target Projection",
+                    choices=["perspective", "stereographic", "equidistant", "equisolid", "orthographic"],
+                    value="perspective",
+                )
+                padding_mode = gr.Dropdown(
+                    label="Padding Mode",
+                    choices=["border", "zeros", "reflection"],
+                    value="border",
+                )
+                interp_mode = gr.Dropdown(
+                    label="Interpolation",
+                    choices=["bilinear", "bicubic", "nearest"],
+                    value="bilinear",
+                )
+                k1_threshold = gr.Slider(
+                    label="k1 Threshold (skip undistortion if |k1| below this)",
+                    minimum=0.0, maximum=0.1, step=0.005, value=0.0,
+                )
+
+            run_btn = gr.Button("Run Calibration", variant="primary", size="lg")
+
+        with gr.Column(scale=1):
+            output_image = gr.Image(label="Corrected (Undistorted) Image", type="numpy")
+
+    with gr.Row():
+        with gr.Column():
+            params_output = gr.Markdown(label="Camera Parameters")
+        with gr.Column():
+            json_output = gr.Code(label="Raw JSON Output", language="json")
+
+    gr.Markdown("""
+---
+### How it works
+
+1. **Upload** any image (phone photo, action cam, drone, dashcam, etc.)
+2. The model predicts per-pixel **ray directions** using a DINOv2 ViT-L/14 backbone
+3. **RANSAC + Gauss-Newton** calibrator fits camera intrinsics `[f, cx, cy, k1]` from the rays
+4. Image is **undistorted at full resolution** via differentiable grid_sample
+5. All parameters and raw JSON output are displayed
+
+Runs in ~100-500 ms on GPU depending on image size.
+
+### Links
+
+- Raw weights: [SebRincon/anycalib](https://huggingface.co/SebRincon/anycalib) (safetensors)
+- ONNX models: [SebRincon/anycalib-onnx](https://huggingface.co/SebRincon/anycalib-onnx) (FP32/FP16/INT8)
+- WASM demo: [SebRincon/anycalib-wasm](https://huggingface.co/spaces/SebRincon/anycalib-wasm) (browser-only)
+- Source: [github.com/javrtg/AnyCalib](https://github.com/javrtg/AnyCalib)
+""")
+
+    run_btn.click(
+        fn=run_calibration,
+        inputs=[input_image, cam_id, scale, target_proj, padding_mode, interp_mode, k1_threshold],
+        outputs=[output_image, params_output, json_output],
+    )
+
+    input_image.change(
+        fn=run_calibration,
+        inputs=[input_image, cam_id, scale, target_proj, padding_mode, interp_mode, k1_threshold],
+        outputs=[output_image, params_output, json_output],
+    )
+
+
+if __name__ == "__main__":
+    demo.launch()
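
The last step of `_build_undistort_grid` maps pixel-space sampling coordinates into the normalized `[-1, 1]` range that `grid_sample` expects. That one line of arithmetic can be checked in isolation (a NumPy sketch mirroring `2.0 * map_xy / (w, h) - 1.0`; it ignores the half-pixel offset that `align_corners=False` implies):

```python
import numpy as np

def to_grid_coords(map_xy: np.ndarray, w: int, h: int) -> np.ndarray:
    """Pixel coords (x, y) -> grid_sample's normalized [-1, 1] range."""
    return 2.0 * map_xy / np.array([w, h], dtype=np.float64) - 1.0

pts = np.array([[0.0, 0.0], [320.0, 240.0], [640.0, 480.0]])
print(to_grid_coords(pts, 640, 480))
# origin -> (-1, -1), image center -> (0, 0), far corner -> (1, 1)
```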
requirements.txt ADDED
@@ -0,0 +1,7 @@
+torch>=2.3.0
+torchvision>=0.18.0
+numpy>=1.26.0
+opencv-python-headless>=4.9.0
+anycalib @ git+https://github.com/javrtg/AnyCalib.git@3cf2e5dda92faf80f3548adaa0a8515f807848aa
+safetensors>=0.4.0
+gradio>=4.0.0
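
The k1-based distortion labelling in `app.py` above can be exercised standalone; a minimal sketch reproducing the same ±0.001 cutoffs:

```python
def classify_distortion(k1: float) -> str:
    """Classify radial distortion from k1, using the ±0.001 thresholds from app.py."""
    if k1 < -0.001:
        return "Barrel (k1 < 0)"
    if k1 > 0.001:
        return "Pincushion (k1 > 0)"
    return "Negligible"

print(classify_distortion(-0.05))   # -> Barrel (k1 < 0)
print(classify_distortion(0.0003))  # -> Negligible
```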