---

license: apple-amlr
library_name: ml-sharp
pipeline_tag: image-to-3d
base_model: apple/Sharp
tags:
  - onnx
  - monocular-view-synthesis
  - gaussian-splatting
  - quantization
  - fp16
---



# Sharp Monocular View Synthesis in Less Than a Second (ONNX Edition)

[![Project Page](https://img.shields.io/badge/Project-Page-green)](https://apple.github.io/ml-sharp/)
[![arXiv](https://img.shields.io/badge/arXiv-2512.10685-b31b1b.svg)](https://arxiv.org/abs/2512.10685)


This software project is a community contribution and is not affiliated with the authors of the original research paper:


> _Sharp Monocular View Synthesis in Less Than a Second_ by _Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan Richter and Vladlen Koltun_.

> We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements.

#### This release includes fully validated **ONNX** versions of SHARP (FP32 and FP16), optimized for cross-platform inference on Windows, Linux, and macOS.

![](viewer.gif)

Rendered using [Splat Viewer](https://huggingface.co/spaces/pearsonkyle/Gaussian-Splat-Viewer)

## Getting started

### 🚀 Run Inference

Use the provided [inference_onnx.py](inference_onnx.py) script to run SHARP inference:

```bash
# Run inference with FP16 model (faster, smaller)
python inference_onnx.py -m sharp_fp16.onnx -i test.png -o test.ply -d 0.5
```

**CLI Options:**
- `-m, --model`: Path to ONNX model file
- `-i, --input`: Path to input image (PNG, JPEG, etc.)
- `-o, --output`: Path for output PLY file
- `-d, --decimate`: Decimation ratio 0.0-1.0 (default: 1.0 = keep all)
- `--disparity-factor`: Depth scale factor (default: 1.0)
- `--depth-scale`: Depth exaggeration factor (default: 1.0)

**Features:**
- Cross-platform ONNX Runtime inference (CPU/GPU)
- Automatic image preprocessing and resizing
- Gaussian decimation for reduced file sizes
- PLY output compatible with all major 3D Gaussian viewers
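
The `-d, --decimate` ratio can be pictured with a short sketch. This README does not say how `inference_onnx.py` chooses which Gaussians to keep, so the strategy below (keep the most opaque fraction) is only an assumption, and the `decimate_gaussians` helper is hypothetical rather than part of the script.

```python
import numpy as np


def decimate_gaussians(gaussians, ratio):
    """Keep roughly `ratio` of the Gaussians, preferring the most opaque ones.

    `gaussians` maps a name to an array whose first axis indexes Gaussians and
    must contain an "opacities" entry of shape (N,). Ranking by opacity is an
    assumption made for illustration; the real script may select differently.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0.0, 1.0]")
    opacities = np.asarray(gaussians["opacities"])
    n = opacities.shape[0]
    keep = min(n, max(1, int(round(n * ratio))))
    # argpartition selects the `keep` largest opacities without a full sort.
    idx = np.argpartition(-opacities, keep - 1)[:keep]
    idx.sort()  # preserve the original ordering of the survivors
    return {name: np.asarray(arr)[idx] for name, arr in gaussians.items()}
```

With `ratio=1.0` every Gaussian is kept, which matches the documented default.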

## Model Input and Output

### 📥 Input
The ONNX model accepts two inputs:

- **`image`**: A 3-channel RGB image in `float32` format with shape `(1, 3, H, W)`.
  - Values expected in range `[0, 1]` (normalized RGB).
  - Recommended resolution: `1536×1536` (matches training size).
  - Aspect ratio preserved; input resized internally if needed.

- **`disparity_factor`**: A scalar tensor of shape `(1,)` representing the ratio `focal_length / image_width`.
  - Use `1.0` for standard cameras (e.g., typical smartphone or DSLR).
  - Adjust to control depth scale: higher values = closer objects, lower values = farther scenes.
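
As a rough illustration of the two inputs described above, the sketch below packs an 8-bit RGB array into the `(1, 3, H, W)` layout and derives `disparity_factor` from a focal length in pixels. The helper names `build_inputs` and `run_model` are hypothetical, the `float32` dtype for `disparity_factor` is an assumption, and no resizing is performed here.

```python
import numpy as np


def build_inputs(image_hwc_uint8, focal_length_px=None):
    """Build the feed dict for the two model inputs from an (H, W, 3) uint8 image."""
    img = np.asarray(image_hwc_uint8)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an RGB image of shape (H, W, 3)")
    width = img.shape[1]
    # Scale to [0, 1] and reorder HWC -> CHW, then add the batch axis.
    chw = np.transpose(img.astype(np.float32) / 255.0, (2, 0, 1))
    # disparity_factor = focal_length / image_width; fall back to 1.0 when unknown.
    factor = 1.0 if focal_length_px is None else float(focal_length_px) / width
    return {
        "image": np.ascontiguousarray(chw[np.newaxis, ...]),
        # float32 is an assumption; the README only states the shape (1,).
        "disparity_factor": np.array([factor], dtype=np.float32),
    }


def run_model(model_path, feeds):
    """Run the ONNX model on CPU and return its outputs as a list of arrays."""
    import onnxruntime as ort  # imported lazily so build_inputs works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return session.run(None, feeds)
```

A call such as `run_model("sharp_fp16.onnx", build_inputs(image))` would then return the output tensors in the order the model declares them.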



### 📤 Output

The model outputs five tensors representing a 3D Gaussian splat representation:



| Output | Shape | Description |
|--------|-------|-------------|
| `mean_vectors_3d_positions` | `(1, N, 3)` | 3D positions (x, y, z) in Normalized Device Coordinates (NDC). |
| `singular_values_scales` | `(1, N, 3)` | Scale parameters along each principal axis (width, height, depth). |
| `quaternions_rotations` | `(1, N, 4)` | Unit quaternions `[w, x, y, z]` encoding orientation of each Gaussian. |
| `colors_rgb_linear` | `(1, N, 3)` | Linear RGB color values in range `[0, 1]` (no gamma correction). |
| `opacities_alpha_channel` | `(1, N)` | Opacity (alpha) values per Gaussian, in range `[0, 1]`. |



The total number of Gaussians `N` is approximately 1,179,648 for the default model.
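
To show how the scale and rotation outputs fit together, the sketch below builds each Gaussian's 3×3 covariance as `R · diag(s²) · Rᵀ`, the usual 3D Gaussian splatting construction. It assumes the scales are standard deviations along the principal axes and that the quaternions follow the `[w, x, y, z]` order listed in the table. The function names are illustrative only.

```python
import numpy as np


def quats_to_rotmats(quats):
    """Convert [w, x, y, z] quaternions of shape (..., 4) to matrices (..., 3, 3)."""
    q = np.asarray(quats, dtype=np.float64)
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)  # guard against drift from unit norm
    w, x, y, z = q[..., 0], q[..., 1], q[..., 2], q[..., 3]
    rot = np.stack(
        [
            1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
            2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
            2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
        ],
        axis=-1,
    )
    return rot.reshape(q.shape[:-1] + (3, 3))


def gaussian_covariances(scales, quats):
    """Return covariances R @ diag(s**2) @ R.T with shape (..., 3, 3)."""
    rot = quats_to_rotmats(quats)
    s2 = np.asarray(scales, dtype=np.float64)[..., np.newaxis, :] ** 2
    # Multiplying column j of R by s_j**2 is the same as R @ diag(s**2).
    return (rot * s2) @ np.swapaxes(rot, -1, -2)
```

Applied to the model outputs, `gaussian_covariances(scales[0], quats[0])` would give an array of shape `(N, 3, 3)`.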



## Model Conversion



To convert SHARP from PyTorch to ONNX, use the provided conversion script:



```bash
# Convert to FP32 ONNX (higher precision)
python convert_onnx.py -o sharp.onnx --validate

# Convert to FP16 ONNX (faster inference, smaller model)
python convert_onnx.py -o sharp_fp16.onnx -q fp16 --validate
```



**Conversion Options:**
- `-c, --checkpoint`: Path to PyTorch checkpoint (downloads from Apple if not provided)
- `-o, --output`: Output ONNX model path
- `-q, --quantize`: Quantization type (`fp16` for half-precision)
- `--validate`: Validate converted model against PyTorch reference
- `--input-image`: Path to test image for validation

**Requirements:**
- PyTorch and ml-sharp source code (automatically downloaded)
- ONNX and ONNX Runtime for validation
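
The kind of check that `--validate` implies can be sketched as a per-output comparison between the PyTorch reference and the ONNX result. The README does not describe the script's actual validation logic or tolerances, so the `compare_outputs` helper and the tolerance values passed to it are assumptions.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol, atol):
    """Compare two dicts of named arrays and report the error per output.

    Returns {name: (max_abs_error, within_tolerance)}.
    """
    report = {}
    for name, ref in reference.items():
        ref = np.asarray(ref, dtype=np.float64)
        out = np.asarray(candidate[name], dtype=np.float64)
        if ref.shape != out.shape:
            raise ValueError(f"{name}: shape {out.shape} != reference {ref.shape}")
        max_abs = float(np.max(np.abs(out - ref))) if ref.size else 0.0
        report[name] = (max_abs, bool(np.allclose(out, ref, rtol=rtol, atol=atol)))
    return report
```

An FP16 model would normally need looser tolerances than an FP32 one, since half precision carries only about three significant decimal digits.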

## Citation

If you find this work useful, please cite the original paper:

```bibtex
@inproceedings{Sharp2025:arxiv,
  title      = {Sharp Monocular View Synthesis in Less Than a Second},
  author     = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
  journal    = {arXiv preprint arXiv:2512.10685},
  year       = {2025},
  url        = {https://arxiv.org/abs/2512.10685},
}
```