---

license: apple-amlr
library_name: ml-sharp
pipeline_tag: image-to-3d
base_model: apple/Sharp
tags:
  - coreml
  - monocular-view-synthesis
  - gaussian-splatting
---



# Sharp Monocular View Synthesis in Less Than a Second (Core ML Edition)

[![Project Page](https://img.shields.io/badge/Project-Page-green)](https://apple.github.io/ml-sharp/)
[![arXiv](https://img.shields.io/badge/arXiv-2512.10685-b31b1b.svg)](https://arxiv.org/abs/2512.10685)


This software project is a community contribution and is not affiliated with the original research paper:


> _Sharp Monocular View Synthesis in Less Than a Second_ by _Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan Richter and Vladlen Koltun_.

> We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements.

**This release includes a fully validated Core ML (`.mlpackage`) version of SHARP, optimized for CPU, GPU, and Neural Engine inference on macOS and iOS.**

![](viewer.gif)

Rendered using [Splat Viewer](https://huggingface.co/spaces/pearsonkyle/Gaussian-Splat-Viewer)

## Getting started

### 📦 Download the Core ML Model Only

```bash
pip install huggingface-hub
huggingface-cli download --include sharp.mlpackage/ --local-dir . pearsonkyle/Sharp-coreml
```

### 🧰 Clone the Full Repository

This will include the inference and model conversion/validation scripts.

```bash
brew install git-xet
git xet install
```

Clone the model repository:

```bash
git clone git@hf.co:pearsonkyle/Sharp-coreml
```


### 📱 Run Inference on Apple Devices

Use the provided [sharp.swift](sharp.swift) inference script to load the model and generate 3D Gaussian splats (PLY) from any image:

```bash
# Compile the Swift runner (requires Xcode command-line tools)
swiftc -O -o run_sharp sharp.swift -framework CoreML -framework CoreImage -framework AppKit

# Run inference on an image and decimate the output by 50%
./run_sharp sharp.mlpackage test.png test.ply -d 0.5
```

> Inference on an Apple M4 Max takes ~1.9 seconds.

**CLI Features:**
- Automatic model compilation and caching  
- Decimation to reduce point cloud size while preserving visual fidelity
- Input is expected as a standard RGB image; conversion to [0,1] and CHW format happens inside the model
- PLY output compatible with [Splat Viewer](https://huggingface.co/spaces/pearsonkyle/Gaussian-Splat-Viewer), [MetalSplatter](https://github.com/scier/MetalSplatter), and [Three.js](https://threejs.org)


```bash
Usage: run_sharp [OPTIONS] <model> <input_image> <output.ply>

SHARP Model Inference - Generate 3D Gaussian Splats from a single image

Arguments:
    model              Path to the SHARP Core ML model (.mlpackage, .mlmodel, or .mlmodelc)
    input_image        Path to input image (PNG, JPEG, etc.)
    output.ply         Path for output PLY file

Options:
    -m, --model PATH           Path to Core ML model
    -i, --input PATH           Path to input image
    -o, --output PATH          Path for output PLY file
    -f, --focal-length FLOAT   Focal length in pixels (default: 1536)
    -d, --decimation FLOAT     Decimation ratio 0.0-1.0 or percentage 1-100 (default: 1.0 = keep all)
                               Example: 0.5 or 50 keeps 50% of Gaussians
    -h, --help                 Show this help message
```
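The `-d` decimation step can also be reproduced offline on the model's raw outputs. One plausible strategy, sketched below, is to keep the most opaque Gaussians; this is an assumption for illustration, and the runner's actual policy lives in `sharp.swift`:

```python
import numpy as np

def decimate_indices(opacities: np.ndarray, ratio: float) -> np.ndarray:
    """Return indices of the `ratio` fraction of Gaussians with the
    highest opacity (a hypothetical decimation strategy)."""
    n_keep = max(1, int(round(opacities.size * ratio)))
    # argsort is ascending, so the last n_keep entries are the most opaque
    return np.argsort(opacities)[-n_keep:]

opacities = np.array([0.9, 0.1, 0.5, 0.8])
keep = decimate_indices(opacities, 0.5)
# keep holds the indices of the two most opaque Gaussians (0 and 3)
```

The same index array can then be used to slice all five output tensors consistently before writing the PLY.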

## Model Input and Output

### 📥 Input
The Core ML model accepts two inputs:

- **`image`**: A 3-channel RGB image in `uint8` format with shape `(1, 3, H, W)`.  
  - Values are expected in range `[0, 255]` (no manual normalization required).  
  - Recommended resolution: `1536×1536` (matches training size).  
  - Aspect ratio is preserved; input will be resized internally if needed.

- **`disparity_factor`**: A scalar tensor of shape `(1,)` representing the ratio `focal_length / image_width`.
  - Use `1.0` for standard cameras (e.g., a typical smartphone or DSLR).
  - Adjust slightly to control depth scale: higher values bring objects closer, lower values push scenes farther away.
  - If using the `sharp.swift` runner, this input is computed automatically from your image dimensions.
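For callers driving the Core ML model directly rather than through the Swift runner, the two inputs can be prepared as plain arrays; the sizes below are the documented defaults, and the commented prediction call is hypothetical usage that requires macOS and the downloaded `sharp.mlpackage`:

```python
import numpy as np

H = W = 1536               # recommended resolution (training size)
focal_length = 1536.0      # default focal length in pixels

# 3-channel RGB image, uint8, channel-first (1, 3, H, W)
image = np.zeros((1, 3, H, W), dtype=np.uint8)

# disparity_factor = focal_length / image_width
disparity_factor = np.array([focal_length / W], dtype=np.float32)

# On macOS the prediction call would then look roughly like:
#   import coremltools as ct
#   model = ct.models.MLModel("sharp.mlpackage")
#   out = model.predict({"image": image,
#                        "disparity_factor": disparity_factor})
```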



### 📤 Output

The model outputs five tensors representing a 3D Gaussian splat representation:



| Output | Shape | Description |
|--------|-------|-------------|
| `mean_vectors_3d_positions` | `(1, N, 3)` | 3D positions in Normalized Device Coordinates (NDC): x, y, z. |
| `singular_values_scales` | `(1, N, 3)` | Scale parameters along each principal axis (width, height, depth). |
| `quaternions_rotations` | `(1, N, 4)` | Unit quaternions `[w, x, y, z]` encoding the orientation of each Gaussian. |
| `colors_rgb_linear` | `(1, N, 3)` | Linear RGB color values in range `[0, 1]` (no gamma correction). |
| `opacities_alpha_channel` | `(1, N)` | Opacity (alpha) values per Gaussian, in range `[0, 1]`. |



The total number of Gaussians `N` is approximately 1,179,648 for the default model.



> 🌍 These outputs are fully compatible with [Splat Viewer](https://huggingface.co/spaces/pearsonkyle/Gaussian-Splat-Viewer) and [MetalSplatter](https://github.com/scier/MetalSplatter).
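The five tensors map onto a splat PLY file in a straightforward way. The sketch below uses the attribute layout common to 3D Gaussian splatting exporters (`x/y/z`, `scale_0-2`, `rot_0-3`, `f_dc_0-2`, `opacity`); that layout is an assumption here, so check what your viewer expects, as exporters differ:

```python
import numpy as np

def write_splat_ply(path, positions, scales, quats, colors, opacities):
    """Write Gaussians as a binary-little-endian PLY using a common
    3D Gaussian splatting attribute layout (assumed, not guaranteed
    to match every viewer)."""
    n = positions.shape[0]
    props = (["x", "y", "z"]
             + [f"scale_{i}" for i in range(3)]
             + [f"rot_{i}" for i in range(4)]
             + [f"f_dc_{i}" for i in range(3)]
             + ["opacity"])
    header = ["ply", "format binary_little_endian 1.0",
              f"element vertex {n}"]
    header += [f"property float {p}" for p in props]
    header.append("end_header")
    # Interleave all attributes row-by-row as little-endian float32
    body = np.hstack([positions, scales, quats,
                      colors, opacities[:, None]]).astype("<f4")
    with open(path, "wb") as f:
        f.write(("\n".join(header) + "\n").encode("ascii"))
        f.write(body.tobytes())
```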





### 🔍 Model Validation Results



The Core ML model has been rigorously validated against the original PyTorch implementation. Below are the numerical accuracy metrics across all 5 output tensors:



| Output | Max Diff | Mean Diff | P99 Diff | Angular Diff ° (max / mean / p99) | Status |
|--------|----------|-----------|----------|-----------------------------------|--------|
| Mean Vectors (3D Positions) | 0.000794 | 0.000049 | 0.000094 | - | ✅ PASS |
| Singular Values (Scales) | 0.000035 | 0.000000 | 0.000002 | - | ✅ PASS |
| Quaternions (Rotations) | 1.425558 | 0.000024 | 0.000067 | 9.2519 / 0.0019 / 0.0396 | ✅ PASS |
| Colors (RGB Linear) | 0.001440 | 0.000005 | 0.000055 | - | ✅ PASS |
| Opacities (Alpha) | 0.004183 | 0.000005 | 0.000114 | - | ✅ PASS |



> **Validation Notes:**
> - All outputs match PyTorch within 0.01% mean error.
> - Quaternion angular errors are below 1° for 99% of Gaussians.
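These metrics can be recomputed with a few lines of NumPy. Note the quaternion row: a raw element-wise max diff near 1.43 alongside a tiny mean angular error is consistent with the `q = -q` sign ambiguity (negating all four components yields the same rotation), which is why the angular metric, not the element-wise one, is the meaningful check:

```python
import numpy as np

def diff_metrics(a: np.ndarray, b: np.ndarray):
    """Max / mean / 99th-percentile absolute difference."""
    d = np.abs(a - b).ravel()
    return d.max(), d.mean(), np.percentile(d, 99)

def quat_angle_deg(q1: np.ndarray, q2: np.ndarray) -> np.ndarray:
    """Rotation angle between unit quaternions, ignoring the q = -q
    sign ambiguity that inflates raw element-wise differences."""
    dot = np.abs(np.sum(q1 * q2, axis=-1)).clip(0.0, 1.0)
    return np.degrees(2.0 * np.arccos(dot))

q = np.array([[1.0, 0.0, 0.0, 0.0]])
# -q is the same rotation: element-wise max diff is 2.0, angle is 0 degrees
print(diff_metrics(q, -q)[0], quat_angle_deg(q, -q)[0])
```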



## Reproducing the Conversion



To reproduce the conversion from PyTorch to Core ML, follow these steps:

```bash
git clone https://github.com/apple/ml-sharp.git
cd ml-sharp
conda create -n sharp python=3.13
conda activate sharp
pip install -r requirements.txt
pip install coremltools
cd ../
python convert.py
```



## Citation



If you find this work useful, please cite the original paper:



```bibtex
@article{Sharp2025:arxiv,
  title   = {Sharp Monocular View Synthesis in Less Than a Second},
  author  = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
  journal = {arXiv preprint arXiv:2512.10685},
  year    = {2025},
  url     = {https://arxiv.org/abs/2512.10685},
}
```