	---
	license: apple-amlr
	library_name: ml-sharp
	pipeline_tag: image-to-3d
	base_model: apple/Sharp
	tags:
	- coreml
	- monocular-view-synthesis
	- gaussian-splatting
	---


	# Sharp Monocular View Synthesis in Less Than a Second (Core ML Edition)

	[![Project Page](https://img.shields.io/badge/Project-Page-green)](https://apple.github.io/ml-sharp/)
	[![arXiv](https://img.shields.io/badge/arXiv-2512.10685-b31b1b.svg)](https://arxiv.org/abs/2512.10685)


	This software project is a community contribution and is not affiliated with the authors of the original research paper:


	> _Sharp Monocular View Synthesis in Less Than a Second_ by _Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan Richter and Vladlen Koltun_.

	> We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements.

	#### This release includes a fully validated Core ML (.mlpackage) version of SHARP, optimized for CPU, GPU, and Neural Engine inference on macOS and iOS.

	![](viewer.gif)

	Rendered using [Splat Viewer](https://huggingface.co/spaces/pearsonkyle/Gaussian-Splat-Viewer)

	## Getting started

	### 📦 Download the Core ML Model Only

	```bash
	pip install huggingface-hub
	huggingface-cli download --include sharp.mlpackage/ --local-dir . pearsonkyle/Sharp-coreml
	```

	### 🧰 Clone the Full Repository

	The full repository also includes the inference and model conversion/validation scripts. Install `git-xet` first:

	```bash
	brew install git-xet
	git xet install
	```

	Clone the model repository:

	```bash
	git clone git@hf.co:pearsonkyle/Sharp-coreml
	```


	### 📱 Run Inference on Apple Devices

	Use the provided [sharp.swift](sharp.swift) inference script to load the model and generate 3D Gaussian splats (PLY) from any image:

	```bash
	# Compile the Swift runner (requires Xcode command-line tools)
	swiftc -O -o run_sharp sharp.swift -framework CoreML -framework CoreImage -framework AppKit

	# Run inference on an image and decimate the output by 50%
	./run_sharp sharp.mlpackage test.png test.ply -d 0.5
	```

	> Inference on an Apple M4 Max takes ~1.9 seconds.

	CLI Features:
	- Automatic model compilation and caching
	- Decimation to reduce point cloud size while preserving visual fidelity
	- Input is expected as a standard RGB image; conversion to [0,1] and CHW format happens inside the model
	- PLY output compatible with [Splat Viewer](https://huggingface.co/spaces/pearsonkyle/Gaussian-Splat-Viewer), [MetalSplatter](https://github.com/scier/MetalSplatter), and [Three.js](https://threejs.org)


	```text
	Usage: ./run_sharp [OPTIONS] <model> <input_image> <output.ply>

	SHARP Model Inference - Generate 3D Gaussian Splats from a single image

	Arguments:
	  model                     Path to the SHARP Core ML model (.mlpackage, .mlmodel, or .mlmodelc)
	  input_image               Path to input image (PNG, JPEG, etc.)
	  output.ply                Path for output PLY file

	Options:
	  -m, --model PATH          Path to Core ML model
	  -i, --input PATH          Path to input image
	  -o, --output PATH         Path for output PLY file
	  -f, --focal-length FLOAT  Focal length in pixels (default: 1536)
	  -d, --decimation FLOAT    Decimation ratio 0.0-1.0 or percentage 1-100 (default: 1.0 = keep all)
	                            Example: 0.5 or 50 keeps 50% of Gaussians
	  -h, --help                Show this help message
	```
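The `-d, --decimation` option accepts either a ratio or a percentage. The sketch below shows one way such a value could be interpreted and applied. It is illustrative Python, not the logic in `sharp.swift`: the helper names `parse_decimation` and `keep_indices`, the rule that values above 1 are percentages, and the choice to keep the most opaque Gaussians are all assumptions.

```python
def parse_decimation(value: float) -> float:
    """Return the fraction of Gaussians to keep, in (0, 1].

    Assumption: values in (0, 1] are ratios, values in (1, 100] are percentages.
    """
    if not 0.0 < value <= 100.0:
        raise ValueError(f"decimation must be in (0, 100], got {value}")
    return value if value <= 1.0 else value / 100.0


def keep_indices(opacities: list[float], ratio: float) -> list[int]:
    """Indices of the Gaussians to keep, preferring the most opaque (assumed strategy)."""
    n_keep = max(1, round(len(opacities) * ratio))
    ranked = sorted(range(len(opacities)), key=lambda i: opacities[i], reverse=True)
    return sorted(ranked[:n_keep])  # restore the original ordering


if __name__ == "__main__":
    ratio = parse_decimation(50)  # equivalent to parse_decimation(0.5)
    print(keep_indices([0.9, 0.1, 0.5, 0.7], ratio))
```

Other selection strategies (random sampling, or ranking by opacity times volume) would fit the same interface; which one the runner uses is not stated here.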

	## Model Input and Output

	### 📥 Input
	The Core ML model accepts two inputs:

	- `image`: A 3-channel RGB image in `uint8` format with shape `(1, 3, H, W)`.
	  - Values are expected in range `[0, 255]` (no manual normalization required).
	  - Recommended resolution: `1536×1536` (matches training size).
	  - Aspect ratio is preserved; input will be resized internally if needed.

	- `disparity_factor`: A scalar tensor of shape `(1,)` representing the ratio `focal_length / image_width`.
	  - Use `1.0` for standard cameras (e.g., typical smartphone or DSLR).
	  - Adjust slightly to control depth scale: higher values = closer objects, lower values = farther scenes.
	  - If using the `sharp.swift` runner, this input is automatically computed from your image dimensions.
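As a rough illustration of the `disparity_factor` definition above, the following Python sketch computes the ratio from a focal length in pixels. The helper names are hypothetical, and the conversion from a 35 mm-equivalent focal length assumes equivalence by sensor width (36 mm), which is only an approximation.

```python
FULL_FRAME_SENSOR_WIDTH_MM = 36.0  # width of a 35 mm "full-frame" sensor


def focal_length_px(focal_length_35mm: float, image_width_px: int) -> float:
    """Convert a 35 mm-equivalent focal length to pixels (width-based assumption)."""
    return focal_length_35mm * image_width_px / FULL_FRAME_SENSOR_WIDTH_MM


def disparity_factor(focal_px: float, image_width_px: int) -> float:
    """Ratio focal_length / image_width, as described for the model input."""
    if image_width_px <= 0:
        raise ValueError("image_width_px must be positive")
    return focal_px / image_width_px


if __name__ == "__main__":
    # The runner's default focal length (1536 px) at the recommended width.
    print(disparity_factor(1536, 1536))
    # A 26 mm-equivalent lens on a 1536 px wide image (hypothetical values).
    print(disparity_factor(focal_length_px(26.0, 1536), 1536))
```

With the default focal length of 1536 pixels and a 1536-pixel-wide image, the ratio is `1.0`, which matches the recommended value for standard cameras.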

	### 📤 Output
	The model outputs five tensors representing a 3D Gaussian splat representation:

	| Output | Shape | Description |
	|--------|-------|-------------|
	| `mean_vectors_3d_positions` | `(1, N, 3)` | 3D positions (x, y, z) in Normalized Device Coordinates (NDC). |
	| `singular_values_scales` | `(1, N, 3)` | Scale parameters along each principal axis (width, height, depth). |
	| `quaternions_rotations` | `(1, N, 4)` | Unit quaternions `[w, x, y, z]` encoding orientation of each Gaussian. |
	| `colors_rgb_linear` | `(1, N, 3)` | Linear RGB color values in range `[0, 1]` (no gamma correction). |
	| `opacities_alpha_channel` | `(1, N)` | Opacity (alpha) values per Gaussian, in range `[0, 1]`. |

	The total number of Gaussians `N` is approximately 1,179,648 for the default model.
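Because `colors_rgb_linear` is not gamma-corrected, a consumer that needs display-ready 8-bit colors may want to apply the standard sRGB transfer function. The sketch below is a generic Python illustration with hypothetical helper names; whether a given viewer expects linear or sRGB values is not covered here.

```python
def linear_to_srgb(c: float) -> float:
    """Apply the sRGB transfer function to one linear channel value in [0, 1]."""
    c = min(max(c, 0.0), 1.0)  # clamp out-of-range values
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def to_srgb8(rgb_linear: tuple[float, float, float]) -> tuple[int, ...]:
    """Convert a linear RGB triple to 8-bit sRGB for display."""
    return tuple(round(255 * linear_to_srgb(c)) for c in rgb_linear)


if __name__ == "__main__":
    print(to_srgb8((0.0, 0.5, 1.0)))
```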

	> 🌍 These outputs are fully compatible with [Splat Viewer](https://huggingface.co/spaces/pearsonkyle/Gaussian-Splat-Viewer) and [MetalSplatter](https://github.com/scier/MetalSplatter).
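To make the relationship between the scale and rotation outputs concrete, here is a minimal Python sketch that builds one Gaussian's covariance as `R diag(s^2) R^T`. It assumes the usual 3D Gaussian splatting convention that the scales are standard deviations along the principal axes; the source does not state whether they are stored raw or log-encoded, and the function names are hypothetical.

```python
def quat_to_rotmat(q):
    """Rotation matrix (3x3 nested lists) from a quaternion [w, x, y, z]."""
    w, x, y, z = q
    n = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(q, scales):
    """Covariance R diag(s^2) R^T of one Gaussian (scales assumed to be std devs)."""
    r = quat_to_rotmat(q)
    return [
        [sum(r[i][k] * scales[k] ** 2 * r[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


if __name__ == "__main__":
    # Identity rotation: the covariance is diagonal with the squared scales.
    for row in covariance([1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0]):
        print(row)
```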


	### 🔍 Model Validation Results

	The Core ML model has been rigorously validated against the original PyTorch implementation. Below are the numerical accuracy metrics across all 5 output tensors:

	| Output | Max Diff | Mean Diff | P99 Diff | Angular Diff (°), max / mean / P99 | Status |
	|--------|----------|-----------|----------|------------------------------------|--------|
	| Mean Vectors (3D Positions) | 0.000794 | 0.000049 | 0.000094 | - | ✅ PASS |
	| Singular Values (Scales) | 0.000035 | 0.000000 | 0.000002 | - | ✅ PASS |
	| Quaternions (Rotations) | 1.425558 | 0.000024 | 0.000067 | 9.2519 / 0.0019 / 0.0396 | ✅ PASS |
	| Colors (RGB Linear) | 0.001440 | 0.000005 | 0.000055 | - | ✅ PASS |
	| Opacities (Alpha) | 0.004183 | 0.000005 | 0.000114 | - | ✅ PASS |

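	> Validation Notes:
	> - Mean absolute differences from PyTorch are below 0.0001 for every output.
	> - Quaternion angular errors are below 1° for 99% of Gaussians.
	> - The large element-wise Max Diff for quaternions is consistent with sign flips, since `q` and `-q` encode the same rotation; the angular difference is the more meaningful metric there.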
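The metrics in the table can be approximated with a few lines of Python. This is a sketch under assumptions, not the repository's validation script: the function names are hypothetical, the percentile uses the nearest-rank method, and the angular difference is made sign-invariant by taking the absolute value of the quaternion dot product.

```python
import math


def diff_stats(a, b):
    """Max, mean, and 99th-percentile absolute difference of two flat sequences."""
    diffs = sorted(abs(x - y) for x, y in zip(a, b))
    p99 = diffs[min(len(diffs) - 1, math.ceil(0.99 * len(diffs)) - 1)]
    return diffs[-1], sum(diffs) / len(diffs), p99


def quat_angle_deg(q1, q2):
    """Angle in degrees between the rotations of two unit quaternions.

    abs() on the dot product treats q and -q as the same rotation.
    """
    dot = abs(sum(x * y for x, y in zip(q1, q2)))
    return math.degrees(2.0 * math.acos(min(1.0, dot)))


if __name__ == "__main__":
    q = [0.5, 0.5, 0.5, 0.5]
    neg_q = [-c for c in q]
    print(diff_stats(q, neg_q)[0])   # element-wise max difference
    print(quat_angle_deg(q, neg_q))  # angular difference
```

The example compares a quaternion with its negation: the element-wise difference is large while the angular difference is zero, which is why a sign-invariant angle is a useful companion to the raw differences.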

	## Reproducing the Conversion

	To reproduce the conversion from PyTorch to Core ML, follow these steps:

	```bash
	git clone https://github.com/apple/ml-sharp.git
	cd ml-sharp
	conda create -n sharp python=3.13
	conda activate sharp
	pip install -r requirements.txt
	pip install coremltools
	cd ../
	python convert.py
	```

	## Citation

	If you find this work useful, please cite the original paper:

	```bibtex
	@article{Sharp2025:arxiv,
	title = {Sharp Monocular View Synthesis in Less Than a Second},
	author = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
	journal = {arXiv preprint arXiv:2512.10685},
	year = {2025},
	url = {https://arxiv.org/abs/2512.10685},
	}
	```