---
license: apache-2.0
library_name: coremltools
pipeline_tag: image-to-3d
tags:
- coreml
- 3d-gaussian-splatting
- avatar
- face-reconstruction
- ios
- apple-neural-engine
base_model: 3DAIGC/LAM-20K
---
# LAM-20K CoreML (INT8 Quantized)
CoreML conversion of [LAM (Large Avatar Model)](https://github.com/aigc3d/LAM) for on-device 3D avatar reconstruction on iOS/macOS.
Single photo in, animatable 3D Gaussian head avatar out.
## Model Details
| Property | Value |
|----------|-------|
| Source | [3DAIGC/LAM-20K](https://huggingface.co/3DAIGC/LAM-20K) (SIGGRAPH 2025) |
| Parameters | 557.6M |
| Input | 518x518 RGB image (DINOv2 ViT-L/14 patch-aligned) |
| Output | 20,018 Gaussians x 14 channels |
| Precision | INT8 (linear symmetric quantization) |
| Model size | 609 MB |
| Format | CoreML .mlpackage (iOS 17+) |
| Minimum deployment | iOS 17.0 / macOS 14.0 |
### Output Channels (14 per Gaussian)
| Channels | Meaning |
|----------|---------|
| 0-2 | Position offsets (xyz) |
| 3-5 | Colors (RGB, sigmoid-activated) |
| 6 | Opacity (sigmoid-activated) |
| 7-9 | Scales (3, exp-activated) |
| 10-13 | Rotations (unit quaternion) |
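The channel layout above can be unpacked with plain NumPy slicing. This is an illustrative sketch (the function name and dict keys are not part of the model's API); since the activations are applied in-graph, no sigmoid/exp is needed here:

```python
import numpy as np

def unpack_gaussians(attrs: np.ndarray) -> dict:
    """Split a (1, 20018, 14) attribute tensor into named Gaussian fields."""
    g = attrs[0]  # drop the batch dimension -> (20018, 14)
    return {
        "positions": g[:, 0:3],    # xyz position offsets
        "colors":    g[:, 3:6],    # RGB, sigmoid already applied
        "opacity":   g[:, 6:7],    # sigmoid already applied
        "scales":    g[:, 7:10],   # exp already applied
        "rotations": g[:, 10:14],  # unit quaternion
    }

# Example with a dummy tensor shaped like the model output
attrs = np.zeros((1, 20018, 14), dtype=np.float32)
fields = unpack_gaussians(attrs)
```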
### Architecture
```
Input Image (518x518)
|
DINOv2 ViT-L/14 Encoder --> multi-scale image features
|
10-layer SD3-style Transformer Decoder (FLAME canonical queries)
|
GSLayer MLP Heads --> 20,018 Gaussians x 14 channels
```
The 20,018 Gaussians correspond to the FLAME parametric face mesh (5,023 vertices) with one level of subdivision.
## Usage
### Swift (iOS/macOS)
```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .all
let model = try MLModel(contentsOf: compiledModelURL, configuration: config)

// Input: 518x518 RGB image as CVPixelBuffer
let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_image": MLFeatureValue(pixelBuffer: pixelBuffer)
])
let output = try model.prediction(from: input)
let attrs = output.featureValue(for: "gaussian_attributes")!.multiArrayValue!
// Shape: (1, 20018, 14)
```
### Python (verification)
```python
import coremltools as ct
from PIL import Image

model = ct.models.MLModel("LAMReconstruct_int8.mlpackage")
pil_image = Image.open("face.jpg").resize((518, 518))  # 518x518 RGB input
prediction = model.predict({"input_image": pil_image})
attrs = prediction["gaussian_attributes"]  # (1, 20018, 14)
```
## Files
| File | Size | Description |
|------|------|-------------|
| `LAMReconstruct_int8.mlpackage/` | 609 MB | CoreML model (INT8 quantized) |
| `LAMReconstruct_int8.mlpackage.zip` | 525 MB | Zipped version for direct download |
## Conversion
Converted from the original PyTorch checkpoint using `coremltools 9.0` with extensive patching for macOS compatibility (CUDA stubs, in-place op replacement, torch.compile removal). See [conversion script](https://github.com/spizzerp/DigiFrensiOS/tree/feature/realistic-face-engine/scripts/convert_to_coreml).
Key conversion steps:
1. Stub CUDA-only modules (diff_gaussian_rasterization, simple_knn)
2. Stub chumpy for FLAME model deserialization
3. Patch GSLayer in-place ops for CoreML tracing
4. Replace custom trunc_exp autograd.Function with torch.exp
5. Trace in float16 on CPU (~13.6GB peak memory)
6. Convert to CoreML with iOS 17 target
7. INT8 linear symmetric quantization
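Step 7's quantization scheme can be sketched in NumPy. This is a per-tensor illustration of linear symmetric INT8 quantization (zero-point fixed at 0, scale from the max absolute weight), not the coremltools implementation, which quantizes per-channel under the hood:

```python
import numpy as np

def quantize_int8_symmetric(w: np.ndarray):
    """Linear symmetric INT8 quantization: q = round(w / scale), zero-point = 0."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8_symmetric(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is bounded by scale / 2
```

Symmetric quantization keeps the zero-point at 0, which is why each FP16 weight tensor compresses to roughly half its size plus one scale factor.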
## Animation
The output Gaussians are positioned on the FLAME parametric face mesh. To animate:
1. Load the FLAME-to-ARKit blendshape mapping (52 ARKit shapes mapped to FLAME expression parameters)
2. For each ARKit blendshape, apply FLAME Linear Blend Skinning to compute per-Gaussian position deltas
3. At runtime: `deformed[i] = neutral[i] + sum(weight_j * delta_j[i])`
Compatible with ARKit face tracking (52 blendshapes) and any system that outputs ARKit-style blend weights.
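The runtime blend in step 3 is a weighted sum over precomputed delta tables. A minimal NumPy sketch, assuming the deltas were baked into a `(52, N, 3)` array (one slice per ARKit blendshape; the layout is an assumption, not a file shipped with this model):

```python
import numpy as np

def deform(neutral: np.ndarray, deltas: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """deformed[i] = neutral[i] + sum_j weight_j * delta_j[i]

    neutral: (N, 3) rest positions; deltas: (52, N, 3); weights: (52,) in [0, 1].
    """
    return neutral + np.tensordot(weights, deltas, axes=1)  # -> (N, 3)

N = 20018  # one position delta per Gaussian
neutral = np.zeros((N, 3), dtype=np.float32)
deltas = np.random.randn(52, N, 3).astype(np.float32)
weights = np.zeros(52, dtype=np.float32)
weights[0] = 1.0  # fully activate a single blendshape
deformed = deform(neutral, deltas, weights)
```

Since the sum is linear in the weights, it runs comfortably per-frame on device (a single `(52) x (52, N, 3)` contraction).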
## Citation
```bibtex
@inproceedings{lam2025,
  title={LAM: Large Avatar Model for One-Shot Animatable Gaussian Head Avatar},
  author={Alibaba 3DAIGC Team},
  booktitle={SIGGRAPH 2025},
  year={2025}
}
```
## License
Apache-2.0 (same as the original LAM model).