---
license: apache-2.0
library_name: coremltools
pipeline_tag: image-to-3d
tags:
- coreml
- 3d-gaussian-splatting
- avatar
- face-reconstruction
- ios
- apple-neural-engine
base_model: 3DAIGC/LAM-20K
---
# LAM-20K CoreML (INT8 Quantized)
CoreML conversion of [LAM (Large Avatar Model)](https://github.com/aigc3d/LAM) for on-device 3D avatar reconstruction on iOS/macOS.
Single photo in, animatable 3D Gaussian head avatar out.
## Model Details
| Property | Value |
|----------|-------|
| Source | [3DAIGC/LAM-20K](https://huggingface.co/3DAIGC/LAM-20K) (SIGGRAPH 2025) |
| Parameters | 557.6M |
| Input | 518x518 RGB image (DINOv2 ViT-L/14 patch-aligned) |
| Output | 20,018 Gaussians x 14 channels |
| Precision | INT8 (linear symmetric quantization) |
| Model size | 609 MB |
| Format | CoreML .mlpackage (iOS 17+) |
| Minimum deployment | iOS 17.0 / macOS 14.0 |
### Output Channels (14 per Gaussian)
| Channels | Meaning |
|----------|---------|
| 0-2 | Position offsets (xyz) |
| 3-5 | Colors (RGB, sigmoid-activated) |
| 6 | Opacity (sigmoid-activated) |
| 7-9 | Scales (3, exp-activated) |
| 10-13 | Rotations (unit quaternion) |
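Per the table above, the activations (sigmoid, exp) are already applied by the model's output heads, so consuming the tensor is mostly slicing. A minimal NumPy sketch (the defensive quaternion re-normalization is our addition, not part of the model):

```python
import numpy as np

def split_gaussian_attributes(attrs: np.ndarray):
    """Split the (1, 20018, 14) output tensor into named fields.

    Per the channel table, colors/opacity/scales already have their
    sigmoid/exp activations applied by the model's heads.
    """
    g = attrs[0]                      # (20018, 14)
    positions = g[:, 0:3]             # xyz offsets from FLAME canonical points
    colors    = g[:, 3:6]             # RGB in [0, 1]
    opacity   = g[:, 6:7]             # alpha in [0, 1]
    scales    = g[:, 7:10]            # per-axis Gaussian extents (> 0)
    rotations = g[:, 10:14]           # quaternions; re-normalize defensively
    rotations = rotations / np.linalg.norm(rotations, axis=1, keepdims=True)
    return positions, colors, opacity, scales, rotations

# Example with dummy data shaped like the real output
dummy = np.random.rand(1, 20018, 14).astype(np.float32)
pos, col, opa, sca, rot = split_gaussian_attributes(dummy)
```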
### Architecture
```
Input Image (518x518)
|
DINOv2 ViT-L/14 Encoder --> multi-scale image features
|
10-layer SD3-style Transformer Decoder (FLAME canonical queries)
|
GSLayer MLP Heads --> 20,018 Gaussians x 14 channels
```
The 20,018 Gaussians correspond to the FLAME parametric face mesh (5,023 vertices) with one level of subdivision.
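Assuming midpoint subdivision (one new vertex inserted per edge), the Gaussian count implies the FLAME mesh's edge count, a quick sanity check on the numbers above:

```python
# FLAME base mesh vertex count (from the table above)
flame_vertices = 5_023

# One level of midpoint subdivision adds one new vertex per edge,
# so the edge count is implied by the final Gaussian count.
gaussians = 20_018
implied_edges = gaussians - flame_vertices

print(implied_edges)  # 14995
```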
## Usage
### Swift (iOS/macOS)
```swift
import CoreML
let config = MLModelConfiguration()
config.computeUnits = .all
let model = try MLModel(contentsOf: compiledModelURL, configuration: config)
// Input: 518x518 RGB image as CVPixelBuffer
let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_image": MLFeatureValue(pixelBuffer: pixelBuffer)
])
let output = try model.prediction(from: input)
let attrs = output.featureValue(for: "gaussian_attributes")!.multiArrayValue!
// Shape: (1, 20018, 14)
```
### Python (verification)
```python
import coremltools as ct
from PIL import Image

model = ct.models.MLModel("LAMReconstruct_int8.mlpackage")

# The model expects a 518x518 RGB image
pil_image = Image.open("face.jpg").convert("RGB").resize((518, 518))
prediction = model.predict({"input_image": pil_image})
attrs = prediction["gaussian_attributes"]  # shape (1, 20018, 14)
```
## Files
| File | Size | Description |
|------|------|-------------|
| `LAMReconstruct_int8.mlpackage/` | 609 MB | CoreML model (INT8 quantized) |
| `LAMReconstruct_int8.mlpackage.zip` | 525 MB | Zipped version for direct download |
## Conversion
Converted from the original PyTorch checkpoint using `coremltools 9.0` with extensive patching for macOS compatibility (CUDA stubs, in-place op replacement, torch.compile removal). See [conversion script](https://github.com/spizzerp/DigiFrensiOS/tree/feature/realistic-face-engine/scripts/convert_to_coreml).
Key conversion steps:
1. Stub CUDA-only modules (diff_gaussian_rasterization, simple_knn)
2. Stub chumpy for FLAME model deserialization
3. Patch GSLayer in-place ops for CoreML tracing
4. Replace custom trunc_exp autograd.Function with torch.exp
5. Trace in float16 on CPU (~13.6GB peak memory)
6. Convert to CoreML with iOS 17 target
7. INT8 linear symmetric quantization
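Step 7 maps onto coremltools' data-free weight quantization API. A sketch of the quantization config (file names are illustrative, and this assumes the traced float model already exists on disk):

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

# Linear symmetric INT8 weight quantization, applied uniformly to all ops
op_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8")
config = cto.OptimizationConfig(global_config=op_config)

mlmodel = ct.models.MLModel("LAMReconstruct.mlpackage")
quantized = cto.linear_quantize_weights(mlmodel, config=config)
quantized.save("LAMReconstruct_int8.mlpackage")
```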
## Animation
The output Gaussians are positioned on the FLAME parametric face mesh. To animate:
1. Load the FLAME-to-ARKit blendshape mapping (52 ARKit shapes mapped to FLAME expression parameters)
2. For each ARKit blendshape, apply FLAME Linear Blend Skinning to compute per-Gaussian position deltas
3. At runtime: `deformed[i] = neutral[i] + sum(weight_j * delta_j[i])`
Compatible with ARKit face tracking (52 blendshapes) and any system that outputs ARKit-style blend weights.
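The runtime blend in step 3 is a standard delta-blendshape sum. A minimal NumPy sketch (array names like `neutral` and `deltas` are illustrative):

```python
import numpy as np

def blend_positions(neutral, deltas, weights):
    """Apply ARKit blendshape weights to per-Gaussian position deltas.

    neutral: (N, 3) rest positions of the Gaussians
    deltas:  (52, N, 3) precomputed per-blendshape position offsets
    weights: (52,) ARKit blend weights in [0, 1]
    Returns (N, 3) deformed positions:
        deformed[i] = neutral[i] + sum_j(weight_j * delta_j[i])
    """
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy example: 4 Gaussians, 52 blendshapes
neutral = np.zeros((4, 3))
deltas = np.zeros((52, 4, 3))
deltas[0] = 1.0                       # blendshape 0 moves everything by (1, 1, 1)
weights = np.zeros(52)
weights[0] = 0.5                      # half-activate blendshape 0
deformed = blend_positions(neutral, deltas, weights)
```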
## Citation
```bibtex
@article{lam2025,
title={LAM: Large Avatar Model for One-Shot Animatable Gaussian Head Avatar},
author={Alibaba 3DAIGC Team},
journal={SIGGRAPH 2025},
year={2025}
}
```
## License
Apache-2.0 (same as the original LAM model).