---
license: apache-2.0
library_name: coremltools
pipeline_tag: image-to-3d
tags:
- coreml
- 3d-gaussian-splatting
- avatar
- face-reconstruction
- ios
- apple-neural-engine
base_model: 3DAIGC/LAM-20K
---

# LAM-20K CoreML (INT8 Quantized)
CoreML conversion of [LAM (Large Avatar Model)](https://github.com/aigc3d/LAM) for on-device 3D avatar reconstruction on iOS/macOS.

Single photo in, animatable 3D Gaussian head avatar out.
## Model Details

| Property | Value |
|----------|-------|
| Source | [3DAIGC/LAM-20K](https://huggingface.co/3DAIGC/LAM-20K) (SIGGRAPH 2025) |
| Parameters | 557.6M |
| Input | 518x518 RGB image (DINOv2 ViT-L/14 patch-aligned) |
| Output | 20,018 Gaussians x 14 channels |
| Precision | INT8 (linear symmetric quantization) |
| Model size | 609 MB |
| Format | CoreML `.mlpackage` (iOS 17+) |
| Minimum deployment | iOS 17.0 / macOS 14.0 |
### Output Channels (14 per Gaussian)

| Channels | Meaning |
|----------|---------|
| 0-2 | Position offsets (xyz) |
| 3-5 | Colors (RGB, sigmoid-activated) |
| 6 | Opacity (sigmoid-activated) |
| 7-9 | Scales (xyz, exp-activated) |
| 10-13 | Rotation (unit quaternion) |

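Downstream renderers usually want these channels as named fields rather than one packed tensor. A minimal numpy sketch of slicing the layout above (the function and field names are illustrative, and it assumes the exported graph already applies the sigmoid/exp activations, as the table indicates):

```python
import numpy as np

def split_gaussians(attrs):
    """Split a (1, 20018, 14) attribute tensor into named per-Gaussian
    fields, following the channel layout in the table above."""
    a = attrs[0]  # drop batch dim -> (20018, 14)
    return {
        "positions": a[:, 0:3],    # xyz offsets on the FLAME mesh
        "colors":    a[:, 3:6],    # RGB, already in [0, 1]
        "opacity":   a[:, 6],      # alpha, already in [0, 1]
        "scales":    a[:, 7:10],   # positive per-axis scales
        "rotations": a[:, 10:14],  # unit quaternions
    }

fields = split_gaussians(np.zeros((1, 20018, 14), dtype=np.float32))
```

The same slicing works on the `MLMultiArray` returned by CoreML once it is bridged to a numpy array.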
### Architecture

```
Input Image (518x518)
        |
DINOv2 ViT-L/14 Encoder --> multi-scale image features
        |
10-layer SD3-style Transformer Decoder (FLAME canonical queries)
        |
GSLayer MLP Heads --> 20,018 Gaussians x 14 channels
```
The 20,018 Gaussians correspond to the FLAME parametric face mesh (5,023 vertices) with one level of subdivision.
## Usage

### Swift (iOS/macOS)
```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .all

// compiledModelURL points at the compiled .mlmodelc
// (produce it from the .mlpackage with MLModel.compileModel(at:))
let model = try MLModel(contentsOf: compiledModelURL, configuration: config)

// Input: 518x518 RGB image as CVPixelBuffer
let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_image": MLFeatureValue(pixelBuffer: pixelBuffer)
])

let output = try model.prediction(from: input)
let attrs = output.featureValue(for: "gaussian_attributes")!.multiArrayValue!
// Shape: (1, 20018, 14)
```
### Python (verification)
```python
import coremltools as ct
from PIL import Image

# Resize the photo to the model's expected 518x518 input
pil_image = Image.open("face.jpg").convert("RGB").resize((518, 518))

model = ct.models.MLModel("LAMReconstruct_int8.mlpackage")
prediction = model.predict({"input_image": pil_image})
attrs = prediction["gaussian_attributes"]  # (1, 20018, 14)
```
## Files

| File | Size | Description |
|------|------|-------------|
| `LAMReconstruct_int8.mlpackage/` | 609 MB | CoreML model (INT8 quantized) |
| `LAMReconstruct_int8.mlpackage.zip` | 525 MB | Zipped version for direct download |
## Conversion

Converted from the original PyTorch checkpoint using `coremltools` 9.0, with extensive patching for macOS compatibility (CUDA stubs, in-place op replacement, `torch.compile` removal). See the [conversion script](https://github.com/spizzerp/DigiFrensiOS/tree/feature/realistic-face-engine/scripts/convert_to_coreml).
Key conversion steps:

1. Stub CUDA-only modules (`diff_gaussian_rasterization`, `simple_knn`)
2. Stub `chumpy` for FLAME model deserialization
3. Patch GSLayer in-place ops for CoreML tracing
4. Replace the custom `trunc_exp` `autograd.Function` with `torch.exp`
5. Trace in float16 on CPU (~13.6 GB peak memory)
6. Convert to CoreML with an iOS 17 target
7. Apply INT8 linear symmetric quantization

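The linear symmetric scheme in step 7 maps each weight tensor to int8 with a single scale and a zero-point of zero. A numpy sketch of the arithmetic (an illustration of the scheme, not the coremltools internals):

```python
import numpy as np

def quantize_symmetric_int8(w):
    """Linear symmetric INT8 quantization: one scale per tensor,
    zero-point fixed at 0, range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Illustrative random weights standing in for a model tensor
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_symmetric_int8(w)
w_hat = dequantize(q, s)
max_err = np.abs(w - w_hat).max()  # bounded by about scale / 2
```

Because the zero-point is zero, quantized weights can be rescaled with a single multiply at load time, which is what makes this mode cheap on the Neural Engine.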
## Animation
The output Gaussians are positioned on the FLAME parametric face mesh. To animate:
1. Load the FLAME-to-ARKit blendshape mapping (52 ARKit shapes mapped to FLAME expression parameters)
2. For each ARKit blendshape, apply FLAME linear blend skinning to compute per-Gaussian position deltas
3. At runtime: `deformed[i] = neutral[i] + sum(weight_j * delta_j[i])`

Compatible with ARKit face tracking (52 blendshapes) and any system that outputs ARKit-style blend weights.

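The runtime blend in step 3 is just a weighted sum over precomputed deltas and vectorizes cleanly. A numpy sketch with illustrative data (the array names and random contents are stand-ins; real deltas come from the FLAME LBS precomputation):

```python
import numpy as np

N_GAUSSIANS, N_SHAPES = 20018, 52

# Stand-ins for the precomputed assets: neutral Gaussian positions and
# per-blendshape position deltas from FLAME linear blend skinning
neutral = np.random.randn(N_GAUSSIANS, 3).astype(np.float32)
deltas = np.random.randn(N_SHAPES, N_GAUSSIANS, 3).astype(np.float32)

def deform(neutral, deltas, weights):
    """deformed[i] = neutral[i] + sum_j weight_j * delta_j[i], vectorized."""
    return neutral + np.tensordot(weights, deltas, axes=1)

weights = np.zeros(N_SHAPES, dtype=np.float32)  # e.g. from ARKit face tracking
weights[0] = 0.5
deformed = deform(neutral, deltas, weights)
```

On device the same sum is typically done per frame on the GPU, but the math is identical.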
## Citation
```bibtex
@inproceedings{lam2025,
  title={LAM: Large Avatar Model for One-Shot Animatable Gaussian Head Avatar},
  author={Alibaba 3DAIGC Team},
  booktitle={SIGGRAPH 2025},
  year={2025}
}
```
## License

Apache-2.0 (same as the original LAM model).