---
license: apache-2.0
library_name: coremltools
pipeline_tag: image-to-3d
tags:
  - coreml
  - 3d-gaussian-splatting
  - avatar
  - face-reconstruction
  - ios
  - apple-neural-engine
base_model: 3DAIGC/LAM-20K
---

# LAM-20K CoreML (INT8 Quantized)

CoreML conversion of [LAM (Large Avatar Model)](https://github.com/aigc3d/LAM) for on-device 3D avatar reconstruction on iOS/macOS.

Single photo in, animatable 3D Gaussian head avatar out.

## Model Details

| Property | Value |
|----------|-------|
| Source | [3DAIGC/LAM-20K](https://huggingface.co/3DAIGC/LAM-20K) (SIGGRAPH 2025) |
| Parameters | 557.6M |
| Input | 518x518 RGB image (DINOv2 ViT-L/14 patch-aligned) |
| Output | 20,018 Gaussians x 14 channels |
| Precision | INT8 (linear symmetric quantization) |
| Model size | 609 MB |
| Format | CoreML .mlpackage (iOS 17+) |
| Minimum deployment | iOS 17.0 / macOS 14.0 |

### Output Channels (14 per Gaussian)

| Channels | Meaning |
|----------|---------|
| 0-2 | Position offsets (xyz) |
| 3-5 | Colors (RGB, sigmoid-activated) |
| 6 | Opacity (sigmoid-activated) |
| 7-9 | Per-axis scales (xyz, exp-activated) |
| 10-13 | Rotations (unit quaternion) |
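The channel layout above can be decoded with a few slices. A minimal numpy sketch (the field names are illustrative; the sigmoid/exp activations are already applied inside the model, so the channels are used as-is):

```python
import numpy as np

def split_gaussian_attributes(attrs: np.ndarray) -> dict:
    """Split a (1, N, 14) attribute tensor into named per-Gaussian fields."""
    attrs = attrs[0]  # drop the batch dimension -> (N, 14)
    return {
        "positions": attrs[:, 0:3],    # xyz offsets
        "colors":    attrs[:, 3:6],    # RGB in [0, 1]
        "opacity":   attrs[:, 6:7],    # alpha in [0, 1]
        "scales":    attrs[:, 7:10],   # positive per-axis scales
        "rotations": attrs[:, 10:14],  # unit quaternions
    }

# Example with dummy data shaped like the real output
dummy = np.zeros((1, 20018, 14), dtype=np.float32)
fields = split_gaussian_attributes(dummy)
print(fields["positions"].shape)  # (20018, 3)
```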

### Architecture

```
Input Image (518x518)
  |
DINOv2 ViT-L/14 Encoder --> multi-scale image features
  |
10-layer SD3-style Transformer Decoder (FLAME canonical queries)
  |
GSLayer MLP Heads --> 20,018 Gaussians x 14 channels
```

The 20,018 Gaussians correspond to the FLAME parametric face mesh (5,023 vertices) with one level of subdivision.

## Usage

### Swift (iOS/macOS)

```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .all

let model = try MLModel(contentsOf: compiledModelURL, configuration: config)

// Input: 518x518 RGB image as CVPixelBuffer
let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_image": MLFeatureValue(pixelBuffer: pixelBuffer)
])

let output = try model.prediction(from: input)
let attrs = output.featureValue(for: "gaussian_attributes")!.multiArrayValue!
// Shape: (1, 20018, 14)
```

### Python (verification)

```python
import coremltools as ct

model = ct.models.MLModel("LAMReconstruct_int8.mlpackage")
prediction = model.predict({"input_image": pil_image})
attrs = prediction["gaussian_attributes"]  # (1, 20018, 14)
```
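Beyond checking the shape, a few invariants of the output can be verified cheaply. A numpy sketch of such checks (the quaternion component ordering does not matter for the norm check; tolerances are illustrative):

```python
import numpy as np

def sanity_check(attrs: np.ndarray) -> None:
    """Basic invariant checks on a (1, 20018, 14) output array."""
    assert attrs.shape == (1, 20018, 14)
    colors  = attrs[0, :, 3:6]
    opacity = attrs[0, :, 6]
    scales  = attrs[0, :, 7:10]
    quats   = attrs[0, :, 10:14]
    assert ((colors >= 0.0) & (colors <= 1.0)).all()    # sigmoid range
    assert ((opacity >= 0.0) & (opacity <= 1.0)).all()  # sigmoid range
    assert (scales > 0.0).all()                          # exp is positive
    assert np.allclose(np.linalg.norm(quats, axis=-1), 1.0, atol=1e-3)

# Demo on a synthetic array with valid values
dummy = np.zeros((1, 20018, 14), dtype=np.float32)
dummy[0, :, 7:10] = 1.0   # unit scales
dummy[0, :, 10] = 1.0     # a unit quaternion
sanity_check(dummy)
print("ok")
```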

## Files

| File | Size | Description |
|------|------|-------------|
| `LAMReconstruct_int8.mlpackage/` | 609 MB | CoreML model (INT8 quantized) |
| `LAMReconstruct_int8.mlpackage.zip` | 525 MB | Zipped version for direct download |

## Conversion

Converted from the original PyTorch checkpoint using `coremltools 9.0` with extensive patching for macOS compatibility (CUDA stubs, in-place op replacement, torch.compile removal). See [conversion script](https://github.com/spizzerp/DigiFrensiOS/tree/feature/realistic-face-engine/scripts/convert_to_coreml).

Key conversion steps:
1. Stub CUDA-only modules (diff_gaussian_rasterization, simple_knn)
2. Stub chumpy for FLAME model deserialization
3. Patch GSLayer in-place ops for CoreML tracing
4. Replace custom trunc_exp autograd.Function with torch.exp
5. Trace in float16 on CPU (~13.6GB peak memory)
6. Convert to CoreML with iOS 17 target
7. INT8 linear symmetric quantization

## Animation

The output Gaussians are positioned on the FLAME parametric face mesh. To animate:

1. Load the FLAME-to-ARKit blendshape mapping (52 ARKit shapes mapped to FLAME expression parameters)
2. For each ARKit blendshape, apply FLAME Linear Blend Skinning to compute per-Gaussian position deltas
3. At runtime: `deformed[i] = neutral[i] + sum(weight_j * delta_j[i])`

Compatible with ARKit face tracking (52 blendshapes) and any system that outputs ARKit-style blend weights.
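The runtime formula in step 3 is a plain weighted sum of per-blendshape position deltas. A minimal numpy sketch (the `neutral` and `deltas` arrays stand in for the precomputed offline data; values here are synthetic):

```python
import numpy as np

N_GAUSSIANS = 20018
N_SHAPES = 52  # ARKit blendshapes

# Stand-ins for data precomputed offline via FLAME Linear Blend Skinning
neutral = np.zeros((N_GAUSSIANS, 3), dtype=np.float32)
deltas = np.random.default_rng(0).normal(
    scale=0.01, size=(N_SHAPES, N_GAUSSIANS, 3)
).astype(np.float32)

def deform(weights: np.ndarray) -> np.ndarray:
    """deformed[i] = neutral[i] + sum_j(weight_j * delta_j[i])."""
    # Contract the blendshape axis: (52,) x (52, N, 3) -> (N, 3)
    return neutral + np.tensordot(weights, deltas, axes=1)

weights = np.zeros(N_SHAPES, dtype=np.float32)
weights[0] = 1.0  # e.g. full activation of one blendshape
out = deform(weights)
print(out.shape)  # (20018, 3)
```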

## Citation

```bibtex
@inproceedings{lam2025,
    title={LAM: Large Avatar Model for One-Shot Animatable Gaussian Head Avatar},
    author={Alibaba 3DAIGC Team},
    booktitle={SIGGRAPH 2025},
    year={2025}
}
```

## License

Apache-2.0 (same as the original LAM model).