---
license: apache-2.0
library_name: coremltools
pipeline_tag: image-to-3d
tags:
- coreml
- 3d-gaussian-splatting
- avatar
- face-reconstruction
- ios
- apple-neural-engine
base_model: 3DAIGC/LAM-20K
---
# LAM-20K CoreML (INT8 Quantized)
CoreML conversion of [LAM (Large Avatar Model)](https://github.com/aigc3d/LAM) for on-device 3D avatar reconstruction on iOS/macOS.
Single photo in, animatable 3D Gaussian head avatar out.
## Model Details
| Property | Value |
|----------|-------|
| Source | [3DAIGC/LAM-20K](https://huggingface.co/3DAIGC/LAM-20K) (SIGGRAPH 2025) |
| Parameters | 557.6M |
| Input | 518x518 RGB image (DINOv2 ViT-L/14 patch-aligned) |
| Output | 20,018 Gaussians x 14 channels |
| Precision | INT8 (linear symmetric quantization) |
| Model size | 609 MB |
| Format | CoreML .mlpackage (iOS 17+) |
| Minimum deployment | iOS 17.0 / macOS 14.0 |
### Output Channels (14 per Gaussian)
| Channels | Meaning |
|----------|---------|
| 0-2 | Position offsets (xyz) |
| 3-5 | Colors (RGB, sigmoid-activated) |
| 6 | Opacity (sigmoid-activated) |
| 7-9 | Scales (3, exp-activated) |
| 10-13 | Rotations (unit quaternion) |
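The channel layout above can be unpacked with plain NumPy slicing. This is an illustrative sketch (the function name and dict keys are not part of the model's API); since the activations are applied in-graph, no sigmoid/exp is needed here:

```python
import numpy as np

def unpack_gaussians(attrs: np.ndarray) -> dict:
    """Split a (1, 20018, 14) attribute tensor into named Gaussian fields."""
    g = attrs[0]  # drop the batch dimension -> (20018, 14)
    return {
        "positions": g[:, 0:3],    # xyz position offsets
        "colors":    g[:, 3:6],    # RGB, sigmoid already applied
        "opacity":   g[:, 6:7],    # sigmoid already applied
        "scales":    g[:, 7:10],   # exp already applied
        "rotations": g[:, 10:14],  # unit quaternion
    }

# Example with a dummy tensor shaped like the model output
attrs = np.zeros((1, 20018, 14), dtype=np.float32)
fields = unpack_gaussians(attrs)
```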
### Architecture
```
Input Image (518x518)
|
DINOv2 ViT-L/14 Encoder --> multi-scale image features
|
10-layer SD3-style Transformer Decoder (FLAME canonical queries)
|
GSLayer MLP Heads --> 20,018 Gaussians x 14 channels
```
The 20,018 Gaussians correspond to the FLAME parametric face mesh (5,023 vertices) with one level of subdivision.
## Usage
### Swift (iOS/macOS)
```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .all
let model = try MLModel(contentsOf: compiledModelURL, configuration: config)

// Input: 518x518 RGB image as CVPixelBuffer
let input = try MLDictionaryFeatureProvider(dictionary: [
    "input_image": MLFeatureValue(pixelBuffer: pixelBuffer)
])
let output = try model.prediction(from: input)
let attrs = output.featureValue(for: "gaussian_attributes")!.multiArrayValue!
// Shape: (1, 20018, 14)
```
### Python (verification)
```python
import coremltools as ct
from PIL import Image

model = ct.models.MLModel("LAMReconstruct_int8.mlpackage")
pil_image = Image.open("face.jpg").resize((518, 518))  # 518x518 RGB input
prediction = model.predict({"input_image": pil_image})
attrs = prediction["gaussian_attributes"]  # (1, 20018, 14)
```
## Files
| File | Size | Description |
|------|------|-------------|
| `LAMReconstruct_int8.mlpackage/` | 609 MB | CoreML model (INT8 quantized) |
| `LAMReconstruct_int8.mlpackage.zip` | 525 MB | Zipped version for direct download |
## Conversion
Converted from the original PyTorch checkpoint using `coremltools 9.0` with extensive patching for macOS compatibility (CUDA stubs, in-place op replacement, torch.compile removal). See [conversion script](https://github.com/spizzerp/DigiFrensiOS/tree/feature/realistic-face-engine/scripts/convert_to_coreml).
Key conversion steps:
1. Stub CUDA-only modules (diff_gaussian_rasterization, simple_knn)
2. Stub chumpy for FLAME model deserialization
3. Patch GSLayer in-place ops for CoreML tracing
4. Replace custom trunc_exp autograd.Function with torch.exp
5. Trace in float16 on CPU (~13.6GB peak memory)
6. Convert to CoreML with iOS 17 target
7. INT8 linear symmetric quantization
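Step 7's quantization scheme can be sketched in NumPy. This is a per-tensor illustration of linear symmetric INT8 quantization (zero-point fixed at 0, scale from the max absolute weight), not the coremltools implementation, which quantizes per-channel under the hood:

```python
import numpy as np

def quantize_int8_symmetric(w: np.ndarray):
    """Linear symmetric INT8 quantization: q = round(w / scale), zero-point = 0."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8_symmetric(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is bounded by scale / 2
```

Symmetric quantization keeps the zero-point at 0, which is why each FP16 weight tensor compresses to roughly half its size plus one scale factor.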
## Animation
The output Gaussians are positioned on the FLAME parametric face mesh. To animate:
1. Load the FLAME-to-ARKit blendshape mapping (52 ARKit shapes mapped to FLAME expression parameters)
2. For each ARKit blendshape, apply FLAME Linear Blend Skinning to compute per-Gaussian position deltas
3. At runtime: `deformed[i] = neutral[i] + sum(weight_j * delta_j[i])`
Compatible with ARKit face tracking (52 blendshapes) and any system that outputs ARKit-style blend weights.
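The runtime blend in step 3 is a weighted sum over precomputed delta tables. A minimal NumPy sketch, assuming the deltas were baked into a `(52, N, 3)` array (one slice per ARKit blendshape; the layout is an assumption, not a file shipped with this model):

```python
import numpy as np

def deform(neutral: np.ndarray, deltas: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """deformed[i] = neutral[i] + sum_j weight_j * delta_j[i]

    neutral: (N, 3) rest positions; deltas: (52, N, 3); weights: (52,) in [0, 1].
    """
    return neutral + np.tensordot(weights, deltas, axes=1)  # -> (N, 3)

N = 20018  # one position delta per Gaussian
neutral = np.zeros((N, 3), dtype=np.float32)
deltas = np.random.randn(52, N, 3).astype(np.float32)
weights = np.zeros(52, dtype=np.float32)
weights[0] = 1.0  # fully activate a single blendshape
deformed = deform(neutral, deltas, weights)
```

Since the sum is linear in the weights, it runs comfortably per-frame on device (a single `(52) x (52, N, 3)` contraction).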
## Citation
```bibtex
@inproceedings{lam2025,
  title={LAM: Large Avatar Model for One-Shot Animatable Gaussian Head Avatar},
  author={Alibaba 3DAIGC Team},
  booktitle={SIGGRAPH 2025},
  year={2025}
}
```
## License
Apache-2.0 (same as the original LAM model).