---
license: other
license_name: mixed-upstream-see-card
library_name: mlx
pipeline_tag: text-to-image
tags:
  - instantid
  - sdxl
  - mlx
  - apple-silicon
  - face-id
  - controlnet
  - ip-adapter
  - identity-preservation
---

# SceneWorks/instantid-mlx

Converted weights for running **InstantID** identity-preserving SDXL **natively on Apple Silicon
with MLX**, with zero Python at inference time. These are the three artifacts the
[`mlx-gen-instantid`](https://github.com/michaeltrefry/mlx-gen) provider loads to compose InstantID
from the SDXL backbone plus the native MLX face stack (`mlx-gen-face`).

This repo holds **only the InstantID-specific glue weights**. The SDXL base (e.g.
`SG161222/RealVisXL_V5.0` or `stabilityai/stable-diffusion-xl-base-1.0`), the IdentityNet
ControlNet (`InstantX/InstantID` → `ControlNetModel/`), and the OpenPose ControlNet for pose mode
(`xinsir/controlnet-openpose-sdxl-1.0`) are loaded directly from their own diffusers repos; they
need no conversion.

## Files

| File | Size | What it is | Source | Converter |
|---|---|---|---|---|
| `ip-adapter.safetensors` | 1.57 GB | The InstantID face **IP-Adapter**: the image-projection **Resampler** (`image_proj.*`, ArcFace 512-d → 16×2048 face tokens) plus the 70 decoupled cross-attention **K/V pairs** (`ip_adapter.*`). Re-serialized from the upstream torch **pickle** `ip-adapter.bin` into safetensors (MLX's loader reads safetensors, not pickle). | [`InstantX/InstantID`](https://huggingface.co/InstantX/InstantID) → `ip-adapter.bin` | `tools/convert_instantid.py` |
| `scrfd_10g.safetensors` | 16 MB | **SCRFD** 5-point face detector (bbox + landmarks), the detection half of the native face stack. Ported from the insightface `antelopev2` `scrfd_10g_bnkps` ONNX graph. | insightface `antelopev2` (`scrfd_10g_bnkps.onnx`) | `tools/convert_scrfd.py` |
| `arcface_iresnet100.safetensors` | 248 MB | **ArcFace** `iresnet100` 512-d recognition embedder, the identity-fidelity half. Ported from the insightface `antelopev2` `glintr100` ONNX graph. | insightface `antelopev2` (`glintr100.onnx`) | `tools/convert_glintr100.py` |
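The `ip_adapter.*` K/V pairs implement IP-Adapter-style decoupled cross-attention: each UNet cross-attention layer attends to the text tokens through its original K/V projections and, separately, to the 16 face tokens through the extra K/V projections, and the two outputs are summed with the face branch scaled. The sketch below shows only that combination step on tiny single-head `Vec<Vec<f32>>` matrices. It is illustrative, not the `mlx-gen-instantid` implementation: the function names and the `ip_scale` parameter are hypothetical, and it omits the K/V projections and multi-head split that real layers apply.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Single-head scaled dot-product attention: softmax(Q·Kᵀ / sqrt(d)) · V.
/// Each row of `q`, `k`, and `v` is one token.
fn attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .map(|qi| {
            // Logits of this query against every key.
            let logits: Vec<f32> = k.iter().map(|kj| dot(qi, kj) * scale).collect();
            // Numerically stable softmax: subtract the max before exponentiating.
            let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            // Softmax-weighted sum of the value rows.
            let mut out = vec![0.0f32; v[0].len()];
            for (e, vj) in exps.iter().zip(v) {
                let w = e / sum;
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += w * x;
                }
            }
            out
        })
        .collect()
}

/// Decoupled cross-attention: text branch + ip_scale * face-token branch,
/// both driven by the same queries (hypothetical helper, for illustration).
fn decoupled_cross_attention(
    q: &[Vec<f32>],
    k_text: &[Vec<f32>],
    v_text: &[Vec<f32>],
    k_face: &[Vec<f32>],
    v_face: &[Vec<f32>],
    ip_scale: f32,
) -> Vec<Vec<f32>> {
    let text = attention(q, k_text, v_text);
    let face = attention(q, k_face, v_face);
    text.into_iter()
        .zip(face)
        .map(|(t, f)| t.iter().zip(&f).map(|(a, b)| a + ip_scale * b).collect::<Vec<f32>>())
        .collect()
}
```

Setting `ip_scale` to 0 recovers plain text cross-attention, which is why the face-token branch can be added without retraining the base UNet's own K/V weights.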

### Checksums (sha256)

```
fa5608b6121ffaa40228e76ac96e10f56e39b3aba2f6c4905ff7ef9046391c29  ip-adapter.safetensors
7b40147a85771139e70a8d9fe6be27ffcf32f4c911770ef24b5b05c29f534eda  scrfd_10g.safetensors
9deff2fef8fe1b3e357a99c01f28cc478dd8acbeab0d3749d252f6d69990ee39  arcface_iresnet100.safetensors
```
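To check downloads against the list above, compute each file's digest with a standard tool (for example `shasum -a 256 <file>` on macOS, or save the list to a file and run `shasum -a 256 -c` on it) and compare. The sketch below parses this `<digest>  <file>` format and does the comparison in Rust. It does not hash the files itself (a crate such as `sha2` could do that), and the function names are hypothetical.

```rust
/// Parses `sha256sum`-style lines (`<64 hex digits><whitespace><file name>`)
/// into (lowercase digest, file name) pairs, skipping blank lines.
fn parse_sha256_manifest(text: &str) -> Result<Vec<(String, String)>, String> {
    let mut entries = Vec::new();
    for (i, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() {
            continue;
        }
        let (digest, name) = line
            .split_once(char::is_whitespace)
            .ok_or_else(|| format!("line {}: expected `<digest>  <file>`", i + 1))?;
        if digest.len() != 64 || !digest.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("line {}: not a sha256 hex digest", i + 1));
        }
        entries.push((digest.to_ascii_lowercase(), name.trim_start().to_string()));
    }
    Ok(entries)
}

/// Compares a digest computed elsewhere (e.g. by `shasum -a 256`)
/// against the manifest entry for `file`, ignoring hex case.
fn verify_digest(entries: &[(String, String)], file: &str, computed_hex: &str) -> Result<(), String> {
    match entries.iter().find(|(_, name)| name == file) {
        None => Err(format!("{file}: not listed in the manifest")),
        Some((expected, _)) if expected.eq_ignore_ascii_case(computed_hex) => Ok(()),
        Some((expected, _)) => Err(format!("{file}: expected {expected}, got {computed_hex}")),
    }
}
```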

## Usage

### In `mlx-gen-instantid` (Rust / MLX)

```rust
use mlx_gen::weights::Weights;
use mlx_gen::WeightsSource;
use mlx_gen_instantid::{InstantId, InstantIdPaths, InstantIdRequest};

let model = InstantId::load(&InstantIdPaths {
    sdxl_base:   "/path/to/RealVisXL_V5.0".into(),          // diffusers SDXL snapshot
    identitynet: WeightsSource::Dir("/path/to/InstantX--InstantID/ControlNetModel".into()),
    ip_adapter:  "ip-adapter.safetensors".into(),           // <- from this repo
})?
.with_face(
    &Weights::from_file("scrfd_10g.safetensors")?,          // <- from this repo
    &Weights::from_file("arcface_iresnet100.safetensors")?, // <- from this repo
)?;

let out = model.generate(
    &InstantIdRequest {
        // prompt, w/h, steps, guidance, scales, seed
        ..Default::default()
    },
    &reference_image, // the identity reference photo
)?;
```

For pose mode, add `.with_openpose(&WeightsSource::Dir("/path/to/xinsir--controlnet-openpose-sdxl-1.0".into()))?`
and call `generate_pose(req, &reference, &keypoints)`; for the ADetailer-style face-restore pass,
call `restore_face(req, &base, &reference_embedding)`.

### In SceneWorks (download-on-first-use)

On first use, the SceneWorks Rust GPU worker downloads these three files from this repo into its app
cache (the same pattern as `SceneWorks/yolo11m-person-detect-mlx` and `SceneWorks/sam2-mlx`). To
pre-stage them instead, set the env override `SCENEWORKS_INSTANTID_WEIGHTS=/dir/with/the/three/files`.
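The lookup described above can be sketched as: use the directory named by the env var if it is set, otherwise the app cache, and treat the weights as ready only when all three files are present. This is a hypothetical illustration, not SceneWorks' actual worker code; the function name, the caller-supplied cache directory, and the "missing files means download" policy are assumptions.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// The three files this repo ships.
const INSTANTID_FILES: [&str; 3] = [
    "ip-adapter.safetensors",
    "scrfd_10g.safetensors",
    "arcface_iresnet100.safetensors",
];

/// Hypothetical resolver: prefer a pre-staged directory named by `env_var`,
/// otherwise fall back to `cache_dir`. Returns Err listing any missing files,
/// which a caller could treat as "download into the cache first".
fn resolve_weights_dir(env_var: &str, cache_dir: &Path) -> Result<PathBuf, String> {
    let dir = match env::var_os(env_var) {
        Some(v) if !v.is_empty() => PathBuf::from(v),
        _ => cache_dir.to_path_buf(),
    };
    let missing: Vec<&str> = INSTANTID_FILES
        .iter()
        .copied()
        .filter(|f| !dir.join(f).is_file())
        .collect();
    if missing.is_empty() {
        Ok(dir)
    } else {
        Err(format!("{}: missing {}", dir.display(), missing.join(", ")))
    }
}
```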

### Validation (real weights, MLX, RealVisXL_V5.0 @ 1024², 30 steps, fp16)

| Mode | Metric | Result |
|---|---|---|
| Single identity (`generate`) | ArcFace-cosine(ref, generated) | **0.8731** (torch baseline ≈ 0.876) |
| Angle set (`generate_angle`, three-quarter right) | ArcFace-cosine | **0.8343** |
| Pose mode (`generate_pose`, full-body) | ArcFace-cosine | **0.7129** (small full-body face) |
| Face-restore (`restore_face`) | ArcFace-cosine | base 0.7370 → **0.8338** |
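The ArcFace-cosine metric above is the cosine similarity between the 512-d ArcFace embedding of the reference face and that of the face in the generated image (1.0 means the embeddings point the same way). A minimal Rust version, assuming both embeddings have already been extracted as `f32` slices; the function name is illustrative.

```rust
/// Cosine similarity between two embeddings; returns 0.0 if either is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}
```

ArcFace embeddings are often L2-normalized before comparison, in which case this reduces to a plain dot product.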

## Reproducing the conversion

All three converters live in [`mlx-gen/tools/`](https://github.com/michaeltrefry/mlx-gen/tree/main/tools)
and run in a torch venv (torch + safetensors; insightface for SCRFD/ArcFace ONNX import):

```bash
python tools/convert_instantid.py    # InstantX/InstantID ip-adapter.bin  -> ip-adapter.safetensors
python tools/convert_scrfd.py        # antelopev2 scrfd_10g_bnkps.onnx     -> scrfd_10g.safetensors
python tools/convert_glintr100.py    # antelopev2 glintr100.onnx           -> arcface_iresnet100.safetensors
```
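After converting, a quick sanity check is to read the safetensors header and confirm that the expected tensor groups are present (for `ip-adapter.safetensors`, the `image_proj.*` Resampler and `ip_adapter.*` K/V tensors). In the safetensors format, a file starts with an 8-byte little-endian `u64` giving the header length, followed by that many bytes of UTF-8 JSON mapping tensor names to dtype, shape, and data offsets. The sketch below reads that header with the standard library only. The function names are hypothetical, and the prefix count is a naive substring scan, so use a real JSON parser (e.g. `serde_json`) for anything beyond a smoke test.

```rust
use std::io::{self, Read};

/// Reads the JSON header of a safetensors stream: an 8-byte little-endian
/// u64 length followed by that many bytes of UTF-8 JSON.
fn read_safetensors_header<R: Read>(mut r: R) -> io::Result<String> {
    let mut len_buf = [0u8; 8];
    r.read_exact(&mut len_buf)?;
    let n = u64::from_le_bytes(len_buf);
    // Guard against corrupt length fields; this limit is an arbitrary choice.
    const MAX_HEADER: u64 = 100_000_000;
    if n > MAX_HEADER {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "header too large"));
    }
    let mut header = vec![0u8; n as usize];
    r.read_exact(&mut header)?;
    String::from_utf8(header).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Naive count of JSON keys starting with `prefix` (scans for `"prefix`).
fn count_keys_with_prefix(header_json: &str, prefix: &str) -> usize {
    header_json.matches(&format!("\"{prefix}")).count()
}
```

With a real file, pass `std::fs::File::open("ip-adapter.safetensors")?` as the reader; only the header bytes are read, not the tensor data.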

## Provenance & licensing

These are **format conversions** of third-party weights; the upstream licenses govern use. Verify
you comply with each before using them:

- **`ip-adapter.safetensors`**: derived from [`InstantX/InstantID`](https://huggingface.co/InstantX/InstantID)
  (Apache-2.0). InstantID research: *"InstantID: Zero-shot Identity-Preserving Generation in Seconds"*
  (Wang et al., 2024).
- **`scrfd_10g.safetensors`** and **`arcface_iresnet100.safetensors`**: derived from the
  [InsightFace](https://github.com/deepinsight/insightface) `antelopev2` model pack (`scrfd_10g_bnkps`
  and `glintr100`). **The InsightFace pretrained models are released for non-commercial research
  purposes only**; see the InsightFace repository for their terms. Do not use these two files in a
  commercial setting without securing appropriate rights from the upstream authors.

`license: other` reflects this mix; this card is the authoritative license statement. The conversion
grants no additional license. The conversions were produced with the
[`mlx-gen`](https://github.com/michaeltrefry/mlx-gen) tooling (Apache-2.0 code).