---
license: other
license_name: mixed-upstream-see-card
library_name: mlx
pipeline_tag: text-to-image
tags:
- instantid
- sdxl
- mlx
- apple-silicon
- face-id
- controlnet
- ip-adapter
- identity-preservation
---

# SceneWorks/instantid-mlx

Converted weights for running **InstantID** identity-preserving SDXL **natively on Apple Silicon with MLX**, with zero Python at inference time. These are the three artifacts the [`mlx-gen-instantid`](https://github.com/michaeltrefry/mlx-gen) provider loads to compose InstantID out of the SDXL backbone + the native MLX face stack (`mlx-gen-face`).

This repo holds **only the InstantID-specific glue weights**. The SDXL base (e.g. `SG161222/RealVisXL_V5.0` or `stabilityai/stable-diffusion-xl-base-1.0`), the IdentityNet ControlNet (`InstantX/InstantID` → `ControlNetModel/`), and the OpenPose ControlNet for pose mode (`xinsir/controlnet-openpose-sdxl-1.0`) are loaded directly from their own diffusers repos; no conversion is needed for those.

## Files

| File | Size | What it is | Source | Converter |
|---|---|---|---|---|
| `ip-adapter.safetensors` | 1.57 GB | The InstantID face **IP-Adapter**: the image-projection **Resampler** (`image_proj.*`, ArcFace 512-d → 16×2048 face tokens) + the 70 decoupled cross-attention **K/V pairs** (`ip_adapter.*`). Re-serialized from the upstream torch **pickle** `ip-adapter.bin` into safetensors (MLX's loader reads safetensors, not pickle). | [`InstantX/InstantID`](https://huggingface.co/InstantX/InstantID) → `ip-adapter.bin` | `tools/convert_instantid.py` |
| `scrfd_10g.safetensors` | 16 MB | **SCRFD** 5-point face detector (bbox + landmarks): the detection half of the native face stack. Ported from the insightface `antelopev2` `scrfd_10g_bnkps` ONNX graph. | insightface `antelopev2` (`scrfd_10g_bnkps.onnx`) | `tools/convert_scrfd.py` |
| `arcface_iresnet100.safetensors` | 248 MB | **ArcFace** `iresnet100` 512-d recognition embedder: the identity-fidelity half. Ported from the insightface `antelopev2` `glintr100` ONNX graph. | insightface `antelopev2` (`glintr100.onnx`) | `tools/convert_glintr100.py` |

### Checksums (sha256)

```
fa5608b6121ffaa40228e76ac96e10f56e39b3aba2f6c4905ff7ef9046391c29  ip-adapter.safetensors
7b40147a85771139e70a8d9fe6be27ffcf32f4c911770ef24b5b05c29f534eda  scrfd_10g.safetensors
9deff2fef8fe1b3e357a99c01f28cc478dd8acbeab0d3749d252f6d69990ee39  arcface_iresnet100.safetensors
```

## Usage

### In `mlx-gen-instantid` (Rust / MLX)

```rust
use mlx_gen::weights::Weights;
use mlx_gen::WeightsSource;
use mlx_gen_instantid::{InstantId, InstantIdPaths, InstantIdRequest};

let model = InstantId::load(&InstantIdPaths {
    sdxl_base: "/path/to/RealVisXL_V5.0".into(), // diffusers SDXL snapshot
    identitynet: WeightsSource::Dir("/path/to/InstantX--InstantID/ControlNetModel".into()),
    ip_adapter: "ip-adapter.safetensors".into(), // <- from this repo
})?
.with_face(
    &Weights::from_file("scrfd_10g.safetensors")?, // <- from this repo
    &Weights::from_file("arcface_iresnet100.safetensors")?, // <- from this repo
)?;

let out = model.generate(&InstantIdRequest {
    /* prompt, w/h, steps, guidance, scales, seed */
    ..Default::default()
}, &reference_image)?;
```

For pose mode add `.with_openpose(&WeightsSource::Dir("/path/to/xinsir--controlnet-openpose-sdxl-1.0".into()))?` and call `generate_pose(req, &reference, &keypoints)`; for the ADetailer-style face-restore pass call `restore_face(req, &base, &reference_embedding)`.

### In SceneWorks (download-on-first-use)

The SceneWorks Rust GPU worker fetches these three files from this repo on first use into its app cache (mirroring the `SceneWorks/yolo11m-person-detect-mlx` and `SceneWorks/sam2-mlx` pattern). You can pre-stage them with the env override `SCENEWORKS_INSTANTID_WEIGHTS=/dir/with/the/three/files`.
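
If you pre-stage the files by hand, it can be worth checking them against the sha256 values listed under Checksums before pointing the worker at the directory. The following is a minimal sketch of such a check, not part of `mlx-gen` or SceneWorks: the helper names `sha256_of` and `check_weights_dir` are hypothetical, and the script only assumes that `SCENEWORKS_INSTANTID_WEIGHTS` names a directory containing the three files. Whether the worker performs its own integrity check is not stated here.

```python
import hashlib
import os
from pathlib import Path

# sha256 values copied from the "Checksums (sha256)" section of this card.
EXPECTED = {
    "ip-adapter.safetensors": "fa5608b6121ffaa40228e76ac96e10f56e39b3aba2f6c4905ff7ef9046391c29",
    "scrfd_10g.safetensors": "7b40147a85771139e70a8d9fe6be27ffcf32f4c911770ef24b5b05c29f534eda",
    "arcface_iresnet100.safetensors": "9deff2fef8fe1b3e357a99c01f28cc478dd8acbeab0d3749d252f6d69990ee39",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so the 1.57 GB adapter is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_weights_dir(weights_dir, expected=EXPECTED):
    """Return {filename: status}, where status is 'ok', 'missing' or 'mismatch'."""
    results = {}
    for name, want in expected.items():
        path = Path(weights_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) != want:
            results[name] = "mismatch"
        else:
            results[name] = "ok"
    return results


if __name__ == "__main__":
    # Hypothetical usage: fall back to the current directory if the override is unset.
    weights_dir = os.environ.get("SCENEWORKS_INSTANTID_WEIGHTS", ".")
    for name, status in check_weights_dir(weights_dir).items():
        print(f"{status:8}  {name}")
```

Run it with the same environment variable you intend to give the worker; any line that does not start with `ok` points at a file to re-download.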

### Validation (real-weight, MLX, RealVisXL_V5.0 @ 1024²/30, fp16)

| Mode | Metric | Result |
|---|---|---|
| Single identity (`generate`) | ArcFace-cosine(ref, generated) | **0.8731** (torch baseline ≈ 0.876) |
| Angle set (`generate_angle`, three-quarter right) | ArcFace-cosine | **0.8343** |
| Pose mode (`generate_pose`, full-body) | ArcFace-cosine | **0.7129** (small full-body face) |
| Face-restore (`restore_face`) | ArcFace-cosine | base 0.7370 → **0.8338** |

## Reproducing the conversion

All three converters live in [`mlx-gen/tools/`](https://github.com/michaeltrefry/mlx-gen/tree/main/tools) and run in a torch venv (torch + safetensors; insightface for SCRFD/ArcFace ONNX import):

```bash
python tools/convert_instantid.py   # InstantX/InstantID ip-adapter.bin -> ip-adapter.safetensors
python tools/convert_scrfd.py       # antelopev2 scrfd_10g_bnkps.onnx -> scrfd_10g.safetensors
python tools/convert_glintr100.py   # antelopev2 glintr100.onnx -> arcface_iresnet100.safetensors
```

## Provenance & licensing

These are **format conversions** of third-party weights; the upstream licenses govern use. Verify you comply with each before using them:

- **`ip-adapter.safetensors`**: derived from [`InstantX/InstantID`](https://huggingface.co/InstantX/InstantID) (Apache-2.0). InstantID research: *"InstantID: Zero-shot Identity-Preserving Generation in Seconds"* (Wang et al., 2024).
- **`scrfd_10g.safetensors`** and **`arcface_iresnet100.safetensors`**: derived from the [InsightFace](https://github.com/deepinsight/insightface) `antelopev2` model pack (`scrfd_10g_bnkps` + `glintr100`). **The InsightFace pretrained models are released for non-commercial research purposes only**; see the InsightFace repository for their terms. Do not use these two files in a commercial setting without securing appropriate rights from the upstream authors.

`license: other` reflects this mix; this card is the authoritative license statement. No additional license is granted by the conversion.
Conversions produced by the [`mlx-gen`](https://github.com/michaeltrefry/mlx-gen) tooling (Apache-2.0 code).