SceneWorks/instantid-mlx

Converted weights for running InstantID identity-preserving SDXL natively on Apple Silicon with MLX, with zero Python at inference time. These are the three artifacts that the `mlx-gen-instantid` provider loads to compose InstantID from the SDXL backbone and the native MLX face stack (`mlx-gen-face`).

This repo holds only the InstantID-specific glue weights. The SDXL base (e.g. SG161222/RealVisXL_V5.0 or stabilityai/stable-diffusion-xl-base-1.0), the IdentityNet ControlNet (InstantX/InstantID → ControlNetModel/), and the OpenPose ControlNet for pose mode (xinsir/controlnet-openpose-sdxl-1.0) are loaded directly from their own diffusers repos — no conversion needed for those.

Files

| File | Size | What it is | Source | Converter |
|---|---|---|---|---|
| `ip-adapter.safetensors` | 1.57 GB | The InstantID face IP-Adapter: the image-projection Resampler (`image_proj.`, ArcFace 512-d → 16×2048 face tokens) and the 70 decoupled cross-attention K/V pairs (`ip_adapter.`). Re-serialized from the upstream torch pickle `ip-adapter.bin` into safetensors (MLX's loader reads safetensors, not pickle). | `InstantX/InstantID` → `ip-adapter.bin` | `tools/convert_instantid.py` |
| `scrfd_10g.safetensors` | 16 MB | SCRFD 5-point face detector (bbox + landmarks), the detection half of the native face stack. Ported from the insightface `antelopev2` `scrfd_10g_bnkps` ONNX graph. | insightface `antelopev2` (`scrfd_10g_bnkps.onnx`) | `tools/convert_scrfd.py` |
| `arcface_iresnet100.safetensors` | 248 MB | ArcFace `iresnet100` 512-d recognition embedder, the identity-fidelity half. Ported from the insightface `antelopev2` `glintr100` ONNX graph. | insightface `antelopev2` (`glintr100.onnx`) | `tools/convert_glintr100.py` |
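To sanity-check a converted `ip-adapter.safetensors`, you can group its tensor names by the two prefixes listed above and count the K/V pairs. The sketch below works on a plain list of tensor names (however you read them from the safetensors header). The per-layer key shape it expects (`ip_adapter.<layer>.to_k_ip.weight` / `ip_adapter.<layer>.to_v_ip.weight`) follows the usual IP-Adapter naming and is an assumption here, as are the helper and type names.

```rust
use std::collections::BTreeMap;

/// Summary of an InstantID IP-Adapter tensor list (hypothetical helper type).
#[derive(Debug, Default, PartialEq)]
struct IpAdapterSummary {
    image_proj_tensors: usize,    // tensors under the Resampler prefix `image_proj.`
    kv_pairs: usize,              // layers that have both a K and a V projection
    unpaired_layers: Vec<String>, // layers missing K or V (or with unexpected keys)
    other_tensors: usize,         // anything outside the two expected prefixes
}

/// Classify tensor names by prefix. ASSUMPTION: per-layer keys look like
/// `ip_adapter.<layer>.to_k_ip.weight` and `ip_adapter.<layer>.to_v_ip.weight`.
fn summarize_ip_adapter<'a, I: IntoIterator<Item = &'a str>>(names: I) -> IpAdapterSummary {
    let mut summary = IpAdapterSummary::default();
    // layer id -> (has K projection, has V projection)
    let mut layers: BTreeMap<&'a str, (bool, bool)> = BTreeMap::new();
    for name in names {
        if name.starts_with("image_proj.") {
            summary.image_proj_tensors += 1;
        } else if let Some(rest) = name.strip_prefix("ip_adapter.") {
            let mut parts = rest.splitn(2, '.');
            let layer = parts.next().unwrap_or("");
            let tail = parts.next().unwrap_or("");
            let entry = layers.entry(layer).or_insert((false, false));
            if tail.starts_with("to_k_ip.") {
                entry.0 = true;
            }
            if tail.starts_with("to_v_ip.") {
                entry.1 = true;
            }
        } else {
            summary.other_tensors += 1;
        }
    }
    for (layer, (has_k, has_v)) in layers {
        if has_k && has_v {
            summary.kv_pairs += 1;
        } else {
            summary.unpaired_layers.push(layer.to_string());
        }
    }
    summary
}
```

On a correctly converted file you would expect `kv_pairs == 70`, no unpaired layers and no stray tensors, matching the 70 K/V pairs stated for this file.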

Checksums (sha256)

```text
fa5608b6121ffaa40228e76ac96e10f56e39b3aba2f6c4905ff7ef9046391c29  ip-adapter.safetensors
7b40147a85771139e70a8d9fe6be27ffcf32f4c911770ef24b5b05c29f534eda  scrfd_10g.safetensors
9deff2fef8fe1b3e357a99c01f28cc478dd8acbeab0d3749d252f6d69990ee39  arcface_iresnet100.safetensors
```
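The digests are in `sha256sum` format, so saving those lines to a file and running `shasum -a 256 -c` (or `sha256sum -c` on Linux) on it checks all three at once. If you would rather verify from Rust without extra dependencies, here is a std-only sketch that streams a file through SHA-256 (FIPS 180-4) and compares the result with a manifest entry. The helper names are hypothetical, and in production code an audited crate such as `sha2` is the better choice.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

// Initial hash value H(0) (FIPS 180-4).
const H0: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];

/// Process one 64-byte block, updating the eight hash words.
fn compress(h: &mut [u32; 8], block: &[u8]) {
    let mut w = [0u32; 64]; // message schedule
    for i in 0..16 {
        w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
    }
    for i in 16..64 {
        let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
        let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
    }
    let mut v = *h; // working variables a..h
    for i in 0..64 {
        let (a, b, c, d, e, f, g, hh) = (v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]);
        let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
        let ch = (e & f) ^ (!e & g);
        let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
        let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
        let maj = (a & b) ^ (a & c) ^ (b & c);
        v = [t1.wrapping_add(s0.wrapping_add(maj)), a, b, c, d.wrapping_add(t1), e, f, g];
    }
    for i in 0..8 {
        h[i] = h[i].wrapping_add(v[i]);
    }
}

/// Stream any reader through SHA-256 and return the lowercase hex digest.
fn sha256_hex<R: Read>(mut reader: R) -> io::Result<String> {
    let mut h = H0;
    let mut buf = vec![0u8; 64 * 1024];
    let mut pending: Vec<u8> = Vec::new(); // bytes not yet forming a full block
    let mut total_len: u64 = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        total_len += n as u64;
        pending.extend_from_slice(&buf[..n]);
        let full = pending.len() / 64 * 64;
        for block in pending[..full].chunks_exact(64) {
            compress(&mut h, block);
        }
        pending = pending.split_off(full); // keep only the partial tail
    }
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian u64).
    pending.push(0x80);
    while pending.len() % 64 != 56 {
        pending.push(0);
    }
    pending.extend_from_slice(&total_len.wrapping_mul(8).to_be_bytes());
    for block in pending.chunks_exact(64) {
        compress(&mut h, block);
    }
    Ok(h.iter().map(|word| format!("{:08x}", word)).collect())
}

/// Parse `<hex digest>  <file name>` lines into (digest, file name) pairs.
fn parse_manifest(text: &str) -> Vec<(String, String)> {
    text.lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let digest = fields.next()?;
            let name = fields.next()?;
            Some((digest.to_string(), name.to_string()))
        })
        .collect()
}

/// Hash the file at `path` and compare with the expected digest (case-insensitive).
fn verify_file(path: &Path, expected_hex: &str) -> io::Result<bool> {
    Ok(sha256_hex(File::open(path)?)?.eq_ignore_ascii_case(expected_hex))
}
```

Pairing `parse_manifest` over the three lines with `verify_file` for each entry should yield `Ok(true)` for every intact download. Because the hasher reads in 64 KiB chunks, the 1.57 GB adapter never has to fit in memory at once.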

Usage

In `mlx-gen-instantid` (Rust / MLX)

```rust
use mlx_gen::weights::Weights;
use mlx_gen::WeightsSource;
use mlx_gen_instantid::{InstantId, InstantIdPaths, InstantIdRequest};

// Call from a function returning `Result` so `?` can propagate load/generation errors.
let model = InstantId::load(&InstantIdPaths {
    sdxl_base:   "/path/to/RealVisXL_V5.0".into(),          // diffusers SDXL snapshot
    identitynet: WeightsSource::Dir("/path/to/InstantX--InstantID/ControlNetModel".into()),
    ip_adapter:  "ip-adapter.safetensors".into(),           // <- from this repo
})?
// Attach the native face stack: SCRFD detector + ArcFace embedder.
.with_face(
    &Weights::from_file("scrfd_10g.safetensors")?,          // <- from this repo
    &Weights::from_file("arcface_iresnet100.safetensors")?, // <- from this repo
)?;

// `reference_image` is the identity reference photo, loaded beforehand.
let out = model.generate(
    &InstantIdRequest { /* prompt, w/h, steps, guidance, scales, seed */ ..Default::default() },
    &reference_image,
)?;
```

For pose mode, add `.with_openpose(&WeightsSource::Dir("/path/to/xinsir--controlnet-openpose-sdxl-1.0".into()))?` and call `generate_pose(req, &reference, &keypoints)`. For the ADetailer-style face-restore pass, call `restore_face(req, &base, &reference_embedding)`.

In SceneWorks (download-on-first-use)

On first use, the SceneWorks Rust GPU worker downloads these three files from this repo into its app cache (mirroring the `SceneWorks/yolo11m-person-detect-mlx` and `SceneWorks/sam2-mlx` pattern). To pre-stage them, set the env override `SCENEWORKS_INSTANTID_WEIGHTS=/dir/with/the/three/files`.
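The lookup described above can be sketched as: use `SCENEWORKS_INSTANTID_WEIGHTS` when it is set, otherwise fall back to the app cache, and fetch only the files that are missing. This is a hypothetical illustration, not the worker's actual code; the `instantid-mlx` cache subdirectory and all helper names are assumptions.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// The three files this repo ships.
const INSTANTID_FILES: [&str; 3] = [
    "ip-adapter.safetensors",
    "scrfd_10g.safetensors",
    "arcface_iresnet100.safetensors",
];

/// Pick the weights directory: a non-empty env override wins; otherwise a
/// hypothetical `instantid-mlx` subdirectory of the app cache.
fn resolve_weights_dir(override_dir: Option<String>, app_cache: &Path) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir),
        _ => app_cache.join("instantid-mlx"),
    }
}

/// List the expected files not yet present in `dir`; these are what a
/// download-on-first-use step would still need to fetch.
fn missing_files(dir: &Path) -> Vec<&'static str> {
    INSTANTID_FILES
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}

/// Convenience wrapper that reads the real environment variable.
fn weights_dir_from_env(app_cache: &Path) -> PathBuf {
    resolve_weights_dir(env::var("SCENEWORKS_INSTANTID_WEIGHTS").ok(), app_cache)
}
```

Keeping `resolve_weights_dir` free of direct environment access makes the precedence rule easy to test; `weights_dir_from_env` is the only place that touches the process environment.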

Validation (real-weight, MLX, RealVisXL_V5.0 @ 1024²/30, fp16)

| Mode | Metric | Result |
|---|---|---|
| Single identity (`generate`) | ArcFace-cosine(ref, generated) | 0.8731 (torch baseline ≈ 0.876) |
| Angle set (`generate_angle`, three-quarter right) | ArcFace-cosine | 0.8343 |
| Pose mode (`generate_pose`, full-body) | ArcFace-cosine | 0.7129 (small full-body face) |
| Face-restore (`restore_face`) | ArcFace-cosine | base 0.7370 → 0.8338 |
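The ArcFace-cosine metric in the table is the cosine similarity between the 512-d ArcFace embeddings of the reference face and the generated face (1.0 means the embeddings point in the same direction). A minimal sketch of the metric itself, assuming both embeddings are already available as `f32` slices; the function name is illustrative.

```rust
/// Cosine similarity between two embeddings: dot(a, b) / (|a| * |b|).
/// Returns None if the lengths differ, the inputs are empty, or a norm is zero.
fn arcface_cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    // Accumulate in f64 to limit rounding error over 512 terms.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some((dot / (norm_a.sqrt() * norm_b.sqrt())) as f32)
}
```

If the embeddings are already L2-normalized, the dot product alone equals the cosine; normalizing inside the function simply makes it safe to call on raw embedder output.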

Reproducing the conversion

All three converters live in `mlx-gen/tools/` and run in a Python venv with torch and safetensors installed (plus insightface for the SCRFD/ArcFace ONNX import):

```sh
python tools/convert_instantid.py    # InstantX/InstantID ip-adapter.bin  -> ip-adapter.safetensors
python tools/convert_scrfd.py        # antelopev2 scrfd_10g_bnkps.onnx     -> scrfd_10g.safetensors
python tools/convert_glintr100.py    # antelopev2 glintr100.onnx           -> arcface_iresnet100.safetensors
```
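For reference, the safetensors layout the converters emit is simple: an 8-byte little-endian header length, a JSON header mapping each tensor name to its `dtype`, `shape` and `data_offsets` (relative to the start of the byte section), then the raw tensor bytes. The Rust sketch below writes that layout for `f32` tensors to illustrate what "re-serialized into safetensors" means. It is not the converters' code, handles only `F32`, skips JSON escaping of names, and the function name is hypothetical.

```rust
use std::fmt::Write;

/// Serialize named f32 tensors into the safetensors layout:
/// [u64 LE header length][JSON header][little-endian tensor bytes].
fn to_safetensors(tensors: &[(&str, Vec<usize>, Vec<f32>)]) -> Vec<u8> {
    let mut header = String::from("{");
    let mut data: Vec<u8> = Vec::new();
    for (i, (name, shape, values)) in tensors.iter().enumerate() {
        assert_eq!(shape.iter().product::<usize>(), values.len(), "shape/data mismatch for {}", name);
        let begin = data.len();
        for v in values {
            data.extend_from_slice(&v.to_le_bytes());
        }
        let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
        if i > 0 {
            header.push(',');
        }
        // Names are assumed to be plain ASCII identifiers, so no JSON escaping is done.
        write!(
            header,
            "\"{}\":{{\"dtype\":\"F32\",\"shape\":[{}],\"data_offsets\":[{},{}]}}",
            name,
            dims.join(","),
            begin,
            data.len()
        )
        .unwrap();
    }
    header.push('}');
    // The format allows trailing spaces in the header; pad so tensor data starts 8-byte aligned.
    while header.len() % 8 != 0 {
        header.push(' ');
    }
    let mut out = Vec::with_capacity(8 + header.len() + data.len());
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    out.extend_from_slice(header.as_bytes());
    out.extend_from_slice(&data);
    out
}
```

Because the header is plain JSON with byte offsets, a loader can read the first 8 bytes and the header, then map tensors directly out of the file without unpickling anything, which is why MLX prefers it over the torch pickle.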

Provenance & licensing

These are format conversions of third-party weights; the upstream licenses govern their use. Verify that you comply with each license before using them:

- `ip-adapter.safetensors`: derived from InstantX/InstantID (Apache-2.0). InstantID research: "InstantID: Zero-shot Identity-Preserving Generation in Seconds" (Wang et al., 2024).
- `scrfd_10g.safetensors` and `arcface_iresnet100.safetensors`: derived from the InsightFace antelopev2 model pack (`scrfd_10g_bnkps` + `glintr100`). The InsightFace pretrained models are released for non-commercial research purposes only; see the InsightFace repository for their terms. Do not use these two files in a commercial setting without securing appropriate rights from the upstream authors.

The `license: other` metadata reflects this mix; this card is the authoritative license statement. The conversion grants no additional license. The conversions were produced by the mlx-gen tooling (Apache-2.0 code).
