license: other
license_name: mixed-upstream-see-card
library_name: mlx
pipeline_tag: text-to-image
tags:
  - instantid
  - sdxl
  - mlx
  - apple-silicon
  - face-id
  - controlnet
  - ip-adapter
  - identity-preservation

SceneWorks/instantid-mlx

Converted weights for running InstantID identity-preserving SDXL natively on Apple Silicon with MLX, with zero Python at inference time. These are the three artifacts the mlx-gen-instantid provider loads to compose InstantID out of the SDXL backbone + the native MLX face stack (mlx-gen-face).

This repo holds only the InstantID-specific glue weights. The SDXL base (e.g. SG161222/RealVisXL_V5.0 or stabilityai/stable-diffusion-xl-base-1.0), the IdentityNet ControlNet (InstantX/InstantID → ControlNetModel/), and the OpenPose ControlNet for pose mode (xinsir/controlnet-openpose-sdxl-1.0) are loaded directly from their own diffusers repos; no conversion is needed for those.

Files

| File | Size | What it is | Source | Converter |
|---|---|---|---|---|
| ip-adapter.safetensors | 1.57 GB | The InstantID face IP-Adapter: the image-projection Resampler (`image_proj.*`, ArcFace 512-d → 16×2048 face tokens) + the 70 decoupled cross-attention K/V pairs (`ip_adapter.*`). Re-serialized from the upstream torch pickle ip-adapter.bin into safetensors (MLX's loader reads safetensors, not pickle). | InstantX/InstantID → ip-adapter.bin | tools/convert_instantid.py |
| scrfd_10g.safetensors | 16 MB | SCRFD 5-point face detector (bbox + landmarks): the detection half of the native face stack. Ported from the insightface antelopev2 scrfd_10g_bnkps ONNX graph. | insightface antelopev2 (scrfd_10g_bnkps.onnx) | tools/convert_scrfd.py |
| arcface_iresnet100.safetensors | 248 MB | ArcFace iresnet100 512-d recognition embedder: the identity-fidelity half. Ported from the insightface antelopev2 glintr100 ONNX graph. | insightface antelopev2 (glintr100.onnx) | tools/convert_glintr100.py |
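To check what a downloaded file contains without loading any tensors, the safetensors header can be read with the standard library alone: the file starts with an 8-byte little-endian header length followed by a JSON header keyed by tensor name. The following is a minimal sketch; the helper names `read_safetensors_header` and `count_by_prefix` are illustrative and are not part of mlx-gen.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header size as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_by_prefix(header):
    """Count tensors by the part of each key before the first dot."""
    return Counter(name.split(".", 1)[0] for name in header)
```

Calling `count_by_prefix(read_safetensors_header("ip-adapter.safetensors"))` should then report tensors under the `image_proj` and `ip_adapter` prefixes described in the table; the exact counts are not asserted here.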

Checksums (sha256)

fa5608b6121ffaa40228e76ac96e10f56e39b3aba2f6c4905ff7ef9046391c29  ip-adapter.safetensors
7b40147a85771139e70a8d9fe6be27ffcf32f4c911770ef24b5b05c29f534eda  scrfd_10g.safetensors
9deff2fef8fe1b3e357a99c01f28cc478dd8acbeab0d3749d252f6d69990ee39  arcface_iresnet100.safetensors
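
The digests above can be checked after download with a short script. This is a sketch using only the Python standard library; the function names `sha256_of` and `verify_weights` are illustrative, and the expected values are copied from the list above.

```python
import hashlib
from pathlib import Path

# Digests copied from the checksum list in this card.
EXPECTED_SHA256 = {
    "ip-adapter.safetensors": "fa5608b6121ffaa40228e76ac96e10f56e39b3aba2f6c4905ff7ef9046391c29",
    "scrfd_10g.safetensors": "7b40147a85771139e70a8d9fe6be27ffcf32f4c911770ef24b5b05c29f534eda",
    "arcface_iresnet100.safetensors": "9deff2fef8fe1b3e357a99c01f28cc478dd8acbeab0d3749d252f6d69990ee39",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so the 1.57 GB adapter is never held in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(directory, expected=EXPECTED_SHA256):
    """Return {filename: 'ok' | 'mismatch' | 'missing'} for each expected file."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        else:
            results[name] = "ok" if sha256_of(path) == want else "mismatch"
    return results
```

For example, `verify_weights("/dir/with/the/three/files")` returns a status per file; anything other than `ok` suggests an incomplete or corrupted download.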

Usage

In mlx-gen-instantid (Rust / MLX)

use mlx_gen::weights::Weights;
use mlx_gen::WeightsSource;
use mlx_gen_instantid::{InstantId, InstantIdPaths, InstantIdRequest};

let model = InstantId::load(&InstantIdPaths {
    sdxl_base:   "/path/to/RealVisXL_V5.0".into(),          // diffusers SDXL snapshot
    identitynet: WeightsSource::Dir("/path/to/InstantX--InstantID/ControlNetModel".into()),
    ip_adapter:  "ip-adapter.safetensors".into(),           // <- from this repo
})?
.with_face(
    &Weights::from_file("scrfd_10g.safetensors")?,          // <- from this repo
    &Weights::from_file("arcface_iresnet100.safetensors")?, // <- from this repo
)?;

let out = model.generate(
    &InstantIdRequest {
        /* prompt, w/h, steps, guidance, scales, seed */
        ..Default::default()
    },
    &reference_image,
)?;

For pose mode add .with_openpose(&WeightsSource::Dir("/path/to/xinsir--controlnet-openpose-sdxl-1.0".into()))? and call generate_pose(req, &reference, &keypoints); for the ADetailer-style face-restore pass call restore_face(req, &base, &reference_embedding).

In SceneWorks (download-on-first-use)

The SceneWorks Rust GPU worker fetches these three files from this repo on first use into its app cache (mirroring the SceneWorks/yolo11m-person-detect-mlx and SceneWorks/sam2-mlx pattern). You can pre-stage them with the env override SCENEWORKS_INSTANTID_WEIGHTS=/dir/with/the/three/files.
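
Before pointing the worker at a pre-staged directory, it can help to confirm that all three files are present. The sketch below only inspects the directory named by `SCENEWORKS_INSTANTID_WEIGHTS`; it assumes the worker expects the files under the names used in this repo, and the helper name `missing_from_override` is illustrative, not part of SceneWorks.

```python
import os
from pathlib import Path

# File names as published in this repo.
REQUIRED_FILES = (
    "ip-adapter.safetensors",
    "scrfd_10g.safetensors",
    "arcface_iresnet100.safetensors",
)


def missing_from_override(environ=os.environ):
    """Return the required files absent from the override directory.

    Returns None when SCENEWORKS_INSTANTID_WEIGHTS is unset or empty, in which
    case the download-on-first-use path described above would apply instead.
    """
    root = environ.get("SCENEWORKS_INSTANTID_WEIGHTS")
    if not root:
        return None
    return [name for name in REQUIRED_FILES if not (Path(root) / name).is_file()]
```

An empty list means the directory holds all three files; combining this with a checksum comparison gives a fuller pre-flight check.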

Validation (real-weight, MLX, RealVisXL_V5.0 @ 1024²/30, fp16)

| Mode | Metric | Result |
|---|---|---|
| Single identity (generate) | ArcFace-cosine(ref, generated) | 0.8731 (torch baseline ≈ 0.876) |
| Angle set (generate_angle, three-quarter right) | ArcFace-cosine | 0.8343 |
| Pose mode (generate_pose, full-body) | ArcFace-cosine | 0.7129 (small full-body face) |
| Face-restore (restore_face) | ArcFace-cosine | base 0.7370 → 0.8338 |
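
The metric in the table can be sketched as follows, assuming ArcFace-cosine means the plain cosine similarity between the 512-d ArcFace embedding of the reference face and that of the generated face. Producing the embeddings requires the ArcFace model and is not shown; the function name `arcface_cosine` is illustrative.

```python
import math


def arcface_cosine(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

The value lies in [-1, 1] and does not depend on the scale of either vector, so embeddings need not be normalized beforehand.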

Reproducing the conversion

All three converters live in mlx-gen/tools/ and run in a torch venv (torch + safetensors; insightface for SCRFD/ArcFace ONNX import):

python tools/convert_instantid.py    # InstantX/InstantID ip-adapter.bin  -> ip-adapter.safetensors
python tools/convert_scrfd.py        # antelopev2 scrfd_10g_bnkps.onnx     -> scrfd_10g.safetensors
python tools/convert_glintr100.py    # antelopev2 glintr100.onnx           -> arcface_iresnet100.safetensors
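
For orientation, the pickle-to-safetensors step for the IP-Adapter can be sketched as below. This is not the contents of tools/convert_instantid.py; it assumes ip-adapter.bin is a nested dict of tensors with top-level `image_proj` and `ip_adapter` groups, which would flatten to the `image_proj.*` and `ip_adapter.*` keys listed under Files. The function names are illustrative, and any dtype casting the real converter may perform is omitted.

```python
def flatten_state_dict(nested, prefix=""):
    """Flatten a nested dict into a single-level dict with dot-joined keys."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_state_dict(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def convert_pickle_to_safetensors(src, dst):
    """Re-serialize a torch pickle checkpoint as a flat safetensors file."""
    # Imported lazily so flatten_state_dict works without torch installed.
    import torch
    from safetensors.torch import save_file

    # weights_only=True restricts unpickling to tensors and plain containers.
    nested = torch.load(src, map_location="cpu", weights_only=True)
    # safetensors requires contiguous tensors.
    flat = {k: v.contiguous() for k, v in flatten_state_dict(nested).items()}
    save_file(flat, dst)
```

In the torch venv, `convert_pickle_to_safetensors("ip-adapter.bin", "ip-adapter.safetensors")` would write the flattened file. Output from this sketch is not guaranteed to match the checksum published above.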

Provenance & licensing

These are format conversions of third-party weights; the upstream licenses govern use. Verify you comply with each before using them:

  • ip-adapter.safetensors: derived from InstantX/InstantID (Apache-2.0). InstantID research: "InstantID: Zero-shot Identity-Preserving Generation in Seconds" (Wang et al., 2024).
  • scrfd_10g.safetensors and arcface_iresnet100.safetensors: derived from the InsightFace antelopev2 model pack (scrfd_10g_bnkps + glintr100). The InsightFace pretrained models are released for non-commercial research purposes only; see the InsightFace repository for their terms. Do not use these two files in a commercial setting without securing appropriate rights from the upstream authors.

license: other reflects this mix; this card is the authoritative license statement. No additional license is granted by the conversion. The conversions were produced by the mlx-gen tooling (Apache-2.0 code).