---
license: other
license_name: mixed-upstream-see-card
library_name: mlx
pipeline_tag: text-to-image
tags:
- instantid
- sdxl
- mlx
- apple-silicon
- face-id
- controlnet
- ip-adapter
- identity-preservation
---
# SceneWorks/instantid-mlx

Converted weights for running **InstantID** identity-preserving SDXL **natively on Apple Silicon
with MLX**, with zero Python at inference time. These are the three artifacts the
[`mlx-gen-instantid`](https://github.com/michaeltrefry/mlx-gen) provider loads to compose InstantID
out of the SDXL backbone + the native MLX face stack (`mlx-gen-face`).

This repo holds **only the InstantID-specific glue weights**. The SDXL base (e.g.
`SG161222/RealVisXL_V5.0` or `stabilityai/stable-diffusion-xl-base-1.0`), the IdentityNet
ControlNet (`InstantX/InstantID` → `ControlNetModel/`), and the OpenPose ControlNet for pose mode
(`xinsir/controlnet-openpose-sdxl-1.0`) are loaded directly from their own diffusers repos; no
conversion is needed for those.

## Files

| File | Size | What it is | Source | Converter |
|---|---|---|---|---|
| `ip-adapter.safetensors` | 1.57 GB | The InstantID face **IP-Adapter**: the image-projection **Resampler** (`image_proj.*`, ArcFace 512-d → 16×2048 face tokens) + the 70 decoupled cross-attention **K/V pairs** (`ip_adapter.*`). Re-serialized from the upstream torch **pickle** `ip-adapter.bin` into safetensors (MLX's loader reads safetensors, not pickle). | [`InstantX/InstantID`](https://huggingface.co/InstantX/InstantID) → `ip-adapter.bin` | `tools/convert_instantid.py` |
| `scrfd_10g.safetensors` | 16 MB | **SCRFD** 5-point face detector (bbox + landmarks), the detection half of the native face stack. Ported from the insightface `antelopev2` `scrfd_10g_bnkps` ONNX graph. | insightface `antelopev2` (`scrfd_10g_bnkps.onnx`) | `tools/convert_scrfd.py` |
| `arcface_iresnet100.safetensors` | 248 MB | **ArcFace** `iresnet100` 512-d recognition embedder, the identity-fidelity half. Ported from the insightface `antelopev2` `glintr100` ONNX graph. | insightface `antelopev2` (`glintr100.onnx`) | `tools/convert_glintr100.py` |

### Checksums (sha256)

```
fa5608b6121ffaa40228e76ac96e10f56e39b3aba2f6c4905ff7ef9046391c29  ip-adapter.safetensors
7b40147a85771139e70a8d9fe6be27ffcf32f4c911770ef24b5b05c29f534eda  scrfd_10g.safetensors
9deff2fef8fe1b3e357a99c01f28cc478dd8acbeab0d3749d252f6d69990ee39  arcface_iresnet100.safetensors
```
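
One way to check downloaded files against the digests above is sketched below. It is an illustration only, not part of `mlx-gen`: the helper names `parse_manifest` and `sha256_via_shasum` are hypothetical, and it assumes the `shasum` tool is on `PATH` (it ships with macOS) and that the three files sit in the current directory. An in-process hasher such as the `sha2` crate would avoid the subprocess.

```rust
use std::path::Path;
use std::process::Command;

/// Digests copied from the checksum list in this card (`<sha256>  <file>`).
const MANIFEST: &str = "\
fa5608b6121ffaa40228e76ac96e10f56e39b3aba2f6c4905ff7ef9046391c29  ip-adapter.safetensors
7b40147a85771139e70a8d9fe6be27ffcf32f4c911770ef24b5b05c29f534eda  scrfd_10g.safetensors
9deff2fef8fe1b3e357a99c01f28cc478dd8acbeab0d3749d252f6d69990ee39  arcface_iresnet100.safetensors
";

/// Parse `<hex digest> <file name>` lines into (digest, name) pairs,
/// skipping lines whose first field is not a 64-character hex string.
fn parse_manifest(text: &str) -> Vec<(String, String)> {
    text.lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let digest = parts.next()?;
            let name = parts.next()?;
            let ok = digest.len() == 64 && digest.chars().all(|c| c.is_ascii_hexdigit());
            ok.then(|| (digest.to_ascii_lowercase(), name.to_string()))
        })
        .collect()
}

/// Hash a file by shelling out to `shasum -a 256`.
/// Returns None if the tool is unavailable or the file cannot be read.
fn sha256_via_shasum(path: &Path) -> Option<String> {
    let out = Command::new("shasum")
        .args(["-a", "256"])
        .arg(path)
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    let stdout = String::from_utf8(out.stdout).ok()?;
    stdout.split_whitespace().next().map(|d| d.to_ascii_lowercase())
}

fn main() {
    for (expected, name) in parse_manifest(MANIFEST) {
        match sha256_via_shasum(Path::new(&name)) {
            Some(actual) if actual == expected => println!("{name}: OK"),
            Some(_) => println!("{name}: MISMATCH"),
            None => println!("{name}: not checked (file or shasum missing)"),
        }
    }
}
```

A mismatch most often points to a truncated download, so re-fetching the file is the first thing to try.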
## Usage

### In `mlx-gen-instantid` (Rust / MLX)

```rust
use mlx_gen::weights::Weights;
use mlx_gen::WeightsSource;
use mlx_gen_instantid::{InstantId, InstantIdPaths, InstantIdRequest};

let model = InstantId::load(&InstantIdPaths {
    sdxl_base: "/path/to/RealVisXL_V5.0".into(), // diffusers SDXL snapshot
    identitynet: WeightsSource::Dir("/path/to/InstantX--InstantID/ControlNetModel".into()),
    ip_adapter: "ip-adapter.safetensors".into(), // <- from this repo
})?
.with_face(
    &Weights::from_file("scrfd_10g.safetensors")?,          // <- from this repo
    &Weights::from_file("arcface_iresnet100.safetensors")?, // <- from this repo
)?;

let out = model.generate(
    &InstantIdRequest { /* prompt, w/h, steps, guidance, scales, seed */ ..Default::default() },
    &reference_image,
)?;
```

For pose mode add `.with_openpose(&WeightsSource::Dir("/path/to/xinsir--controlnet-openpose-sdxl-1.0".into()))?`
and call `generate_pose(req, &reference, &keypoints)`; for the ADetailer-style face-restore pass
call `restore_face(req, &base, &reference_embedding)`.

### In SceneWorks (download-on-first-use)

The SceneWorks Rust GPU worker fetches these three files from this repo on first use into its app
cache (mirroring the `SceneWorks/yolo11m-person-detect-mlx` and `SceneWorks/sam2-mlx` pattern). You
can pre-stage them with the env override `SCENEWORKS_INSTANTID_WEIGHTS=/dir/with/the/three/files`.
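
The override-then-cache lookup described above might look roughly like the following. This is a sketch under assumptions, not the worker's actual code: `resolve_weights_dir`, `missing_files`, and the cache path are hypothetical, and only the environment variable name and the three file names come from this card.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// The three files this repo ships.
const WEIGHT_FILES: [&str; 3] = [
    "ip-adapter.safetensors",
    "scrfd_10g.safetensors",
    "arcface_iresnet100.safetensors",
];

/// Pick the weights directory: the override if set and non-empty,
/// otherwise the cache default.
fn resolve_weights_dir(override_dir: Option<&str>, cache_default: &Path) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir),
        _ => cache_default.to_path_buf(),
    }
}

/// List which of the three files are absent from `dir`.
fn missing_files(dir: &Path) -> Vec<&'static str> {
    WEIGHT_FILES
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}

fn main() {
    // Hypothetical cache location; the real app cache path is not documented here.
    let cache_default = env::temp_dir().join("sceneworks-cache").join("instantid-mlx");
    let override_dir = env::var("SCENEWORKS_INSTANTID_WEIGHTS").ok();
    let dir = resolve_weights_dir(override_dir.as_deref(), &cache_default);

    let missing = missing_files(&dir);
    if missing.is_empty() {
        println!("all weights present in {}", dir.display());
    } else {
        // A real worker would download the missing files at this point.
        println!("missing {:?} in {}", missing, dir.display());
    }
}
```

Checking for all three files up front means a partially staged directory is reported immediately rather than failing midway through model loading.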
### Validation (real-weight, MLX, RealVisXL_V5.0 @ 1024²/30, fp16)

| Mode | Metric | Result |
|---|---|---|
| Single identity (`generate`) | ArcFace-cosine(ref, generated) | **0.8731** (torch baseline ≈ 0.876) |
| Angle set (`generate_angle`, three-quarter right) | ArcFace-cosine | **0.8343** |
| Pose mode (`generate_pose`, full-body) | ArcFace-cosine | **0.7129** (small full-body face) |
| Face-restore (`restore_face`) | ArcFace-cosine | base 0.7370 → **0.8338** |
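
The ArcFace-cosine metric in the table is the cosine similarity between the 512-d embedding of the reference face and that of the generated face. A minimal sketch of the computation follows; the function name is hypothetical, and the vectors in `main` are placeholders rather than real embeddings.

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns None if the lengths differ, the input is empty, or a norm is zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    // Accumulate in f64 to limit rounding error over 512 dimensions.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some((dot / (norm_a.sqrt() * norm_b.sqrt())) as f32)
}

fn main() {
    // Placeholder 512-d vectors standing in for real ArcFace embeddings.
    let reference: Vec<f32> = (0..512).map(|i| (i as f32 * 0.01).sin()).collect();
    let generated: Vec<f32> = (0..512).map(|i| (i as f32 * 0.01 + 0.3).sin()).collect();

    match cosine_similarity(&reference, &generated) {
        Some(score) => println!("ArcFace-cosine: {score:.4}"),
        None => println!("embeddings are not comparable"),
    }
}
```

Because the score depends only on direction, it is unchanged whether or not the embeddings are L2-normalized beforehand.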
## Reproducing the conversion

All three converters live in [`mlx-gen/tools/`](https://github.com/michaeltrefry/mlx-gen/tree/main/tools)
and run in a torch venv (torch + safetensors; insightface for SCRFD/ArcFace ONNX import):

```bash
python tools/convert_instantid.py   # InstantX/InstantID ip-adapter.bin -> ip-adapter.safetensors
python tools/convert_scrfd.py       # antelopev2 scrfd_10g_bnkps.onnx -> scrfd_10g.safetensors
python tools/convert_glintr100.py   # antelopev2 glintr100.onnx -> arcface_iresnet100.safetensors
```

## Provenance & licensing

These are **format conversions** of third-party weights; the upstream licenses govern use. Verify
that you comply with each before using them:

- **`ip-adapter.safetensors`**: derived from [`InstantX/InstantID`](https://huggingface.co/InstantX/InstantID)
  (Apache-2.0). InstantID research: *"InstantID: Zero-shot Identity-Preserving Generation in Seconds"*
  (Wang et al., 2024).
- **`scrfd_10g.safetensors`** and **`arcface_iresnet100.safetensors`**: derived from the
  [InsightFace](https://github.com/deepinsight/insightface) `antelopev2` model pack (`scrfd_10g_bnkps`
  + `glintr100`). **The InsightFace pretrained models are released for non-commercial research
  purposes only**; see the InsightFace repository for their terms. Do not use these two files in a
  commercial setting without securing appropriate rights from the upstream authors.

`license: other` reflects this mix; this card is the authoritative license statement. No additional
license is granted by the conversion. The conversions were produced with the
[`mlx-gen`](https://github.com/michaeltrefry/mlx-gen) tooling (Apache-2.0 code).