mlboydaisuke/clip-vit-base-patch32-CoreAI-official

	---
	license: mit
	tags:
	- coreai
	- clip
	- apple-silicon
	- on-device
	---

	# CLIP ViT-B/32 — Core AI export (official recipe)

	An fp16 static-shape export of [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32),
	produced with the official recipe from apple/coreai-models (`models/clip/export.py`). One change
	from the recipe: text inputs are padded to the full 77-token context (`padding="max_length"`)
	rather than traced at the recipe's 7-token example length, so free-text queries of any length
	up to the context limit work.
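
Because the graph is static, every text input has to arrive at exactly 77 tokens. The sketch below illustrates what max-length padding produces on the caller's side. It is not taken from the export recipe or from CoreAIKit: `padToContext` is a hypothetical helper, the example ids are placeholders, and the pad id 49407 assumes the upstream CLIP tokenizer's convention of reusing `<|endoftext|>` as the pad token.

```swift
// Hypothetical helper: pad (or naively truncate) token ids to a fixed length.
// Assumption: 49407 is <|endoftext|>, which the upstream CLIP tokenizer
// also uses as its pad token.
let contextLength = 77
let padTokenID: Int32 = 49407

func padToContext(_ ids: [Int32],
                  length: Int = contextLength,
                  padID: Int32 = padTokenID) -> (inputIDs: [Int32], attentionMask: [Int32]) {
    // Naive truncation: a real tokenizer would keep the end-of-text token.
    let kept = Array(ids.prefix(length))
    let padCount = length - kept.count
    let inputIDs = kept + [Int32](repeating: padID, count: padCount)
    // 1 marks a real token, 0 marks padding.
    let attentionMask = [Int32](repeating: 1, count: kept.count)
        + [Int32](repeating: 0, count: padCount)
    return (inputIDs, attentionMask)
}

// Placeholder ids standing in for a short tokenized query (start ... end).
let exampleIDs: [Int32] = [49406, 1000, 2000, 3000, 49407]
let padded = padToContext(exampleIDs)
print(padded.inputIDs.count, padded.attentionMask.reduce(0, +))
```

Both arrays come out with 77 entries, which matches the last dimension of `input_ids` and `attention_mask` in the graph contract.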

	Runs out of the box with [CoreAIKit](https://github.com/john-rocky/coreai-kit)'s
	`ImageTextEncoder`:

	```swift
	let encoder = try await ImageTextEncoder() // downloads this repo
	let imageVec = try await encoder.encode(image: cgImage)
	let textVec = try await encoder.encode(text: "red bike at the beach")
	let score = ImageTextEncoder.cosineSimilarity(imageVec, textVec)
	```
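
Since the model's embeddings are L2-normalized, cosine similarity between them reduces to a dot product. The following is a minimal sketch of that arithmetic, not the implementation of `ImageTextEncoder.cosineSimilarity`; the `[Float]` vector type and the function names `dot` and `cosine` are assumptions made for illustration.

```swift
// Dot product; equals cosine similarity only when both inputs have unit length.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    return zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

// General cosine similarity, for vectors that are not already normalized.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    let denominator = (dot(a, a) * dot(b, b)).squareRoot()
    return denominator > 0 ? dot(a, b) / denominator : 0
}

// Two toy unit vectors (real embeddings have 512 dimensions).
let u: [Float] = [0.6, 0.8]
let v: [Float] = [1.0, 0.0]
print(dot(u, v), cosine(u, v))
```

For unit-length inputs the two functions agree up to rounding, so the cheaper dot product is enough when comparing this model's outputs.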

	## Bundle layout

	```
	model/
	├── clip-vit-base-patch32_float16_static.aimodel
	└── tokenizer.json
	```

	## Graph contract

	| | name | shape | dtype |
	|---|---|---|---|
	| input | `pixel_values` | [1, 3, 224, 224] | fp16 |
	| input | `input_ids` | [3, 77] | int32 |
	| input | `attention_mask` | [3, 77] | int32 |
	| output | `image_embeds` | [1, 512] | fp16, L2-normalized |
	| output | `text_embeds` | [3, 512] | fp16, L2-normalized |
	| output | `logits_per_image` / `logits_per_text` | [1, 3] / [3, 1] | fp16 |
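
In common CLIP usage, the `logits_per_image` row is passed through a softmax across the text axis to rank the three prompts against the image. The sketch below shows that step in isolation; the logit values are made up, and reading the fp16 output into `[Float]` is assumed to have happened already.

```swift
import Foundation

// Numerically stable softmax: subtract the max before exponentiating.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(Float(0), +)
    return exps.map { $0 / sum }
}

// Made-up logits for one image against three text prompts ([1, 3] flattened).
let logitsPerImage: [Float] = [24.5, 19.0, 17.2]
let probabilities = softmax(logitsPerImage)
print(probabilities)
```

The largest logit always receives the largest probability, so taking the argmax of either array selects the same prompt.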

	Preprocessing: 224×224 resize + CLIP mean/std normalization (handled by
	`ImageTextEncoder`).
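
For callers who feed `pixel_values` without `ImageTextEncoder`, the normalization step can be sketched as follows. The mean and std values are CLIP's published RGB constants; everything else is an assumption for illustration: `normalizeCHW` is a hypothetical helper, the input is taken to be interleaved 8-bit RGB that has already been resized, and the result is left as `Float` for the caller to convert to fp16. How `ImageTextEncoder` itself resizes or crops is not shown here.

```swift
// CLIP's published per-channel normalization constants (RGB order).
let clipMean: [Float] = [0.48145466, 0.4578275, 0.40821073]
let clipStd: [Float] = [0.26862954, 0.26130258, 0.27577711]

/// Converts interleaved 8-bit RGB pixels (already resized to side x side)
/// into planar channel-first floats, normalized channel by channel.
func normalizeCHW(rgb: [UInt8], side: Int = 224) -> [Float] {
    let pixelCount = side * side
    precondition(rgb.count == pixelCount * 3, "expected side * side RGB pixels")
    var out = [Float](repeating: 0, count: pixelCount * 3)
    for p in 0..<pixelCount {
        for c in 0..<3 {
            // Scale to [0, 1], then apply (x - mean) / std for this channel.
            let value = Float(rgb[p * 3 + c]) / 255
            out[c * pixelCount + p] = (value - clipMean[c]) / clipStd[c]
        }
    }
    return out
}

// A solid white 224x224 image as a smoke test.
let white = [UInt8](repeating: 255, count: 224 * 224 * 3)
let pixelValues = normalizeCHW(rgb: white)
print(pixelValues.count)
```

The output has 3 × 224 × 224 elements in channel-first order, matching the `[1, 3, 224, 224]` shape once the batch dimension is added.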

	## Performance

	M4 Max: ~3.7 ms per image on the Neural Engine (fp16). Requires macOS 27 beta or
	iOS 27 beta on a physical device; the CoreAI framework is not in the iOS Simulator SDK.

	## License

	Model weights: MIT (OpenAI CLIP); see the upstream repo. Export recipe:
	BSD-3-Clause (apple/coreai-models).