---
license: mit
tags:
- vision
- coreml
- apple-neural-engine
- ane
- perception-encoder
- clip
- image-embedding
library_name: coremltools
pipeline_tag: image-feature-extraction
---

# PE-Core ANE (Apple Neural Engine) Models

Perception Encoder (PE-Core) models converted to CoreML format and optimized for the Apple Neural Engine (ANE).

## Models

| Model | Params | Size | Input | Embedding dim | Cosine sim (vs PyTorch) |
|-------|--------|------|-------|---------------|-------------------------|
| PE-Core-G14-448-ANE | 2.4B | 3.5GB | 448x448 | 1280 | 1.0000 |
| PE-Core-L-14-336-ANE | 300M | 604MB | 336x336 | 1024 | 1.0000 |
| PE-Core-B-16-ANE | 86M | 178MB | 224x224 | 768 | 0.9998 |
| PE-Core-S-16-384-ANE | 22M | 45MB | 384x384 | 384 | 1.0000 |
| PE-Core-T-16-384-ANE | 6M | 12MB | 384x384 | 192 | 0.9999 |

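The last column reports the cosine similarity between each converted model's embedding and the original PyTorch embedding for the same input. As an illustration of the metric only (the vectors below are made up, not actual model outputs), it can be computed like this:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a.ravel()
    b = b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative only: a "reference" embedding and a slightly perturbed copy,
# standing in for PyTorch and CoreML outputs of the same image.
rng = np.random.default_rng(0)
reference = rng.standard_normal(768)
converted = reference + 1e-3 * rng.standard_normal(768)

print(round(cosine_similarity(reference, converted), 4))
```

A value of 1.0000 in the table means the FP16 conversion is numerically indistinguishable from the FP32 original at four decimal places.
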
## Performance (M3 Mac)

| Model | ANE latency | MPS latency | Speedup |
|-------|-------------|-------------|---------|
| PE-Core-bigG-14-448 | 783ms | 1049ms | 1.34x |
| PE-Core-L-14-336 | ~180ms | ~280ms | ~1.5x |
| PE-Core-B-16 | ~50ms | ~80ms | ~1.6x |

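Latency varies with machine and thermal state, so numbers like those above are best reproduced by timing repeated predictions and taking the median. A minimal harness sketch; the coremltools calls in the trailing comments show the assumed way to pin a compute unit at load time:

```python
import time

def measure_latency_ms(predict, n_warmup=3, n_runs=20):
    """Median wall-clock latency of calling predict(), in milliseconds."""
    for _ in range(n_warmup):   # warm-up runs absorb one-time compilation cost
        predict()
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

# With coremltools, the compute unit is chosen when the model is loaded, e.g.:
#   ane = ct.models.MLModel("PE-Core-B-16-ANE.mlpackage",
#                           compute_units=ct.ComputeUnit.CPU_AND_NE)
#   measure_latency_ms(lambda: ane.predict({"image": image}))
```
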
## Usage (Python)

```python
import coremltools as ct
import numpy as np

# Load the CoreML model package
model = ct.models.MLModel("PE-Core-B-16-ANE.mlpackage")

# Prepare a normalized image tensor of shape (1, 3, 224, 224)
image = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference to get the image embedding
output = model.predict({"image": image})
embedding = output["embedding"]  # shape (1, 768)

# L2-normalize for cosine-similarity search
embedding = embedding / np.linalg.norm(embedding)
```

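The random tensor above is only a placeholder: a real image must be resized to the model's input resolution and normalized before prediction. A sketch of that preprocessing, assuming the standard CLIP normalization constants (the actual constants depend on how the model was converted and may instead be baked into the mlpackage, so treat these values as an assumption to verify):

```python
import numpy as np

# Standard CLIP mean/std, used here as an assumption; check your conversion.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(rgb: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 RGB image -> (1, 3, H, W) normalized float32 tensor."""
    x = rgb.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD       # channel-wise normalization
    x = np.transpose(x, (2, 0, 1))       # HWC -> CHW
    return x[np.newaxis, ...]            # add batch dimension

# Example with a dummy 224x224 image; resize a real image to this size first,
# e.g. with Pillow: Image.open(path).convert("RGB").resize((224, 224)).
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
print(preprocess(dummy).shape)  # (1, 3, 224, 224)
```
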
## Usage (Swift)

```swift
import CoreML

// Load the model, wrap the input pixel buffer, and run a prediction
let model = try MLModel(contentsOf: modelURL)
let input = try MLDictionaryFeatureProvider(dictionary: ["image": pixelBuffer])
let output = try model.prediction(from: input)
let embedding = output.featureValue(for: "embedding")!.multiArrayValue!
```

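Once embeddings are L2-normalized as in the usage examples above, image-to-image similarity search reduces to a matrix-vector dot product followed by a sort. A minimal NumPy sketch with random stand-in embeddings (no model required):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a bank of stored embeddings: one 768-dim vector per image,
# each L2-normalized.
bank = rng.standard_normal((1000, 768)).astype(np.float32)
bank /= np.linalg.norm(bank, axis=1, keepdims=True)

# Query with one of the bank's own vectors so the expected top hit is known.
query = bank[123]

scores = bank @ query                 # cosine similarities (unit-norm vectors)
top5 = np.argsort(scores)[::-1][:5]   # indices of the 5 most similar images
print(top5[0])  # 123: the query matches itself best
```

For large banks the same dot-product search is usually delegated to an approximate nearest-neighbor index, but the scoring is identical.
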
## Conversion Details

- **Source**: Meta's Perception Encoder via open_clip
- **Format**: CoreML mlpackage (FP16)
- **Target**: macOS 14+ (ANE optimized)
- **Accuracy**: >99.98% cosine similarity vs PyTorch

## Credits

- Original models: [Meta AI Perception Encoder](https://github.com/facebookresearch/perception_models)
- Loaded via: [open_clip](https://github.com/mlfoundations/open_clip)
- Converted with: [coremltools](https://github.com/apple/coremltools)