---
license: mit
tags:
- vision
- coreml
- apple-neural-engine
- ane
- perception-encoder
- clip
- image-embedding
library_name: coremltools
pipeline_tag: image-feature-extraction
---

# PE-Core ANE (Apple Neural Engine) Models

Perception Encoder (PE-Core) models converted to CoreML format and optimized for the Apple Neural Engine (ANE).

## Models

| Model | Params | Size | Input | Embedding dim | Cosine sim (vs PyTorch) |
|-------|--------|------|-------|---------------|-------------------------|
| PE-Core-G14-448-ANE | 2.4B | 3.5GB | 448x448 | 1280 | 1.0000 |
| PE-Core-L-14-336-ANE | 300M | 604MB | 336x336 | 1024 | 1.0000 |
| PE-Core-B-16-ANE | 86M | 178MB | 224x224 | 768 | 0.9998 |
| PE-Core-S-16-384-ANE | 22M | 45MB | 384x384 | 384 | 1.0000 |
| PE-Core-T-16-384-ANE | 6M | 12MB | 384x384 | 192 | 0.9999 |

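The last column reports the cosine similarity between each converted model's embedding and the original PyTorch embedding for the same input. As an illustration of the metric only (the vectors below are made up, not actual model outputs), it can be computed like this:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a.ravel()
    b = b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative only: a "reference" embedding and a slightly perturbed copy,
# standing in for PyTorch and CoreML outputs of the same image.
rng = np.random.default_rng(0)
reference = rng.standard_normal(768)
converted = reference + 1e-3 * rng.standard_normal(768)

print(round(cosine_similarity(reference, converted), 4))
```

A value of 1.0000 in the table means the FP16 conversion is numerically indistinguishable from the FP32 original at four decimal places.
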
## Performance (M3 Mac)

| Model | ANE latency | MPS latency | Speedup |
|-------|-------------|-------------|---------|
| PE-Core-bigG-14-448 | 783ms | 1049ms | 1.34x |
| PE-Core-L-14-336 | ~180ms | ~280ms | ~1.5x |
| PE-Core-B-16 | ~50ms | ~80ms | ~1.6x |

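Latency varies with machine and thermal state, so numbers like those above are best reproduced by timing repeated predictions and taking the median. A minimal harness sketch; the coremltools calls in the trailing comments show the assumed way to pin a compute unit at load time:

```python
import time

def measure_latency_ms(predict, n_warmup=3, n_runs=20):
    """Median wall-clock latency of calling predict(), in milliseconds."""
    for _ in range(n_warmup):   # warm-up runs absorb one-time compilation cost
        predict()
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

# With coremltools, the compute unit is chosen when the model is loaded, e.g.:
#   ane = ct.models.MLModel("PE-Core-B-16-ANE.mlpackage",
#                           compute_units=ct.ComputeUnit.CPU_AND_NE)
#   measure_latency_ms(lambda: ane.predict({"image": image}))
```
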
## Usage (Python)

```python
import coremltools as ct
import numpy as np

# Load the CoreML model package
model = ct.models.MLModel("PE-Core-B-16-ANE.mlpackage")

# Prepare a normalized image tensor of shape (1, 3, 224, 224)
image = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference to get the image embedding
output = model.predict({"image": image})
embedding = output["embedding"]  # shape (1, 768)

# L2-normalize for cosine-similarity search
embedding = embedding / np.linalg.norm(embedding)
```

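The random tensor above is only a placeholder: a real image must be resized to the model's input resolution and normalized before prediction. A sketch of that preprocessing, assuming the standard CLIP normalization constants (the actual constants depend on how the model was converted and may instead be baked into the mlpackage, so treat these values as an assumption to verify):

```python
import numpy as np

# Standard CLIP mean/std, used here as an assumption; check your conversion.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(rgb: np.ndarray) -> np.ndarray:
    """(H, W, 3) uint8 RGB image -> (1, 3, H, W) normalized float32 tensor."""
    x = rgb.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD       # channel-wise normalization
    x = np.transpose(x, (2, 0, 1))       # HWC -> CHW
    return x[np.newaxis, ...]            # add batch dimension

# Example with a dummy 224x224 image; resize a real image to this size first,
# e.g. with Pillow: Image.open(path).convert("RGB").resize((224, 224)).
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
print(preprocess(dummy).shape)  # (1, 3, 224, 224)
```
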
## Usage (Swift)

```swift
import CoreML

// Load the model, wrap the input pixel buffer, and run a prediction
let model = try MLModel(contentsOf: modelURL)
let input = try MLDictionaryFeatureProvider(dictionary: ["image": pixelBuffer])
let output = try model.prediction(from: input)
let embedding = output.featureValue(for: "embedding")!.multiArrayValue!
```

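Once embeddings are L2-normalized as in the usage examples above, image-to-image similarity search reduces to a matrix-vector dot product followed by a sort. A minimal NumPy sketch with random stand-in embeddings (no model required):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a bank of stored embeddings: one 768-dim vector per image,
# each L2-normalized.
bank = rng.standard_normal((1000, 768)).astype(np.float32)
bank /= np.linalg.norm(bank, axis=1, keepdims=True)

# Query with one of the bank's own vectors so the expected top hit is known.
query = bank[123]

scores = bank @ query                 # cosine similarities (unit-norm vectors)
top5 = np.argsort(scores)[::-1][:5]   # indices of the 5 most similar images
print(top5[0])  # 123: the query matches itself best
```

For large banks the same dot-product search is usually delegated to an approximate nearest-neighbor index, but the scoring is identical.
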
## Conversion Details

- **Source**: Meta's Perception Encoder via open_clip
- **Format**: CoreML mlpackage (FP16)
- **Target**: macOS 14+ (ANE optimized)
- **Accuracy**: >99.98% cosine similarity vs PyTorch

## Credits

- Original models: [Meta AI Perception Encoder](https://github.com/facebookresearch/perception_models)
- Loaded via: [open_clip](https://github.com/mlfoundations/open_clip)
- Converted with: [coremltools](https://github.com/apple/coremltools)