---
license: apache-2.0
tags:
- coreml
- apple-silicon
- embeddings
- sentence-transformers
library_name: coremltools
base_model: MindscapeRAG/MiA-Emb-4B
pipeline_tag: feature-extraction
---

# MiA Emb 4B CoreML

MiA-Emb-4B converted to CoreML for Apple Silicon (M1/M2/M3/M4).

## Model Details

- **Format**: CoreML ML Program (`.mlpackage`)
- **Precision**: FP16
- **Inputs**: `input_ids` (1, 512), `attention_mask` (1, 512)
- **Output**: `embeddings` (1, 3584)
- **Target**: macOS 14+ / iOS 17+ / Apple Silicon

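Because the converted model expects fixed-shape (1, 512) inputs, shorter token sequences must be right-padded and masked before calling `predict`. A minimal numpy sketch (the `pad_inputs` helper and the pad id of 0 are illustrative assumptions, not part of this package — use your tokenizer's actual pad token id):

```python
import numpy as np

SEQ_LEN = 512  # fixed sequence length baked into the converted model


def pad_inputs(token_ids, pad_id=0, seq_len=SEQ_LEN):
    """Pad or truncate a sequence of token ids to the model's fixed length.

    Returns (input_ids, attention_mask), both int32 arrays of shape (1, seq_len).
    """
    ids = list(token_ids)[:seq_len]
    mask = [1] * len(ids) + [0] * (seq_len - len(ids))
    ids = ids + [pad_id] * (seq_len - len(ids))
    return (
        np.asarray([ids], dtype=np.int32),
        np.asarray([mask], dtype=np.int32),
    )
```

The resulting pair can be fed directly to the model's `predict` call shown in the Usage section below.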
## Usage

```python
import coremltools as ct
import numpy as np

# Load model
model = ct.models.MLModel("coreml_fp16.mlpackage")

# Prepare inputs (use your tokenizer; zeros here are placeholders)
input_ids = np.zeros((1, 512), dtype=np.int32)
attention_mask = np.ones((1, 512), dtype=np.int32)

# Run inference
output = model.predict({"input_ids": input_ids, "attention_mask": attention_mask})
embeddings = output["embeddings"]  # Shape: (1, 3584)
```

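The returned vectors are typically compared with cosine similarity for retrieval or semantic search. A minimal numpy sketch (the `cosine_similarity` helper is illustrative, not part of this package):

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (e.g. shape (1, 3584))."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A similarity near 1.0 indicates closely related texts; for large corpora you would normalize once and use a vector index instead of pairwise calls.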

## Performance

Benchmarked on Apple M4:

- **Inference**: ~80-100 ms per embedding
- **Load time**: ~13 s on first load (the compiled model is cached afterwards)

## Conversion

Converted using coremltools 8.1 with custom op handlers for:

- `new_ones` (coremltools GitHub issue #2040)
- Bitwise ops (`and`, `or`) with int→bool casting

## License

Same license as the base model: MindscapeRAG/MiA-Emb-4B
|