memeticresearch/contexst-multilingual-embeddings

Core ML

davidldahl committed on Jun 9, 2025

Commit c417f72 (verified)

1 Parent(s): 4fd9bfd

Update README with 512 token model information

Files changed (1)

README.md +15 -98

README.md CHANGED

@@ -1,109 +1,26 @@
----
-license: apache-2.0
-tags:
-- coreml
-- sentence-embeddings
-- multilingual
-- ios
-- macos
-- sentence-transformers
-language:
-- multilingual
-- en
-- de
-- fr
-- es
-- it
-- pt
-- nl
-- pl
-- ru
-- zh
-- ja
-- ko
-- ar
-- tr
-library_name: coreml
-pipeline_tag: sentence-similarity
----
-# Contex.st Multilingual Embeddings (CoreML)
-This repository contains CoreML-converted versions of popular multilingual sentence embedding models for use in iOS and macOS applications.
 ## Models
-### 1. Paraphrase Multilingual MiniLM L12 v2
-- **Original model**: [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
-- **File**: `sentence_transformers_paraphrase_multilingual_MiniLM_L12_v2.mlmodel`
-- **Size**: 447.6 MB
-- **Dimensions**: 384
-- **Languages**: 50+ languages
-### 2. DistilUSE Base Multilingual Cased
-- **Original model**: [sentence-transformers/distiluse-base-multilingual-cased](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased)
-- **File**: `sentence_transformers_distiluse_base_multilingual_cased.mlmodel`
-- **Size**: 512.8 MB
-- **Dimensions**: 512
-- **Languages**: 15 languages
-## Usage
-These models are designed for use in the [Contex.st](https://contex.st) iOS app but can be used in any iOS/macOS application that supports CoreML.
-### iOS/macOS Integration
-```swift
-import CoreML
-// Load the model
-let modelURL = // Path to downloaded .mlmodel file
-let model = try MLModel(contentsOf: modelURL)
-// Prepare input
-let input = // Tokenized text as MLMultiArray
-// Get embeddings
-let output = try model.prediction(from: input)
-```
-## Model Details
-### Conversion Process
-These models were converted from PyTorch to CoreML format using:
-- Python 3.13
-- PyTorch 2.7.1
-- CoreMLTools
-- Sentence Transformers
-The conversion maintains the original model architecture while optimizing for Apple devices.
-### Performance
-- Optimized for Apple Neural Engine (ANE)
-- Support for CPU fallback
-- Batch processing capable
-- Real-time inference on modern iOS devices
-## License
-These converted models maintain the original Apache 2.0 license from the source models.
-## Citation
-If you use these models, please cite the original sentence-transformers work:
-```bibtex
-@inproceedings{reimers-2019-sentence-bert,
-    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
-    author = "Reimers, Nils and Gurevych, Iryna",
-    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
-    year = "2019",
-    publisher = "Association for Computational Linguistics",
-}
-```
-## Contact
-For issues or questions about these CoreML conversions, please open an issue in this repository.

+# Contex.st Multilingual Embeddings
+CoreML models for multilingual text embeddings in iOS apps.
 ## Models
+### 512 Token Versions (RECOMMENDED)
+These models support the full 512 token context window for high-quality embeddings:
+- `paraphrase-multilingual-MiniLM-L12-v2-512tokens.mlmodel` - 384 dimensions, ~449 MB
+- `distiluse-base-multilingual-cased-512tokens.mlmodel` - 768 dimensions, ~514 MB
+### Legacy 32 Token Versions (NOT RECOMMENDED)
+These models only support 32 tokens and will produce lower quality embeddings:
+- `sentence_transformers_paraphrase_multilingual_MiniLM_L12_v2.mlmodel` - 32 tokens only
+- `sentence_transformers_distiluse_base_multilingual_cased.mlmodel` - 32 tokens only
+## Usage
+Use the 512 token versions for production. The 32 token versions are kept for backward compatibility only.
+## Source Models
+- [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
+- [sentence-transformers/distiluse-base-multilingual-cased](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased)
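
Note on the token limits described in the new README: the sketch below illustrates why a 32 token limit could lower embedding quality. It assumes the converted models take a fixed-length array of token IDs plus an attention mask, so longer inputs are truncated and shorter ones are padded. The README does not confirm this. The `fit` helper, the `FittedInput` type, and the default pad ID of 0 are hypothetical names and values chosen for illustration, and special tokens are ignored for simplicity.

```swift
// Hypothetical helper: the type, function name, and pad ID are illustrative
// and are not taken from this repository.
struct FittedInput {
    let inputIDs: [Int32]       // token IDs, exactly `length` entries
    let attentionMask: [Int32]  // 1 for real tokens, 0 for padding
    let droppedTokens: Int      // tokens cut off by truncation
}

func fit(tokenIDs: [Int32], toLength length: Int, padID: Int32 = 0) -> FittedInput {
    precondition(length > 0, "length must be positive")
    // Keep at most `length` tokens; anything beyond that never reaches the model.
    let kept = Array(tokenIDs.prefix(length))
    let padCount = length - kept.count
    let inputIDs = kept + [Int32](repeating: padID, count: padCount)
    let attentionMask = [Int32](repeating: 1, count: kept.count)
        + [Int32](repeating: 0, count: padCount)
    return FittedInput(
        inputIDs: inputIDs,
        attentionMask: attentionMask,
        droppedTokens: tokenIDs.count - kept.count
    )
}

// Stand-in sequence of 40 token IDs; real IDs would come from the source
// model's tokenizer.
let tokens = (1...40).map { Int32($0) }

let legacy = fit(tokenIDs: tokens, toLength: 32)   // legacy 32 token limit
let full = fit(tokenIDs: tokens, toLength: 512)    // 512 token limit

print(legacy.droppedTokens)  // 8
print(full.droppedTokens)    // 0
```

Under these assumptions, a 40 token input loses its last 8 tokens with the legacy limit and is kept whole with the 512 token limit, which is consistent with the README's recommendation to use the 512 token versions.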