Contex.st Multilingual Embeddings

CoreML models for multilingual text embeddings in iOS apps.

Models

512 Token Versions (RECOMMENDED)

These models support the full 512 token context window for high-quality embeddings:

  • paraphrase-multilingual-MiniLM-L12-v2-512tokens.mlmodel - 384 dimensions, ~449 MB
  • distiluse-base-multilingual-cased-512tokens.mlmodel - 768 dimensions, ~514 MB
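Embeddings like these are commonly compared with cosine similarity. The sketch below is only an illustration: `cosineSimilarity` is a hypothetical helper, not part of these models, and it assumes you have already copied each model output into a `[Float]`. Because the two models produce different dimensions (384 and 768), only vectors from the same model can be compared, so the helper returns `nil` on a length mismatch.

```swift
import Foundation

/// Cosine similarity between two embedding vectors of equal length.
/// Returns nil when the lengths differ, the vectors are empty,
/// or either vector has zero magnitude.
/// (Hypothetical helper for illustration; not shipped with the models.)
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float? {
    guard a.count == b.count, !a.isEmpty else { return nil }

    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }

    guard normA > 0, normB > 0 else { return nil }
    return dot / (normA.squareRoot() * normB.squareRoot())
}
```

Values near 1 indicate similar text, and values near 0 indicate unrelated text.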

Legacy 32 Token Versions (NOT RECOMMENDED)

These models accept only 32 tokens, so longer text must be truncated, which lowers embedding quality:

  • sentence_transformers_paraphrase_multilingual_MiniLM_L12_v2.mlmodel - 32 tokens only
  • sentence_transformers_distiluse_base_multilingual_cased.mlmodel - 32 tokens only

Usage

Use the 512-token versions in production. The 32-token versions are kept only for backward compatibility.
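Core ML models exported with a fixed sequence length generally expect inputs of exactly that length. The sketch below shows one way to fit a tokenized text into a 512-token window. It rests on assumptions this README does not confirm: that the model takes token IDs plus an attention mask, and that you supply the padding ID used by your tokenizer. `PaddedInput` and `padOrTruncate` are hypothetical names, and tokenization itself is out of scope here.

```swift
import Foundation

/// Fixed-length inputs built from a variable-length token sequence.
/// (Hypothetical type; the actual model input names may differ.)
struct PaddedInput {
    let tokenIDs: [Int32]
    let attentionMask: [Int32]
}

/// Truncates or pads `tokenIDs` to exactly `maxLength` entries.
/// `padID` must match the tokenizer's padding token (an assumption
/// the caller has to verify for the model in use).
func padOrTruncate(_ tokenIDs: [Int32], to maxLength: Int, padID: Int32) -> PaddedInput {
    precondition(maxLength > 0, "maxLength must be positive")

    // Keep at most maxLength tokens; anything beyond is dropped.
    let kept = Array(tokenIDs.prefix(maxLength))
    let padCount = maxLength - kept.count

    let padding = [Int32](repeating: padID, count: padCount)
    let ones = [Int32](repeating: 1, count: kept.count)
    let zeros = [Int32](repeating: 0, count: padCount)

    // Mask is 1 for real tokens and 0 for padding.
    return PaddedInput(tokenIDs: kept + padding, attentionMask: ones + zeros)
}
```

For the 512-token models you would call `padOrTruncate(ids, to: 512, padID: yourPadID)`. Note that plain truncation can cut off a trailing special token, so a production tokenizer may need to handle that case separately.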

Source Models