msluszniak committed on
Commit
02810b5
verified
1 Parent(s): cfee072

Initial upload of paraphrase-multilingual-MiniLM-L12-v2 exports

.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ xnnpack/distiluse-base-multilingual-cased-v2_xnnpack_fp32.pte filter=lfs diff=lfs merge=lfs -text
37
+ xnnpack/distiluse-base-multilingual-cased-v2_xnnpack_8da4w.pte filter=lfs diff=lfs merge=lfs -text
38
+ coreml/distiluse-base-multilingual-cased-v2_coreml_fp16.pte filter=lfs diff=lfs merge=lfs -text
39
+ coreml/distiluse-base-multilingual-cased-v2_coreml_fp32.pte filter=lfs diff=lfs merge=lfs -text
40
+ coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp16.pte filter=lfs diff=lfs merge=lfs -text
41
+ coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp32.pte filter=lfs diff=lfs merge=lfs -text
42
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
43
+ xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_8da4w.pte filter=lfs diff=lfs merge=lfs -text
44
+ xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_fp32.pte filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,53 @@
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+
5
+ # Introduction
6
+
7
+ This repository hosts the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2/tree/main) model for the [React Native ExecuTorch](https://www.npmjs.com/package/react-native-executorch) library. It includes the model exported for both the **XNNPACK** (Android / generic CPU) and **CoreML** (Apple) delegates, in multiple precisions, ready for use in the **ExecuTorch** runtime.
8
+
9
+ If you'd like to run these models in your own ExecuTorch runtime, refer to the [official documentation](https://pytorch.org/executorch/stable/index.html) for setup instructions.
10
+
11
+ ## Compatibility
12
+
13
+ If you intend to use this model outside of React Native ExecuTorch, make sure your runtime is compatible with the **ExecuTorch** version used to export the `.pte` files. For more details, see the compatibility note in the [ExecuTorch GitHub repository](https://github.com/pytorch/executorch/blob/11d1742fdeddcf05bc30a6cfac321d2a2e3b6768/runtime/COMPATIBILITY.md?plain=1#L4). If you work with React Native ExecuTorch, the constants from the library will guarantee compatibility with the runtime used behind the scenes.
14
+
15
+ These models were exported using React Native ExecuTorch `v0.9.0`, which ships an ExecuTorch runtime derived from the `v1.2.0` release branch and an updated `pytorch/extension/llm/tokenizers` build that adds Unigram / Precompiled normalizer / Metaspace decoder support, which is required to load this model's tokenizer. **No forward compatibility** is guaranteed: older versions of the runtime may not work with these files. In particular, RNE ≤ 0.8.x cannot load `tokenizer.json` and will fail at the tokenizer-load step.
16
+
17
+ ## Variant Matrix
18
+
19
+ | Delegate | Precision | File | Size | Notes |
20
+ |----------|-----------|-----------------------------------------------------------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
21
+ | XNNPACK | fp32 | `xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_fp32.pte` | 449 MB | Baseline. Works on Android / iOS / generic CPU. |
22
+ | XNNPACK  | 8da4w     | `xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_8da4w.pte`                 | 379 MB  | Int8 dynamic activation + Int4 weight (torchao), group_size=32. Embeddings stay fp32: the bulk of the file is the 250 037 × 384 vocab matrix (≈ 384 MB), so the linear-layer quantization yields only a modest size win. |
23
+ | CoreML | fp32 | `coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp32.pte` | 449 MB | Apple Neural Engine / GPU / CPU, float32 compute. |
24
+ | CoreML | fp16 | `coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp16.pte` | 225 MB | Half-sized via `compute_precision=FLOAT16` at CoreML compile. Cleanest size win on iOS. |
25
+
26
+ Pick the variant that matches your platform and your size/quality trade-off. The CoreML variants only load on Apple platforms; the XNNPACK variants load everywhere.
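The sizes in the table can be cross-checked against the Git LFS pointer files added in this commit. The sketch below is an illustration rather than part of the exporter: it assumes the table figures are binary mebibytes rounded up (the README does not state the unit) and recomputes the fp32 embedding-matrix size from `vocab_size` and `hidden_size` in `config.json`.

```python
import math

# Byte counts copied from the Git LFS pointer files in this commit.
LFS_SIZES = {
    "xnnpack_fp32": 470_274_304,
    "xnnpack_8da4w": 397_309_568,
    "coreml_fp32": 470_468_177,
    "coreml_fp16": 235_546_731,
}

MIB = 1024 ** 2  # assumption: the table reports mebibytes, rounded up


def size_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


# Embedding table: vocab_size x hidden_size float32 values, 4 bytes each.
VOCAB_SIZE, HIDDEN_SIZE = 250_037, 384
embedding_bytes = VOCAB_SIZE * HIDDEN_SIZE * 4

# Sizes as they would appear in the table under the rounding assumption.
table_sizes = {name: math.ceil(size_mib(n)) for name, n in LFS_SIZES.items()}

# Share of the 8da4w file taken up by an unquantized fp32 embedding table.
embedding_share_8da4w = embedding_bytes / LFS_SIZES["xnnpack_8da4w"]

if __name__ == "__main__":
    for name, mib in table_sizes.items():
        print(f"{name}: {mib} MiB")
    print(f"embedding matrix: {embedding_bytes} bytes "
          f"({size_mib(embedding_bytes):.1f} MiB)")
    print(f"embedding share of 8da4w file: {embedding_share_8da4w:.1%}")
```

Under that assumption the computed values match the table, and the embedding matrix alone accounts for most of the 8da4w file, which is consistent with the modest size win noted above.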
27
+
28
+ ## Repository Structure
29
+
30
+ - `xnnpack/`: `.pte` files partitioned for the XNNPACK delegate.
31
+ - `coreml/`: `.pte` files partitioned for the CoreML delegate (iOS / macOS only).
32
+ - `tokenizer.json`: HuggingFace fast-tokenizer dump (Unigram model + Precompiled normalizer + Metaspace decoder, derived from the upstream SentencePiece tokenizer). Wire this to `tokenizerSource`.
33
+ - `config.json`, `tokenizer_config.json`: upstream model/tokenizer configs, kept for reference and for non-RNE consumers.
34
+
35
+ The `.pte` path goes to `modelSource`; `tokenizer.json` is shared across all variants.
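As a rough sketch of how a variant maps to the two sources, the helper below builds repository-relative paths from the naming scheme used in this commit. `pick_sources` and its platform names are hypothetical and not part of React Native ExecuTorch; only the file names, the `modelSource` / `tokenizerSource` keys, and the Apple-only restriction on CoreML come from the README.

```python
MODEL_NAME = "paraphrase-multilingual-MiniLM-L12-v2"

# Delegate/precision pairs that exist in this repository.
VARIANTS = {
    ("xnnpack", "fp32"),
    ("xnnpack", "8da4w"),
    ("coreml", "fp32"),
    ("coreml", "fp16"),
}

# Hypothetical platform labels, used only for this illustration.
APPLE_PLATFORMS = {"ios", "macos"}


def pick_sources(platform: str, delegate: str = "xnnpack",
                 precision: str = "fp32") -> dict:
    """Return repo-relative paths for one variant (hypothetical helper)."""
    if (delegate, precision) not in VARIANTS:
        raise ValueError(f"no {delegate}/{precision} export in this repository")
    if delegate == "coreml" and platform.lower() not in APPLE_PLATFORMS:
        raise ValueError("CoreML variants only load on Apple platforms")
    return {
        "modelSource": f"{delegate}/{MODEL_NAME}_{delegate}_{precision}.pte",
        # The tokenizer file is shared across all variants.
        "tokenizerSource": "tokenizer.json",
    }


if __name__ == "__main__":
    print(pick_sources("android", "xnnpack", "8da4w"))
    print(pick_sources("ios", "coreml", "fp16"))
```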
36
+
37
+ ## Model details
38
+
39
+ - Architecture: 12-layer, 12-head BERT with hidden size 384 (initialized from `xlm-roberta-base`) + mean pooling + L2 norm. There is no additional dense projection head, so the model output dimension equals the encoder hidden size.
40
+ - Output dimension: **384**.
41
+ - Max sequence length: **126** tokens (128 − 2 for the `<s>` / `</s>` wrapping; the exporter concatenates these XLM-R-style start/end tokens at id 0 / 2 inside the program).
42
+ - Vocabulary: 250 037 SentencePiece pieces.
43
+ - Languages: 50+ (multilingual).
44
+ - Typical strength: cross-lingual sentence similarity and medium-length sentence retrieval; the model is designed for paraphrase mining and cross-lingual search. Short single-word queries in non-English languages are this model's weakest case; longer sentences and/or English inputs give markedly better ranking.
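The 126-token budget follows from the 128-slot sequence minus the two wrapping tokens. The sketch below mirrors that arithmetic in plain Python; the function names are hypothetical, and whether truncation happens in the caller or inside the library is an assumption made for illustration only.

```python
MAX_SEQUENCE_LENGTH = 128            # total slots, including <s> and </s>
BOS_ID, EOS_ID = 0, 2                # ids of <s> and </s> per the README
MAX_INPUT_TOKENS = MAX_SEQUENCE_LENGTH - 2  # 126 tokens available to the caller


def truncate_for_model(token_ids):
    """Keep at most 126 token ids (assumed caller-side truncation)."""
    return list(token_ids[:MAX_INPUT_TOKENS])


def wrap_like_exported_program(token_ids):
    """Mimic the wrapping the exported program is described as doing."""
    if len(token_ids) > MAX_INPUT_TOKENS:
        raise ValueError(f"expected at most {MAX_INPUT_TOKENS} token ids")
    return [BOS_ID, *token_ids, EOS_ID]


if __name__ == "__main__":
    ids = list(range(5, 305))        # 300 dummy token ids
    wrapped = wrap_like_exported_program(truncate_for_model(ids))
    print(len(wrapped), wrapped[0], wrapped[-1])
```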
45
+
46
+ ## Export notes
47
+
48
+ The exporter wraps the HuggingFace transformer with the standard sentence-transformers contract: token IDs go in, the program prepends `<s>` and appends `</s>`, mean pooling is applied to the last hidden state weighted by the attention mask, and the output is L2-normalized to a 384-d vector.
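The pooling contract described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration of mask-weighted mean pooling followed by L2 normalization, not the exporter's code; `mean_pool_l2` is a hypothetical name and the toy 2-d vectors stand in for the 384-d hidden states.

```python
import math


def mean_pool_l2(hidden_states, attention_mask):
    """Average the unmasked token vectors, then scale the result to unit length."""
    dim = len(hidden_states[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)                      # avoid dividing by zero
    mean = [s / count for s in summed]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


if __name__ == "__main__":
    # Two real tokens and one padding position that the mask excludes.
    hidden = [[3.0, 0.0], [0.0, 4.0], [100.0, 100.0]]
    mask = [1, 1, 0]
    print(mean_pool_l2(hidden, mask))
```

Because the output has unit length, the cosine similarity between two embeddings reduces to a plain dot product.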
49
+
50
+ Unsupported combinations (rejected by the exporter, documented for reference):
51
+
52
+ - **XNNPACK + fp16**: `model.to(torch.float16)` causes softmax / LayerNorm overflow and the runtime output is NaN. XNNPACK's size wins come from quantization, not fp16.
53
+ - **CoreML + 8da4w**: `coremltools` has no MIL mapping for the `torch.int8` tensors torchao emits (`KeyError: torch.int8`). The CoreML-native way to shrink further is `ct.optimize.coreml` palette/linear quantization, not torchao source transforms.
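To give a sense of why fp16 is fragile here, the sketch below probes the representable range of IEEE half precision using only the standard library. It does not reproduce the exporter failure; it only shows that intermediate values of the kind softmax and LayerNorm can produce (large exponentials, squared activations) exceed the fp16 maximum of 65504 quickly. `fits_in_fp16` is a hypothetical helper.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value: float) -> bool:
    """Return True if a finite value can be packed as a half-precision float."""
    try:
        struct.pack("<e", value)   # "e" is the half-precision format code
    except OverflowError:
        return False
    return True


if __name__ == "__main__":
    # exp(x) leaves the fp16 range once x exceeds ln(65504), roughly 11.09.
    print("ln(FP16_MAX) =", math.log(FP16_MAX))
    print("exp(11) fits:", fits_in_fp16(math.exp(11)))
    print("exp(12) fits:", fits_in_fp16(math.exp(12)))
    # Squaring an activation of magnitude 256 already overflows.
    print("255**2 fits:", fits_in_fp16(255.0 ** 2))
    print("256**2 fits:", fits_in_fp16(256.0 ** 2))
```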
config.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "_name_or_path": "old_models/paraphrase-multilingual-MiniLM-L12-v2/0_Transformer",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "gradient_checkpointing": false,
8
+ "hidden_act": "gelu",
9
+ "hidden_dropout_prob": 0.1,
10
+ "hidden_size": 384,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 1536,
13
+ "layer_norm_eps": 1e-12,
14
+ "max_position_embeddings": 512,
15
+ "model_type": "bert",
16
+ "num_attention_heads": 12,
17
+ "num_hidden_layers": 12,
18
+ "pad_token_id": 0,
19
+ "position_embedding_type": "absolute",
20
+ "transformers_version": "4.7.0",
21
+ "type_vocab_size": 2,
22
+ "use_cache": true,
23
+ "vocab_size": 250037
24
+ }
coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp16.pte ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8f5a00d98d653bbfd34e1a1ea0fbd4b1c7efcb00e9ad6035a671c23725352fe2
3
+ size 235546731
coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp32.pte ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cb02c21323510e759fd6379d903ba734b044fd8249cf601c8754a0b2b1d643af
3
+ size 470468177
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cad551d5600a84242d0973327029452a1e3672ba6313c2a3c3d69c4310e12719
3
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1 @@
1
+ {"do_lower_case": true, "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "tokenize_chinese_chars": true, "strip_accents": null, "bos_token": "<s>", "eos_token": "</s>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "old_models/paraphrase-multilingual-MiniLM-L12-v2/0_Transformer", "tokenizer_class": "PreTrainedTokenizerFast"}
xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_8da4w.pte ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0b5cbf80384160988c10e117575eb479b56cdc78eb3ee0cff40c02d00eb2a92c
3
+ size 397309568
xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_fp32.pte ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9a53d2693f881857866eef57656224cc63f6668459bb32e21858e15a9e13c4b2
3
+ size 470274304