Initial upload of paraphrase-multilingual-MiniLM-L12-v2 exports
- .gitattributes +9 -0
- README.md +53 -0
- config.json +24 -0
- coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp16.pte +3 -0
- coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp32.pte +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +1 -0
- xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_8da4w.pte +3 -0
- xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_fp32.pte +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+xnnpack/distiluse-base-multilingual-cased-v2_xnnpack_fp32.pte filter=lfs diff=lfs merge=lfs -text
+xnnpack/distiluse-base-multilingual-cased-v2_xnnpack_8da4w.pte filter=lfs diff=lfs merge=lfs -text
+coreml/distiluse-base-multilingual-cased-v2_coreml_fp16.pte filter=lfs diff=lfs merge=lfs -text
+coreml/distiluse-base-multilingual-cased-v2_coreml_fp32.pte filter=lfs diff=lfs merge=lfs -text
+coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp16.pte filter=lfs diff=lfs merge=lfs -text
+coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp32.pte filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_8da4w.pte filter=lfs diff=lfs merge=lfs -text
+xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_fp32.pte filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,53 @@
---
license: apache-2.0
---

# Introduction

This repository hosts the [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2/tree/main) model for the [React Native ExecuTorch](https://www.npmjs.com/package/react-native-executorch) library. It includes the model exported for both the **XNNPACK** (Android / generic CPU) and **CoreML** (Apple) delegates, in multiple precisions, ready for use in the **ExecuTorch** runtime.

If you'd like to run these models in your own ExecuTorch runtime, refer to the [official documentation](https://pytorch.org/executorch/stable/index.html) for setup instructions.

## Compatibility

If you intend to use this model outside of React Native ExecuTorch, make sure your runtime is compatible with the **ExecuTorch** version used to export the `.pte` files. For more details, see the compatibility note in the [ExecuTorch GitHub repository](https://github.com/pytorch/executorch/blob/11d1742fdeddcf05bc30a6cfac321d2a2e3b6768/runtime/COMPATIBILITY.md?plain=1#L4). If you work with React Native ExecuTorch, using the library's model constants guarantees compatibility with the runtime it uses under the hood.

These models were exported using React Native ExecuTorch `v0.9.0`, which ships an ExecuTorch runtime derived from the `v1.2.0` release branch and an updated `pytorch/extension/llm/tokenizers` build that adds support for the Unigram model, the Precompiled normalizer, and the Metaspace decoder; that support is required to load this model's tokenizer. **No forward compatibility** is guaranteed: older runtime versions may not be able to load these files. In particular, RNE 0.8.x and earlier cannot load `tokenizer.json` and will fail at the tokenizer-load step.
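
As a small illustration of the version note above, the sketch below gates tokenizer loading on the minimum RNE version. It is a hypothetical helper, not part of React Native ExecuTorch: the `0.9.0` threshold comes from this README, and the parser assumes plain `MAJOR.MINOR.PATCH` strings (pre-release tags are not handled). It only checks the tokenizer requirement; it is not a full runtime-compatibility check.

```python
# Hypothetical helper: decides whether a React Native ExecuTorch (RNE) version
# is new enough to load this repo's tokenizer.json. The 0.9.0 minimum comes
# from the compatibility note; only "MAJOR.MINOR.PATCH" (optionally prefixed
# with "v") is parsed, pre-release suffixes are not supported.
MIN_RNE_VERSION = (0, 9, 0)


def parse_version(text: str) -> tuple[int, int, int]:
    """Turn 'v0.9.0' or '0.9.0' into a comparable (major, minor, patch) tuple."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def can_load_tokenizer(rne_version: str) -> bool:
    """True if this RNE version meets the minimum needed for tokenizer.json."""
    # Tuple comparison orders versions correctly, e.g. (0, 10, 1) > (0, 9, 0).
    return parse_version(rne_version) >= MIN_RNE_VERSION


if __name__ == "__main__":
    for v in ("0.8.3", "v0.9.0", "0.10.1"):
        print(v, can_load_tokenizer(v))
```

Comparing integer tuples rather than raw strings avoids the classic `"0.10.0" < "0.9.0"` string-ordering bug.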

## Variant Matrix

| Delegate | Precision | File | Size | Notes |
|----------|-----------|------|------|-------|
| XNNPACK | fp32 | `xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_fp32.pte` | 449 MiB | Baseline. Works on Android / iOS / generic CPU. |
| XNNPACK | 8da4w | `xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_8da4w.pte` | 379 MiB | Int8 dynamic activations + int4 weights (torchao), group_size=32. Embeddings stay fp32, and the bulk of the file is the 250 037 × 384 vocabulary matrix (≈ 366 MiB), so quantizing the linear layers yields only a modest size reduction. |
| CoreML | fp32 | `coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp32.pte` | 449 MiB | Apple GPU / CPU, float32 compute (the Neural Engine only executes fp16). |
| CoreML | fp16 | `coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp16.pte` | 225 MiB | Half the size, via `compute_precision=FLOAT16` at CoreML compile time. The cleanest size reduction on iOS. |
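
To see why the vocabulary matrix dominates the file sizes above, here is a back-of-the-envelope parameter count built from the shape values in `config.json`. It assumes the exported program drops the `BertModel` pooler (the embedding is mean-pooled from the last hidden state) and ignores `.pte` container overhead, so it is an estimate, not a description of the actual serialized layout.

```python
# Rough parameter count for the fp32 export, from config.json shape values.
# Assumptions: the BertModel pooler is not in the exported program, and
# file-format overhead is ignored.
VOCAB = 250_037
HIDDEN = 384
INTERMEDIATE = 1_536
LAYERS = 12
MAX_POSITIONS = 512
TYPE_VOCAB = 2
MIB = 1024 ** 2


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix + bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale + shift vectors."""
    return 2 * dim


word_embeddings = VOCAB * HIDDEN
embeddings = (word_embeddings
              + MAX_POSITIONS * HIDDEN      # absolute position embeddings
              + TYPE_VOCAB * HIDDEN         # token-type embeddings
              + layer_norm(HIDDEN))

per_layer = (4 * linear(HIDDEN, HIDDEN)     # Q, K, V and attention output
             + layer_norm(HIDDEN)
             + linear(HIDDEN, INTERMEDIATE)  # FFN up-projection
             + linear(INTERMEDIATE, HIDDEN)  # FFN down-projection
             + layer_norm(HIDDEN))

total_params = embeddings + LAYERS * per_layer

# Only linear-layer weights are int4-quantized by 8da4w; each saves at most
# 3.5 bytes (4 bytes fp32 -> 0.5 bytes int4), before adding group scales back.
linear_weights = LAYERS * (4 * HIDDEN * HIDDEN + 2 * HIDDEN * INTERMEDIATE)
max_int4_saving = linear_weights * (4 - 0.5)

if __name__ == "__main__":
    print(f"parameters:         {total_params:,}")
    print(f"fp32 total:         {total_params * 4 / MIB:.1f} MiB")
    print(f"fp32 vocab matrix:  {word_embeddings * 4 / MIB:.1f} MiB")
    print(f"vocab share:        {word_embeddings / total_params:.0%}")
    print(f"int4 saving (max):  {max_int4_saving / MIB:.1f} MiB")
```

The estimate lands at about 448 MiB for fp32 (the files are 449 MiB), with the vocabulary matrix alone at about 366 MiB, roughly 82% of all parameters. Quantizing the linear layers can save at most about 71 MiB, which matches the observed 449 to 379 MiB drop once group scales are added back.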

Pick the variant that matches your platform and your size/quality trade-off. The CoreML variants load only on Apple platforms; the XNNPACK variants load on any platform.
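
One way to encode that choice is sketched below. The function name and the `prefer_small` flag are hypothetical; the file paths mirror the variant matrix, and the rules (CoreML only on Apple, fp16 as the smallest CoreML file, 8da4w as the smallest XNNPACK file) are one reasonable reading of it rather than an official recommendation.

```python
# Hypothetical variant picker. Paths mirror the variant matrix; the rules
# are an illustrative reading of it, not an official recommendation.
PREFIX = "paraphrase-multilingual-MiniLM-L12-v2"
APPLE_PLATFORMS = {"ios", "macos"}


def pick_model_path(platform: str, prefer_small: bool = True) -> str:
    """Return the repo-relative .pte path for a platform name such as 'ios' or 'android'."""
    if platform.lower() in APPLE_PLATFORMS:
        # CoreML only loads on Apple; fp16 halves the file size.
        precision = "fp16" if prefer_small else "fp32"
        return f"coreml/{PREFIX}_coreml_{precision}.pte"
    # XNNPACK loads everywhere; 8da4w is its smallest file.
    precision = "8da4w" if prefer_small else "fp32"
    return f"xnnpack/{PREFIX}_xnnpack_{precision}.pte"


if __name__ == "__main__":
    print(pick_model_path("ios"))
    print(pick_model_path("android", prefer_small=False))
```

Whichever `.pte` is chosen, the same `tokenizer.json` applies.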

## Repository Structure

- `xnnpack/`: `.pte` files partitioned for the XNNPACK delegate.
- `coreml/`: `.pte` files partitioned for the CoreML delegate (iOS / macOS only).
- `tokenizer.json`: HuggingFace fast-tokenizer dump (Unigram model + Precompiled normalizer + Metaspace decoder, derived from the upstream SentencePiece tokenizer). Pass this file to `tokenizerSource`.
- `config.json`, `tokenizer_config.json`: upstream model/tokenizer configs, kept for reference and for non-RNE consumers.

Pass the `.pte` path to `modelSource`; `tokenizer.json` is shared by all variants.

## Model details

- Architecture: 12-layer, 12-head BERT encoder with hidden size 384 (Multilingual MiniLM, which uses XLM-R's SentencePiece tokenizer) + mean pooling + L2 normalization. There is no additional dense projection head, so the output dimension equals the encoder hidden size.
- Output dimension: **384**.
- Max sequence length: **126** tokens (128 - 2, leaving room for the `<s>` / `</s>` wrapping; the exported program itself concatenates these XLM-R-style start/end tokens, ids 0 and 2).
- Vocabulary: 250 037 SentencePiece pieces.
- Languages: 50+ (multilingual).
- Typical strength: cross-lingual sentence similarity and retrieval of medium-length sentences; the model is designed for paraphrase mining and cross-lingual search. Short single-word queries in non-English languages are its weakest case; longer sentences and/or English inputs give markedly better ranking.
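
The sequence-length budget above can be sketched as follows, assuming the caller passes raw SentencePiece ids without special tokens (the program adds them). Both function names are hypothetical and not part of any library; `wrap_like_program` only mirrors the wrapping this README describes, and the all-ones mask assumes a single unpadded sequence.

```python
# Sketch of the input contract, assuming the caller passes raw SentencePiece
# ids WITHOUT special tokens. Function names are hypothetical.
BOS_ID, EOS_ID = 0, 2                 # XLM-R-style <s> / </s>
MAX_MODEL_LEN = 128
MAX_USER_TOKENS = MAX_MODEL_LEN - 2   # 126, leaves room for <s> and </s>


def truncate_for_model(ids: list[int]) -> list[int]:
    """Client side: keep at most 126 ids so the wrapped sequence fits in 128."""
    return ids[:MAX_USER_TOKENS]


def wrap_like_program(ids: list[int]) -> tuple[list[int], list[int]]:
    """Mirror of the in-program wrapping: prepend <s>, append </s>.

    The mask is all ones because a single unpadded sequence is assumed.
    """
    wrapped = [BOS_ID] + ids + [EOS_ID]
    mask = [1] * len(wrapped)
    return wrapped, mask


if __name__ == "__main__":
    raw = list(range(5, 305))  # 300 dummy ids, too long for the model
    wrapped, mask = wrap_like_program(truncate_for_model(raw))
    print(len(wrapped), wrapped[:3], wrapped[-2:])
```

Truncating before the call, rather than after wrapping, keeps the closing `</s>` in place.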

## Export notes

The exporter wraps the HuggingFace transformer in the standard sentence-transformers contract: token IDs go in, the program prepends `<s>` and appends `</s>`, the last hidden state is mean-pooled with the attention mask as weights, and the result is L2-normalized into a 384-d vector.
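
The pooling head described above reduces to a few lines of arithmetic. The plain-Python sketch below only spells that arithmetic out; the exported program performs it with tensor ops inside the graph, and the function names here are illustrative.

```python
import math

# Plain-Python sketch of masked mean pooling + L2 normalization.
# `hidden` has one row per token (seq_len x dim); `mask` is 1 for real
# tokens and 0 for padding. Function names are illustrative only.


def mean_pool(hidden: list[list[float]], mask: list[int]) -> list[float]:
    """Average the token vectors whose mask entry is 1."""
    dim = len(hidden[0])
    summed = [0.0] * dim
    count = 0
    for row, keep in zip(hidden, mask):
        if keep:
            count += 1
            for j, x in enumerate(row):
                summed[j] += x
    count = max(count, 1)  # avoid dividing by zero for an all-padding input
    return [s / count for s in summed]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_of_normalized(a: list[float], b: list[float]) -> float:
    """For unit-length vectors, cosine similarity reduces to a dot product."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is padding
    mask = [1, 1, 0]
    emb = l2_normalize(mean_pool(hidden, mask))
    print(emb, cosine_of_normalized(emb, emb))
```

Because the program's output is already L2-normalized, comparing two sentence embeddings from this model only needs a dot product; no extra normalization step is required on the app side.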

Unsupported combinations (rejected by the exporter, documented for reference):

- **XNNPACK + fp16**: `model.to(torch.float16)` causes softmax / LayerNorm overflow, and the runtime output is NaN. XNNPACK's size reductions come from quantization, not fp16.
- **CoreML + 8da4w**: `coremltools` has no MIL mapping for the `torch.int8` tensors that torchao emits (`KeyError: torch.int8`). The CoreML-native way to shrink the model further is palettization or linear weight quantization via `ct.optimize.coreml`, not torchao source transforms.
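
The fp16 failure mode noted above comes down to range: IEEE half precision tops out at 65504. The sketch below uses Python's `struct` half-float format (`'e'`) to show how intermediates that are harmless in fp32, such as `exp()` inside a softmax or a squared activation inside LayerNorm's variance, overflow to infinity in fp16. It illustrates the range limit only; it does not reproduce the exporter's actual failure.

```python
import math
import struct

# Illustration of the fp16 range limit (max finite value 65504). Values that
# overflow become inf, and arithmetic such as inf - inf then yields NaN.
# This does not reproduce the exporter's actual failure.
FP16_MAX = 65504.0


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE half precision (struct format 'e')."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses finite values that don't fit; map them to +/-inf,
        # which is what a float16 cast produces for out-of-range values.
        return math.copysign(math.inf, x)


if __name__ == "__main__":
    print(to_fp16(math.exp(11.0)))  # still finite in fp16
    print(to_fp16(math.exp(12.0)))  # overflows to inf
    print(to_fp16(300.0 ** 2))      # a squared activation of 300 overflows too
    print(math.inf - math.inf)      # nan: how an overflow becomes NaN output
```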
config.json
ADDED
@@ -0,0 +1,24 @@
{
  "_name_or_path": "old_models/paraphrase-multilingual-MiniLM-L12-v2/0_Transformer",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.7.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 250037
}
coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp16.pte
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8f5a00d98d653bbfd34e1a1ea0fbd4b1c7efcb00e9ad6035a671c23725352fe2
size 235546731
coreml/paraphrase-multilingual-MiniLM-L12-v2_coreml_fp32.pte
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb02c21323510e759fd6379d903ba734b044fd8249cf601c8754a0b2b1d643af
size 470468177
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cad551d5600a84242d0973327029452a1e3672ba6313c2a3c3d69c4310e12719
size 17082987
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
{"do_lower_case": true, "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "tokenize_chinese_chars": true, "strip_accents": null, "bos_token": "<s>", "eos_token": "</s>", "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "old_models/paraphrase-multilingual-MiniLM-L12-v2/0_Transformer", "tokenizer_class": "PreTrainedTokenizerFast"}
xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_8da4w.pte
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0b5cbf80384160988c10e117575eb479b56cdc78eb3ee0cff40c02d00eb2a92c
size 397309568
xnnpack/paraphrase-multilingual-MiniLM-L12-v2_xnnpack_fp32.pte
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a53d2693f881857866eef57656224cc63f6668459bb32e21858e15a9e13c4b2
size 470274304
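
Each binary above is stored as a Git LFS pointer recording the file's SHA-256 `oid` and byte `size`. The sketch below checks a downloaded `.pte` or `tokenizer.json` against such a pointer. It reads only the `oid sha256:` and `size` lines, and the helper names are hypothetical.

```python
import hashlib
import os

# Sketch: verify a downloaded file against a Git LFS pointer. Pointers are
# plain "key value" lines; only "oid sha256:<hex>" and "size <bytes>" are
# used here. Helper names are hypothetical.


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256_hex, size_in_bytes) from an LFS pointer's text."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the size and SHA-256 the pointer records."""
    digest, size = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != size:
        return False  # cheap check first: skips hashing a truncated download
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so a ~450 MiB .pte never has to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == digest
```

For example, passing the three-line pointer shown for `tokenizer.json` together with the downloaded file's path should return `True` for an intact download.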