Initial commit - transformed model to onnx format

Files changed (12)

.gitattributes +2 -0
.gitignore +12 -0
README.md +169 -0
onnx/berta-onnx/BERTA.onnx +3 -0
onnx/berta-onnx/BERTA.onnx.data +3 -0
onnx/berta-onnx/special_tokens_map.json +37 -0
onnx/berta-onnx/tokenizer.json +0 -0
onnx/berta-onnx/tokenizer_config.json +66 -0
onnx/berta-onnx/vocab.txt +0 -0
pyproject.toml +28 -0
safetensors_to_onnx.ipynb +380 -0
safetensors_to_onnx.py +136 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+onnx/berta-onnx/BERTA.onnx.data filter=lfs diff=lfs merge=lfs -text
+onnx/berta-onnx/BERTA.onnx filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

	@@ -0,0 +1,12 @@

+# Python-generated files
+__pycache__/
+*.py[oc]
+build/
+dist/
+wheels/
+*.egg-info
+# Virtual environments
+.venv
+.idea
+uv.lock

README.md CHANGED

@@ -1,3 +1,172 @@
 ---
+language:
+  - ru
+  - en
+pipeline_tag: sentence-similarity
+tags:
+  - russian
+  - pretraining
+  - embeddings
+  - feature-extraction
+  - sentence-similarity
+  - sentence-transformers
+  - transformers
+datasets:
+  - IlyaGusev/gazeta
+  - zloelias/lenta-ru
+  - HuggingFaceFW/fineweb-2
+  - HuggingFaceFW/fineweb
 license: mit
+base_model: sergeyzh/LaBSE-ru-turbo
 ---
+## Repository for the BERTA model converted to ONNX format
+Original model repository: https://huggingface.co/sergeyzh/BERTA
+## BERTA
+This model computes sentence embeddings for Russian and English. It was obtained by distilling the embeddings of [ai-forever/FRIDA](https://huggingface.co/ai-forever/FRIDA) (embedding size 1536, 24 layers) into [sergeyzh/LaBSE-ru-turbo](https://huggingface.co/sergeyzh/LaBSE-ru-turbo) (embedding size 768, 12 layers). FRIDA's primary mode of use, CLS pooling, has been replaced with mean pooling. No other changes were made to the model's behavior. Distillation was carried out as fully as possible: it covers the embeddings of Russian and English sentences as well as the behavior of the prefixes.
+The model's context size matches FRIDA's: 512 tokens.
+## Prefixes
+All prefixes are inherited from FRIDA.
+The best all-round prefix for most tasks (it gives balanced results on average), "categorize_entailment: ", is set as the default in [config_sentence_transformers.json](https://huggingface.co/sergeyzh/BERTA/blob/main/config_sentence_transformers.json).
+The prefixes in use and their effect on the model's scores in [encodechka](https://github.com/avidale/encodechka):
+| Prefix                 | STS       | PI        | NLI       | SA        | TI        |
+|:-----------------------|:---------:|:---------:|:---------:|:---------:|:---------:|
+| -                      |   0.842   |   0.757   |   0.463   | **0.830** |   0.985   |
+| search_query:          |   0.853   |   0.767   |   0.479   |   0.825   |   0.987   |
+| search_document:       |   0.831   |   0.749   |   0.463   |   0.817   |   0.986   |
+| paraphrase:            |   0.847   | **0.778** |   0.446   |   0.825   |   0.986   |
+| categorize:            | **0.857** |   0.765   |   0.501   |   0.829   | **0.988** |
+| categorize_sentiment:  |   0.589   |   0.535   |   0.417   |   0.805   |   0.982   |
+| categorize_topic:      |   0.740   |   0.521   |   0.396   |   0.770   |   0.982   |
+| categorize_entailment: |   0.841   |   0.762   | **0.571** |   0.827   |   0.986   |
+**Tasks:**
+- Semantic text similarity (**STS**);
+- Paraphrase identification (**PI**);
+- Natural language inference (**NLI**);
+- Sentiment analysis (**SA**);
+- Toxicity identification (**TI**).
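The prefix table above can be read as a lookup from task to prefix. The sketch below is only an illustration of that reading: `TASK_PREFIXES`, `DEFAULT_PREFIX`, and `with_prefix` are hypothetical names that do not exist in the repository, and the mapping simply takes the best-scoring row for each column of the table, falling back to the default prefix named in the README.

```python
from typing import Optional

# Hypothetical mapping: the best-scoring prefix per task column in the table above.
TASK_PREFIXES = {
    "STS": "categorize: ",
    "PI": "paraphrase: ",
    "NLI": "categorize_entailment: ",
    "SA": "",  # the unprefixed row scored highest for sentiment analysis
    "TI": "categorize: ",
}
# Default prefix named in the README (set in config_sentence_transformers.json).
DEFAULT_PREFIX = "categorize_entailment: "


def with_prefix(text: str, task: Optional[str] = None) -> str:
    """Prepend the prefix for `task`; unknown or missing tasks get the default."""
    prefix = TASK_PREFIXES.get(task, DEFAULT_PREFIX)
    return prefix + text


print(with_prefix("Женщину спасают врачи.", "NLI"))
print(with_prefix("Ярославским баням разрешили работать без посетителей", "PI"))
```

For retrieval, the README examples use two different prefixes (`search_query: ` for queries and `search_document: ` for documents), so a per-task lookup like this one would need a second key for that case.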
+## Metrics
+Model scores on the [ruMTEB](https://habr.com/ru/companies/sberdevices/articles/831150/) benchmark:
+|Model Name                      | Metric              | FRIDA     | BERTA     | [rubert-mini-frida](https://huggingface.co/sergeyzh/rubert-mini-frida)   | multilingual-e5-large-instruct | multilingual-e5-large |
+|:-------------------------------|:--------------------|----------:|----------:|--------------------:|---------------------:|----------------------:|
+|CEDRClassification              | Accuracy            | **0.646** |   0.622   |        0.552        |        0.500         |         0.448         |
+|GeoreviewClassification         | Accuracy            | **0.577** |   0.548   |        0.464        |        0.559         |         0.497         |
+|GeoreviewClusteringP2P          | V-measure           | **0.783** |   0.738   |        0.698        |        0.743         |         0.605         |
+|HeadlineClassification          | Accuracy            |   0.890   | **0.891** |        0.880        |        0.862         |         0.758         |
+|InappropriatenessClassification | Accuracy            | **0.783** |   0.748   |        0.698        |        0.655         |         0.616         |
+|KinopoiskClassification         | Accuracy            | **0.705** |   0.678   |        0.595        |        0.661         |         0.566         |
+|RiaNewsRetrieval                | NDCG@10             | **0.868** |   0.816   |        0.721        |        0.824         |         0.807         |
+|RuBQReranking                   | MAP@10              | **0.771** |   0.752   |        0.711        |        0.717         |         0.756         |
+|RuBQRetrieval                   | NDCG@10             |   0.724   |   0.710   |        0.654        |        0.692         |       **0.741**       |
+|RuReviewsClassification         | Accuracy            | **0.751** |   0.723   |        0.658        |        0.686         |         0.653         |
+|RuSTSBenchmarkSTS               | Pearson correlation |   0.814   |   0.822   |        0.803        |      **0.840**       |         0.831         |
+|RuSciBenchGRNTIClassification   | Accuracy            | **0.699** |   0.690   |        0.625        |        0.651         |         0.582         |
+|RuSciBenchGRNTIClusteringP2P    | V-measure           | **0.670** |   0.650   |        0.586        |        0.622         |         0.520         |
+|RuSciBenchOECDClassification    | Accuracy            |   0.546   | **0.555** |        0.493        |        0.502         |         0.445         |
+|RuSciBenchOECDClusteringP2P     | V-measure           | **0.566** |   0.556   |        0.507        |        0.528         |         0.450         |
+|SensitiveTopicsClassification   | Accuracy            |   0.398   | **0.399** |        0.373        |        0.323         |         0.257         |
+|TERRaClassification             | Average Precision   | **0.665** |   0.657   |        0.606        |        0.639         |         0.584         |
+
+|Model Name                      | Metric              | FRIDA     | BERTA     | rubert-mini-frida   | multilingual-e5-large-instruct | multilingual-e5-large |
+|:-------------------------------|:--------------------|----------:|----------:|--------------------:|----------------------:|---------------------:|
+|Classification                  | Accuracy            | **0.707** |   0.698   |        0.631        |        0.654          |        0.588         |
+|Clustering                      | V-measure           | **0.673** |   0.648   |        0.597        |        0.631          |        0.525         |
+|MultiLabelClassification        | Accuracy            | **0.522** |   0.510   |        0.463        |        0.412          |        0.353         |
+|PairClassification              | Average Precision   | **0.665** |   0.657   |        0.606        |        0.639          |        0.584         |
+|Reranking                       | MAP@10              | **0.771** |   0.752   |        0.711        |        0.717          |        0.756         |
+|Retrieval                       | NDCG@10             | **0.796** |   0.763   |        0.687        |        0.758          |        0.774         |
+|STS                             | Pearson correlation |   0.814   |   0.822   |        0.803        |      **0.840**        |        0.831         |
+|Average                         | Average             | **0.707** |   0.693   |        0.643        |        0.664          |        0.630         |
+## Using the model with the `transformers` library:
+```python
+import torch
+import torch.nn.functional as F
+from transformers import AutoTokenizer, AutoModel
+def pool(hidden_state, mask, pooling_method="mean"):
+    if pooling_method == "mean":
+        s = torch.sum(hidden_state * mask.unsqueeze(-1).float(), dim=1)
+        d = mask.sum(axis=1, keepdim=True).float()
+        return s / d
+    elif pooling_method == "cls":
+        return hidden_state[:, 0]
+inputs = [
+    #
+    "paraphrase: В Ярославской области разрешили работу бань, но без посетителей",
+    "categorize_entailment: Женщину доставили в больницу, за ее жизнь сейчас борются врачи.",
+    "search_query: Сколько программистов нужно, чтобы вкрутить лампочку?",
+    #
+    "paraphrase: Ярославским баням разрешили работать без посетителей",
+    "categorize_entailment: Женщину спасают врачи.",
+    "search_document: Чтобы вкрутить лампочку, требуется три программиста: один напишет программу извлечения лампочки, другой — вкручивания лампочки, а третий проведет тестирование."
+]
+tokenizer = AutoTokenizer.from_pretrained("sergeyzh/BERTA")
+model = AutoModel.from_pretrained("sergeyzh/BERTA")
+tokenized_inputs = tokenizer(inputs, max_length=512, padding=True, truncation=True, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**tokenized_inputs)
+embeddings = pool(
+    outputs.last_hidden_state,
+    tokenized_inputs["attention_mask"],
+    pooling_method="mean"
+)
+embeddings = F.normalize(embeddings, p=2, dim=1)
+sim_scores = embeddings[:3] @ embeddings[3:].T
+print(sim_scores.diag().tolist())
+# [0.9530372023582458, 0.866746723651886,  0.7839133143424988]
+# [0.9360030293464661, 0.8591322302818298, 0.728583037853241] - FRIDA
+```
+## Using the model with `sentence_transformers` (sentence-transformers>=2.4.0):
+```python
+from sentence_transformers import SentenceTransformer
+# loads model with mean pooling
+model = SentenceTransformer("sergeyzh/BERTA")
+paraphrase = model.encode(["В Ярославской области разрешили работу бань, но без посетителей", "Ярославским баням разрешили работать без посетителей"], prompt="paraphrase: ")
+print(paraphrase[0] @ paraphrase[1].T)
+# 0.9530372
+# 0.9360032 - FRIDA
+categorize_entailment = model.encode(["Женщину доставили в больницу, за ее жизнь сейчас борются врачи.", "Женщину спасают врачи."], prompt="categorize_entailment: ")
+print(categorize_entailment[0] @ categorize_entailment[1].T)
+# 0.8667469
+# 0.8591322 - FRIDA
+query_embedding = model.encode("Сколько программистов нужно, чтобы вкрутить лампочку?", prompt="search_query: ")
+document_embedding = model.encode("Чтобы вкрутить лампочку, требуется три программиста: один напишет программу извлечения лампочки, другой — вкручивания лампочки, а третий проведет тестирование.", prompt="search_document: ")
+print(query_embedding @ document_embedding.T)
+# 0.7839136
+# 0.7285831 - FRIDA
+```
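The ONNX export in this commit names a single output, `last_hidden_state`, so pooling and normalization are left to the caller. Since the README says BERTA uses mean pooling, the post-processing might look like the sketch below. It is written in plain Python so that it runs without dependencies; the nested lists are toy stand-ins for the array an `onnxruntime` session would return, and `mean_pool` and `l2_normalize` are hypothetical helper names.

```python
import math


def mean_pool(hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padded positions (mask == 0)."""
    pooled = []
    for tokens, mask in zip(hidden_state, attention_mask):
        dim = len(tokens[0])
        total = [0.0] * dim
        for vec, m in zip(tokens, mask):
            if m:
                for j in range(dim):
                    total[j] += vec[j]
        count = max(sum(mask), 1)  # guard against an all-padding row
        pooled.append([v / count for v in total])
    return pooled


def l2_normalize(vectors):
    """Scale each vector to unit length so a dot product equals cosine similarity."""
    out = []
    for vec in vectors:
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        out.append([v / norm for v in vec])
    return out


# Toy stand-in for last_hidden_state: batch of 2, 3 tokens each, hidden size 2.
hidden = [
    [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]],  # last token is padding
    [[0.0, 2.0], [0.0, 4.0], [0.0, 6.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]

embeddings = l2_normalize(mean_pool(hidden, mask))
print(embeddings)
```

With NumPy arrays the same computation reduces to a masked sum divided by the mask count, as in the `pool` function of the `transformers` example above.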

onnx/berta-onnx/BERTA.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:465286621e28ba4663fd22b84b90a1135efc95519ca34917f536cd87e6fa2b84
+size 1222522

onnx/berta-onnx/BERTA.onnx.data ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c67530b9d0380f15d915fbf58d97b99c9cf56d6082fe96ae9ab36378783de195
+size 513410048
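The two files above are Git LFS pointers: each records a spec version, a SHA-256 object id, and the size in bytes of the real file. As a rough sketch of how a downloaded file could be checked against such a pointer, the helpers below parse the three fields and compare size and digest. `parse_lfs_pointer` and `matches_pointer` are hypothetical names, and the demo uses a made-up payload rather than the real model data. Pointer text must be passed without the leading `+` that the diff view adds.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError("unsupported hash: " + pointer["algo"])
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Self-contained demo with a made-up payload (not the real model file).
payload = b"example bytes"
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + hashlib.sha256(payload).hexdigest() + "\n"
    "size " + str(len(payload)) + "\n"
)
pointer = parse_lfs_pointer(pointer_text)
print(matches_pointer(payload, pointer))
print(matches_pointer(b"tampered", pointer))
```

For a file the size of `BERTA.onnx.data`, hashing in chunks from disk would be preferable to loading all bytes into memory as this demo does.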

onnx/berta-onnx/special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

onnx/berta-onnx/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

onnx/berta-onnx/tokenizer_config.json ADDED

	@@ -0,0 +1,66 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": false,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "max_length": 512,
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "repo_type": "model",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}

onnx/berta-onnx/vocab.txt ADDED

The diff for this file is too large to render. See raw diff

pyproject.toml ADDED

	@@ -0,0 +1,28 @@

+[project]
+name = "frida-transformed"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.13, <3.14"
+dependencies = [
+    'onnx == 1.20.1',
+    'onnxruntime == 1.23.2',
+    'onnxscript == 0.6.0',
+    'onnx-safetensors == 1.5.0',
+    'torch == 2.10.0',
+    'torchvision == 0.25.0',
+    'transformers == 4.57.3',
+    'pycuda == 2026.1',
+    "ipykernel>=7.2.0",
+    "pip>=26.0.1",
+    "uv>=0.10.2",
+    "jupyter>=1.1.1",
+    "ipywidgets>=8.1.8",
+    "tqdm>=4.67.3",
+    "ipython>=9.10.0",
+]
+[tool.uv.workspace]
+members = [
+    "frida-transformed",
+]

safetensors_to_onnx.ipynb ADDED

	@@ -0,0 +1,380 @@

+{
+ "cells": [
+  {
+   "metadata": {
+    "collapsed": true,
+    "ExecuteTime": {
+     "end_time": "2026-02-12T12:52:46.678786554Z",
+     "start_time": "2026-02-12T12:52:43.490350354Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "import torch\n",
+    "from torch.export import Dim\n",
+    "from transformers import BertModel, AutoModel, AutoTokenizer\n",
+    "from pathlib import Path\n",
+    "import onnxruntime as ort\n",
+    "import numpy as np\n",
+    "from inspect import signature"
+   ],
+   "id": "2b3977272abf14d9",
+   "outputs": [],
+   "execution_count": 1
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2026-02-12T12:52:46.726391124Z",
+     "start_time": "2026-02-12T12:52:46.691717774Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# MODEL_SOURCE_ID = \"sergeyzh/BERTA\"\n",
+    "MODEL_SOURCE_ID = \"../BERTA\"\n",
+    "MODEL_TARGET_PATH = Path(\"onnx/berta-onnx\")\n",
+    "ONNX_FILE_NAME = \"BERTA.onnx\"\n",
+    "\n",
+    "print(\"=\"*50)\n",
+    "print(f\"Подготовка директории: {MODEL_TARGET_PATH}\")\n",
+    "MODEL_TARGET_PATH.mkdir(parents=True, exist_ok=True)"
+   ],
+   "id": "494fc15203b0fb89",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "==================================================\n",
+      "Подготовка директории: onnx/berta-onnx\n"
+     ]
+    }
+   ],
+   "execution_count": 2
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2026-02-12T12:52:46.862603179Z",
+     "start_time": "2026-02-12T12:52:46.739714466Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# 1. Load the model and tokenizer\n",
+    "print(f\"Загрузка модели и токенизатора из '{MODEL_SOURCE_ID}'...\")\n",
+    "tokenizer = AutoTokenizer.from_pretrained(MODEL_SOURCE_ID, repo_type=\"model\")\n",
+    "model = AutoModel.from_pretrained(MODEL_SOURCE_ID)\n",
+    "model.eval()\n",
+    "\n",
+    "# 2. Create test inputs\n",
+    "print(\"Создание тестовых входных данных...\")\n",
+    "test_texts = [\n",
+    "    \"paraphrase: В Ярославской области разрешили работу бань, но без посетителей\",\n",
+    "    \"search_query: Сколько программистов нужно, чтобы вкрутить лампочку?\",\n",
+    "    \"categorize_entailment: Женщину доставили в больницу, за ее жизнь сейчас борются врачи.\"\n",
+    "]\n",
+    "\n",
+    "dummy_inputs = tokenizer(\n",
+    "    test_texts,\n",
+    "    max_length=512,\n",
+    "    padding=\"max_length\",\n",
+    "    truncation=True,\n",
+    "    return_tensors=\"pt\"\n",
+    ")\n",
+    "print(dummy_inputs)"
+   ],
+   "id": "4f9f5febc6f07769",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Загрузка модели и токенизатора из '../BERTA'...\n",
+      "Создание тестовых входных данных...\n",
+      "{'input_ids': tensor([[    2,   570, 11028,  ...,     0,     0,     0],\n",
+      "        [    2,  3007,    67,  ...,     0,     0,     0],\n",
+      "        [    2, 46369,   998,  ...,     0,     0,     0]]), 'token_type_ids': tensor([[0, 0, 0,  ..., 0, 0, 0],\n",
+      "        [0, 0, 0,  ..., 0, 0, 0],\n",
+      "        [0, 0, 0,  ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1,  ..., 0, 0, 0],\n",
+      "        [1, 1, 1,  ..., 0, 0, 0],\n",
+      "        [1, 1, 1,  ..., 0, 0, 0]])}\n"
+     ]
+    }
+   ],
+   "execution_count": 3
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2026-02-12T12:52:46.899958136Z",
+     "start_time": "2026-02-12T12:52:46.868506089Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# print(model)\n",
+    "print(signature(model.forward))"
+   ],
+   "id": "8bdce4e5bc593383",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(input_ids: Optional[torch.Tensor] = None, attention_mask: Optional[torch.Tensor] = None, token_type_ids: Optional[torch.Tensor] = None, position_ids: Optional[torch.Tensor] = None, head_mask: Optional[torch.Tensor] = None, inputs_embeds: Optional[torch.Tensor] = None, encoder_hidden_states: Optional[torch.Tensor] = None, encoder_attention_mask: Optional[torch.Tensor] = None, past_key_values: Optional[transformers.cache_utils.Cache] = None, use_cache: Optional[bool] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None, cache_position: Optional[torch.Tensor] = None) -> Union[tuple[torch.Tensor], transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions]\n"
+     ]
+    }
+   ],
+   "execution_count": 4
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2026-02-12T12:52:56.427369911Z",
+     "start_time": "2026-02-12T12:52:46.902043777Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# 3. Export with two inputs\n",
+    "onnx_model_path = MODEL_TARGET_PATH / ONNX_FILE_NAME\n",
+    "print(f\"Экспорт модели в ONNX формат: {onnx_model_path}\")\n",
+    "\n",
+    "# For dynamic_shapes\n",
+    "batch_size = Dim(\"batch_size\", min=1, max=64)  # Optional: add min/max constraints\n",
+    "sequence_length = Dim(\"sequence_length\", min=2, max=512)\n",
+    "\n",
+    "# dynamic_shapes = {\n",
+    "#     \"input_ids\": {0: batch_size, 1: sequence_length},\n",
+    "#     \"attention_mask\": {0: batch_size, 1: sequence_length},\n",
+    "#     \"last_hidden_state\": {0: batch_size, 1: sequence_length}\n",
+    "# }\n",
+    "\n",
+    "# In case of issues use dynamo_export instead of dynamo=True\n",
+    "torch.onnx.export(\n",
+    "    model,\n",
+    "    (dummy_inputs[\"input_ids\"], dummy_inputs[\"attention_mask\"]),\n",
+    "    onnx_model_path.as_posix(),\n",
+    "    input_names=[\"input_ids\", \"attention_mask\"],\n",
+    "    output_names=[\"last_hidden_state\"],\n",
+    "    opset_version=20, # Maybe update\n",
+    "    dynamic_shapes = {\n",
+    "        \"input_ids\": {0: batch_size, 1: sequence_length},\n",
+    "        \"attention_mask\": {0: batch_size, 1: sequence_length}\n",
+    "    },\n",
+    "    verbose=True,\n",
+    "    dynamo=True\n",
+    ")\n",
+    "# 4. Save the tokenizer\n",
+    "print(f\"Сохранение токенизатора в '{MODEL_TARGET_PATH}'...\")\n",
+    "tokenizer.save_pretrained(MODEL_TARGET_PATH)\n",
+    "\n",
+    "print(\"Конвертация завершена успешно!\")"
+   ],
+   "id": "87d59bf71ed545dc",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Экспорт модели в ONNX формат: onnx/berta-onnx/BERTA.onnx\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "W0212 14:52:47.799000 19280 torch/onnx/_internal/exporter/_schemas.py:455] Missing annotation for parameter 'input' from (input, boxes, output_size: 'Sequence[int]', spatial_scale: 'float' = 1.0, sampling_ratio: 'int' = -1, aligned: 'bool' = False). Treating as an Input.\n",
+      "W0212 14:52:47.800000 19280 torch/onnx/_internal/exporter/_schemas.py:455] Missing annotation for parameter 'boxes' from (input, boxes, output_size: 'Sequence[int]', spatial_scale: 'float' = 1.0, sampling_ratio: 'int' = -1, aligned: 'bool' = False). Treating as an Input.\n",
+      "W0212 14:52:47.801000 19280 torch/onnx/_internal/exporter/_schemas.py:455] Missing annotation for parameter 'input' from (input, boxes, output_size: 'Sequence[int]', spatial_scale: 'float' = 1.0). Treating as an Input.\n",
+      "W0212 14:52:47.802000 19280 torch/onnx/_internal/exporter/_schemas.py:455] Missing annotation for parameter 'boxes' from (input, boxes, output_size: 'Sequence[int]', spatial_scale: 'float' = 1.0). Treating as an Input.\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[torch.onnx] Obtain model graph for `BertModel([...]` with `torch.export.export(..., strict=False)`...\n",
+      "[torch.onnx] Obtain model graph for `BertModel([...]` with `torch.export.export(..., strict=False)`... ✅\n",
+      "[torch.onnx] Run decomposition...\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/lavrentiy/Projects/FRIDA-transformed/.venv/lib/python3.13/site-packages/torch/cuda/__init__.py:435: UserWarning: \n",
+      "    Found GPU0 NVIDIA GeForce GTX 1060 6GB which is of cuda capability 6.1.\n",
+      "    Minimum and Maximum cuda capability supported by this version of PyTorch is\n",
+      "    (7.0) - (12.0)\n",
+      "    \n",
+      "  queued_call()\n",
+      "/home/lavrentiy/Projects/FRIDA-transformed/.venv/lib/python3.13/site-packages/torch/cuda/__init__.py:435: UserWarning: \n",
+      "    Please install PyTorch with a following CUDA\n",
+      "    configurations:  12.6 following instructions at\n",
+      "    https://pytorch.org/get-started/locally/\n",
+      "    \n",
+      "  queued_call()\n",
+      "/home/lavrentiy/Projects/FRIDA-transformed/.venv/lib/python3.13/site-packages/torch/cuda/__init__.py:435: UserWarning: \n",
+      "NVIDIA GeForce GTX 1060 6GB with CUDA capability sm_61 is not compatible with the current PyTorch installation.\n",
+      "The current PyTorch install supports CUDA capabilities sm_70 sm_75 sm_80 sm_86 sm_90 sm_100 sm_120.\n",
+      "If you want to use the NVIDIA GeForce GTX 1060 6GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/\n",
+      "\n",
+      "  queued_call()\n",
+      "/home/lavrentiy/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/copyreg.py:99: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.\n",
+      "  return cls.__new__(cls, *args)\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[torch.onnx] Run decomposition... ✅\n",
+      "[torch.onnx] Translate the graph into ONNX...\n",
+      "[torch.onnx] Translate the graph into ONNX... ✅\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/lavrentiy/Projects/FRIDA-transformed/.venv/lib/python3.13/site-packages/torch/onnx/_internal/exporter/_onnx_program.py:460: UserWarning: # The axis name: batch_size will not be used, since it shares the same shape constraints with another axis: batch_size.\n",
+      "  rename_mapping = _dynamic_shapes.create_rename_mapping(\n",
+      "/home/lavrentiy/Projects/FRIDA-transformed/.venv/lib/python3.13/site-packages/torch/onnx/_internal/exporter/_onnx_program.py:460: UserWarning: # The axis name: sequence_length will not be used, since it shares the same shape constraints with another axis: sequence_length.\n",
+      "  rename_mapping = _dynamic_shapes.create_rename_mapping(\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Applied 68 of general pattern rewrite rules.\n",
+      "Сохранение токенизатора в 'onnx/berta-onnx'...\n",
+      "Конвертация завершена успешно!\n"
+     ]
+    }
+   ],
+   "execution_count": 5
+  },
+  {
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2026-02-12T12:52:56.931194388Z",
+     "start_time": "2026-02-12T12:52:56.428745759Z"
+    }
+   },
+   "cell_type": "code",
+   "source": [
+    "# 5. Test and compare results\n",
+    "print(\"\\n\" + \"=\"*50)\n",
+    "print(\"ТЕСТИРОВАНИЕ РЕЗУЛЬТАТОВ\")\n",
+    "\n",
+    "def cls_pooling(hidden_state, attention_mask):\n",
+    "    \"\"\"CLS pooling: take the first token's vector as the embedding\"\"\"\n",
+    "    return hidden_state[:, 0]\n",
+    "\n",
+    "def normalize_embeddings(embeddings):\n",
+    "    \"\"\"L2-normalize the embeddings\"\"\"\n",
+    "    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)\n",
+    "\n",
+    "# Test with the original model\n",
+    "print(\"Тестирование оригинальной модели...\")\n",
+    "with torch.no_grad():\n",
+    "    original_inputs = tokenizer(\n",
+    "        test_texts,\n",
+    "        max_length=512,\n",
+    "        padding=True,\n",
+    "        truncation=True,\n",
+    "        return_tensors=\"pt\"\n",
+    "    )\n",
+    "    original_outputs = model(**original_inputs)\n",
+    "    original_embeddings = cls_pooling(\n",
+    "        original_outputs.last_hidden_state,\n",
+    "        original_inputs[\"attention_mask\"]\n",
+    "    )\n",
+    "    original_embeddings = torch.nn.functional.normalize(original_embeddings, p=2, dim=1)\n",
+    "\n",
+    "# Test with the ONNX model\n",
+    "print(\"Тестирование ONNX модели...\")\n",
+    "onnx_session = ort.InferenceSession(onnx_model_path.as_posix())\n",
+    "\n",
+    "onnx_inputs = tokenizer(\n",
+    "    test_texts,\n",
+    "    max_length=512,\n",
+    "    padding=True,\n",
+    "    truncation=True,\n",
+    "    return_tensors=\"np\"\n",
+    ")\n",
+    "\n",
+    "\n",
+    "onnx_inputs_int64 = {\n",
+    "    \"input_ids\": onnx_inputs[\"input_ids\"].astype(np.int64),\n",
+    "    \"attention_mask\": onnx_inputs[\"attention_mask\"].astype(np.int64)\n",
+    "}\n",
+    "\n",
+    "onnx_outputs = onnx_session.run(None, onnx_inputs_int64)[0]\n",
+    "\n",
+    "onnx_embeddings = onnx_outputs[:, 0]\n",
+    "onnx_embeddings = normalize_embeddings(onnx_embeddings)\n",
+    "\n",
+    "cosine_similarity = np.sum(original_embeddings.numpy() * onnx_embeddings, axis=1)\n",
+    "print(f\"\\nCosine similarity между оригинальной и ONNX моделью:\")\n",
+    "for i, sim in enumerate(cosine_similarity):\n",
+    "    print(f\"  Текст {i+1}: {sim:.6f}\")\n",
+    "print(f\"Средняя схожесть: {np.mean(cosine_similarity):.6f}\")\n",
+    "\n",
+    "print(\"\\n\" + \"=\"*50)\n",
+    "print(\"ГОТОВО! Модель успешно конвертирована и протестирована.\")\n",
+    "print(f\"Путь к модели: {MODEL_TARGET_PATH.resolve()}\")"
+   ],
+   "id": "91a5740805f8e829",
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "==================================================\n",
+      "ТЕСТИРОВАНИЕ РЕЗУЛЬТАТОВ\n",
+      "Тестирование оригинальной модели...\n",
+      "Тестирование ONNX модели...\n",
+      "\n",
+      "Cosine similarity между оригинальной и ONNX моделью:\n",
+      "  Текст 1: 1.000000\n",
+      "  Текст 2: 1.000000\n",
+      "  Текст 3: 1.000000\n",
+      "Средняя схожесть: 1.000000\n",
+      "\n",
+      "==================================================\n",
+      "ГОТОВО! Модель успешно конвертирована и протестирована.\n",
+      "Путь к модели: /home/lavrentiy/Projects/BERTA-transformed/onnx/berta-onnx\n"
+     ]
+    }
+   ],
+   "execution_count": 6
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

safetensors_to_onnx.py ADDED

	@@ -0,0 +1,136 @@

+import torch
+from torch.export import Dim
+from transformers import T5EncoderModel, AutoTokenizer
+from pathlib import Path
+import onnxruntime as ort
+import numpy as np
+# MODEL_SOURCE_ID = "ai-forever/FRIDA"
+MODEL_SOURCE_ID = "../FRIDA"
+MODEL_TARGET_PATH = Path("onnx/frida-onnx")
+ONNX_FILE_NAME = "FRIDA.onnx"
+print("="*50)
+print(f"Подготовка директории: {MODEL_TARGET_PATH}")
+MODEL_TARGET_PATH.mkdir(parents=True, exist_ok=True)
+# 1. Load the model and tokenizer
+print(f"Загрузка модели и токенизатора из '{MODEL_SOURCE_ID}'...")
+tokenizer = AutoTokenizer.from_pretrained(MODEL_SOURCE_ID, repo_type="model")
+model = T5EncoderModel.from_pretrained(MODEL_SOURCE_ID)
+model.eval()
+# 2. Create test inputs
+print("Создание тестовых входных данных...")
+test_texts = [
+    "paraphrase: В Ярославской области разрешили работу бань, но без посетителей",
+    "search_query: Сколько программистов нужно, чтобы вкрутить лампочку?",
+    "categorize_entailment: Женщину доставили в больницу, за ее жизнь сейчас борются врачи."
+]
+dummy_inputs = tokenizer(
+    test_texts,
+    max_length=512,
+    padding="max_length",
+    truncation=True,
+    return_tensors="pt"
+)
+# 3. Export with two inputs
+onnx_model_path = MODEL_TARGET_PATH / ONNX_FILE_NAME
+print(f"Экспорт модели в ONNX формат: {onnx_model_path}")
+# For dynamic_shapes
+batch_size = Dim("batch_size", min=1, max=64)  # Optional: add min/max constraints
+sequence_length = Dim("sequence_length", min=2, max=512)
+# dynamic_shapes = {
+#     "input_ids": {0: batch_size, 1: sequence_length},
+#     "attention_mask": {0: batch_size, 1: sequence_length},
+#     "last_hidden_state": {0: batch_size, 1: sequence_length}
+# }
+# In case of issues use dynamo_export instead of dynamo=True
+torch.onnx.export(
+    model,
+    (dummy_inputs["input_ids"], dummy_inputs["attention_mask"]),
+    onnx_model_path.as_posix(),
+    input_names=["input_ids", "attention_mask"],
+    output_names=["last_hidden_state"],
+    opset_version=20, # Maybe update
+    dynamic_shapes = {
+        "input_ids": {0: batch_size, 1: sequence_length},
+        "attention_mask": {0: batch_size, 1: sequence_length}
+    },
+    verbose=False,
+    dynamo=True
+)
+# 4. Save the tokenizer
+print(f"Сохранение токенизатора в '{MODEL_TARGET_PATH}'...")
+tokenizer.save_pretrained(MODEL_TARGET_PATH)
+print("Конвертация завершена успешно!")
+# 5. Test and compare results
+print("\n" + "="*50)
+print("ТЕСТИРОВАНИЕ РЕЗУЛЬТАТОВ")
+def cls_pooling(hidden_state, attention_mask):
+    """CLS pooling: take the first token's vector as the embedding"""
+    return hidden_state[:, 0]
+def normalize_embeddings(embeddings):
+    """L2-normalize the embeddings"""
+    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
+# Test with the original model
+print("Тестирование оригинальной модели...")
+with torch.no_grad():
+    original_inputs = tokenizer(
+        test_texts,
+        max_length=512,
+        padding=True,
+        truncation=True,
+        return_tensors="pt"
+    )
+    original_outputs = model(**original_inputs)
+    original_embeddings = cls_pooling(
+        original_outputs.last_hidden_state,
+        original_inputs["attention_mask"]
+    )
+    original_embeddings = torch.nn.functional.normalize(original_embeddings, p=2, dim=1)
+# Test with the ONNX model
+print("Тестирование ONNX модели...")
+onnx_session = ort.InferenceSession(onnx_model_path.as_posix())
+onnx_inputs = tokenizer(
+    test_texts,
+    max_length=512,
+    padding=True,
+    truncation=True,
+    return_tensors="np"
+)
+onnx_inputs_int64 = {
+    "input_ids": onnx_inputs["input_ids"].astype(np.int64),
+    "attention_mask": onnx_inputs["attention_mask"].astype(np.int64)
+}
+onnx_outputs = onnx_session.run(None, onnx_inputs_int64)[0]
+onnx_embeddings = onnx_outputs[:, 0]
+onnx_embeddings = normalize_embeddings(onnx_embeddings)
+cosine_similarity = np.sum(original_embeddings.numpy() * onnx_embeddings, axis=1)
+print(f"\nCosine similarity между оригинальной и ONNX моделью:")
+for i, sim in enumerate(cosine_similarity):
+    print(f"  Текст {i+1}: {sim:.6f}")
+print(f"Средняя схожесть: {np.mean(cosine_similarity):.6f}")
+print("\n" + "="*50)
+print("ГОТОВО! Модель успешно конвертирована и протестирована.")
+print(f"Путь к модели: {MODEL_TARGET_PATH.resolve()}")
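The check in the script above compares only the first-token vector after normalization and prints cosine similarity to six decimals, which can hide small element-wise differences. A stricter comparison over the full hidden state might look like the sketch below. It is plain Python operating on nested lists; `max_abs_diff` and `outputs_match` are hypothetical names, the `1e-4` tolerance is an assumption rather than a value from the source, and the toy values stand in for `original_outputs.last_hidden_state.tolist()` and `onnx_outputs.tolist()`.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally shaped nested lists."""
    if isinstance(a, (int, float)):
        return abs(a - b)
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)


def outputs_match(reference, candidate, atol=1e-4):
    """True when every element agrees within atol (the tolerance is an assumption)."""
    return max_abs_diff(reference, candidate) <= atol


# Toy values standing in for the PyTorch and ONNX hidden states.
reference = [[[0.10, 0.20], [0.30, 0.40]]]
candidate = [[[0.10, 0.20], [0.30, 0.40005]]]

print(max_abs_diff(reference, candidate))
print(outputs_match(reference, candidate))
print(outputs_match(reference, candidate, atol=1e-6))
```

Both tokenizer calls in the script use `padding=True` on the same texts, so the two hidden states should have the same shape and can be compared position by position.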