Instructions to use Tochka-AI/ruRoPEBert-e5-base-2k with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Tochka-AI/ruRoPEBert-e5-base-2k with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="Tochka-AI/ruRoPEBert-e5-base-2k", trust_remote_code=True)
```

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Tochka-AI/ruRoPEBert-e5-base-2k", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("Tochka-AI/ruRoPEBert-e5-base-2k", trust_remote_code=True)
```

- Notebooks
- Google Colab
- Kaggle
How to export to ONNX?
I'm converting with this command:
```shell
optimum-cli export onnx --model Tochka-AI/ruRoPEBert-e5-base-2k --optimize O1 --task sentence-similarity ruRoPEBert-e5-base-2k-onnx/
```
But I get the following error:
```
.env/lib/python3.11/site-packages/torch/nn/functional.py", line 2237, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self
```
Hello!
It seems that the problem is related to the dummy input generator in Optimum. It doesn't take the type_vocab_size parameter in the model config into account; instead, it always uses a fixed value of 2. I will open an issue in Optimum for it.
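To make the failure mode concrete: here is a minimal sketch (a stand-in embedding table, not the actual ruRoPEBert code) showing how a token type embedding sized from type_vocab_size=1 fails with exactly this IndexError when a dummy input generator hard-codes token type ids as if type_vocab_size were 2:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: a model config with type_vocab_size=1 gives a
# token type embedding table with a single row (only index 0 is valid).
type_vocab_size = 1
hidden_size = 8
token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)

# Token type id 0 is in range and works fine.
out = token_type_embeddings(torch.tensor([0]))

# A dummy input generator that assumes type_vocab_size=2 can emit token
# type id 1, which indexes past the table and raises the IndexError above.
try:
    token_type_embeddings(torch.tensor([1]))
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```

This matches the "index out of range in self" message in the traceback, which comes from torch's embedding lookup.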
For you, I guess the easiest option for now is to convert the model to ONNX manually using this guide and this page, or, alternatively, to write a custom DummyInputGenerator, though that may be tricky.
I still get the same problem.