How to use Xenova/multilingual-e5-large with Transformers.js:
// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// Allocate pipeline
const pipe = await pipeline('feature-extraction', 'Xenova/multilingual-e5-large');
Loading the non-quantized model in browser
#1
by nbolton04 - opened
My workflow is to embed the documents on a GPU. I tried using the quantized model in Python on a GPU but saw a significant decrease in performance, and it seems to run at over 1000% CPU usage even with a batch size of 1. My next step was to load the non-quantized model via Transformers.js, since I would rather have client-side browser inference be slow than have the initial processing take that much longer. However, when I do that, I get an error. Do you have suggestions or examples of how to load the non-quantized model in the browser?
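For reference, here is a minimal sketch of what requesting the full-precision weights might look like. It assumes Transformers.js v3 (`@huggingface/transformers`), where the `dtype` pipeline option selects the weight precision, and it assumes the repository ships full-precision ONNX weights that the browser can fetch and hold in memory. The helper names `fullPrecisionOptions` and `loadFullPrecisionExtractor` are hypothetical and exist only for illustration. This is not a confirmed fix for the error described above, which is not shown in the post.

```javascript
// Hypothetical helper: builds the options object passed to pipeline().
// Assumption: Transformers.js v3, where `dtype: 'fp32'` requests the
// non-quantized weights. In the older @xenova/transformers v2 package
// the comparable option was `{ quantized: false }`.
function fullPrecisionOptions(device) {
  const options = { dtype: 'fp32' };
  // `device` is optional (for example 'webgpu' or 'wasm'); omit it to
  // let the library pick its default backend.
  if (device) options.device = device;
  return options;
}

// Hypothetical loader: imports the library lazily and allocates a
// feature-extraction pipeline with full-precision weights.
async function loadFullPrecisionExtractor(
  modelId = 'Xenova/multilingual-e5-large',
  device,
) {
  const { pipeline } = await import('@huggingface/transformers');
  return pipeline('feature-extraction', modelId, fullPrecisionOptions(device));
}

// Embeds a list of strings with mean pooling and L2 normalization,
// returning plain nested arrays (one vector per input string).
async function embed(extractor, texts) {
  const output = await extractor(texts, { pooling: 'mean', normalize: true });
  return output.tolist();
}
```

Usage would be along the lines of `const extractor = await loadFullPrecisionExtractor();` followed by `const vectors = await embed(extractor, ['query: example text']);`. The full-precision weights for a model of this size are a large download, so a slow first load in the browser is to be expected.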